Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating a Loop of All Possible Combinations of Variables

    Hi, I am having trouble with writing a loop. My dataset involves 40 states, and approximately 700 observations in each state. I need to generate a loop that does the following:
    • Identify pairs of states, one at a time.
    • Calculate the necessary mins for those pairs of states, likely using rowmin depending on how the first problem is solved.
    • Generate the various alignment indices (this is equal to the sum of minima across all of the observations, which would be using the total function)
    • Drop the mins
    • Move on to the next pair and do it again.
    How would one write a loop that does all of the above? I would be fine with a loop that even compares states to themselves, as those observations could just be dropped with another line in the loop.

    Thanks in advance for your help

  • #2
    Here's a demonstration of some technique that might help you, based on some guesses about what you mean and what your situation is. If this is not relevant, I'd suggest you repost with answers to my questions below.

    Code:
    clear
    // simulate example data
    input str30 state
    "Colorado"
    "New Mexico"
    "Kansas"
    "Nebraska"
    end
    expand 700
    set seed 78676
    gen y = runiform()
    //
    //  Do something involving minima for each pair of states
    encode state, gen(statenum) // state as numeric is easier
    summ statenum // confirm number of diff. states)
    local max = r(max)
    forval i = 1/ `=`max'-1' {
       forval j = `=`i' + 1'/`max' {
          quiet summ y if inlist(statenum, `i', `j')
          di _newline "min of y for States `i' and `j' = " r(min)
       }
    }
    Questions that might avoid the guessing underlying the preceding:

    0) What is an example of your data structure? (Please look at the StataList FAQ and search for -dataex-, and follow the instructions.)
    1) What you mean by "identify"? Do you mean "create each pair?"
    2) How are your states stored? ( Item 0) should help here)
    3) What do you mean by a minimum for a *pair* of states?
    4) What is a "necessary" minimum?
    5) Minimum of *what*?
    6) Do you mean combinations of *values* of a variable, as I assumed, or combinations of names of variables?
    Last edited by Mike Lacy; 11 Mar 2020, 11:14.

    Comment


    • #3
      Hi Mike,

      I have the data saved in Excel in both wide and long form; which I use depends on the advice I receive.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str5 ID str14 State str17(cc1 cc2 cc207)
      "US.1" "California" "0"                 "0" ".0095238095238095"
      "US.2" "Texas"      ".0065789473684211" "0" "0"                
      "US.3" "Iowa"       "0"                 "0" "0"                
      "US.4" "Michigan"   "0"                 "0" "0"                
      "US.5" "Minnesota"  "0"                 "0" "0"                
      end

      Correct, I want to create all of the pairs of states.

      The states are stored with ID numbers and another categorical variable with their state names.

      I'm measuring the presence or absence of material in each state. Each ccvalue represents the percent to which that topic is represented in materials made by each state. So if there is a non-zero score for both states, then they both cover said material. If the min on cc _n is zero, that means that either one or both states did not cover it.

      To give a simplified example, Minnesota has 100 pieces of material and covers CC_n once, its score would be .01. If Kansas also has 100 pieces of material and covers it once, their alignment on that would be .01. If Florida has 100 pieces of material and doesn't cover it at all, then its alignment would be 0.

      I want to create all pairs, where the value in that column is equal to the rowmin of the two states that make up those pairs. After that is done, I will sum all of the values in each individual column of twopair columns for a total amount where the two states overlap.

      I hope that helps

      Comment


      • #4
        This is the other way that the data could be organized


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str17(US1 US4 US9 US16 US28)
        "California"        "Michigan"          "Colorado" "Massachusetts" "Nevada"
        "0"                 "0"                 "0"        "0"             "0"     
        "0"                 "0"                 "0"        "0"             "0"     
        ".0095238095238095" "0"                 "0"        "0"             "0"     
        "0"                 ".0049261083743842" "0"        "0"             "0"     
        end
        I am debating whether they should be in wide form, or long form. In long form, I could imagine it would just be generating a loop of concats, one in ascending order, another in descending order? I don't think I have run concat on a loop before, though it would be applicable here maybe?

        Comment


        • #5
          At the moment, I am using the long form of the data, and I am thinking of running a loop that looks like this

          foreach v of varlist US* {
          local i = _n
          local var : var `v'
          local var = `v'`i'
          egen `v'`v' = rowmin(`v' `v')
          }

          Without the first local i, it generated US1US1-US39US39... Which means that the loop did not compare it to all of the others, only to itself.
          What I am trying to do is get this loop to generate a variable called US1US1 all the way to US1US39, and then repeat, ie. US2US1 through US2US39.

          Thanks in advance to whoever can help me out with this

          Comment


          • #6
            I don't understand your explanation well enough to offer any very direct help. (If I were you, I'd ask some colleague to help you better describe what you want.) Until you are able to do that, my only suggestion would be to use the long form and look at either -cross- or -joinby- as a way to create a data set of pairs. To reduce the file size, I'd get rid of your unnecessary string variables like state name, as I showed you above with -encode-.

            More importantly, it appears that your Stata data set contains string values for the numeric variables cc*, which is not going to work for you. That likely means you have a bad Excel file, with blanks for missing data that force Stata to treat columns with numeric data as strings.. I now revise my advice to be that you first need to get a decent data file to work with in Stata, with numeric variables stored as numeric. Then, you need to get help in explaining what you want.

            Comment


            • #7
              Hi Mike, thanks for the advice. I appreciate your support
              1) I destringed the values
              2) I made an update post, and think I explained it better there but I'll do it here and hopefully that makes more sense
              3) I'm using the data in long form now

              Here's what my data looks like now


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input double(US1 US7 US14 US24 US33)
                              0 0 0                 0 0
                              0 0 0                 0 0
              .0095238095238095 0 0                 0 0
                              0 0 0 .0076923076923077 0
                              0 0 0                 0 0
              end
              In short, each observation for an individual state is the percent of coverage that material is given. I want to compare each state, at every observation, to each other state, including itself. So, the main thing I want the loop to accomplish is to generate all of the permutations (or combinations if that's easier, but permutations are fine considering they could be found and dropped in post) possible of US*, with each as its own variable name. For example, the first loop would look like
              egen US1US1 = rowmin (US1 US1)
              egen US1US2 = rowmin (US1 US2)

              until egen US1US39 (the last state where data is available)

              And then I want the cycle to repeat.

              Comment


              • #8
                Question Im new to the forum, is it ok if I post a question here, for a post I already created? My problem is similar but I think the solution may be different. Dont want to hijack the conversation.

                Comment

                Working...
                X