
  • Combinations of binary variables

    Hello,

    I have a dataset with 10 variables and 5 binary variables (A, B, C, D, E). I'm trying to get all possible combinations of 2, 3, 4, and 5 of them, but I'm not sure how to go about this using a loop or permin/combin in Stata. Moreover, I want Stata to go through each combination and tell me how many times all the variables in it are 1 together (the intersection).

    The data are such that the number of A=1 instances (say 5) adds up from the A=1 counts in the other combinations, e.g. A=1 + B=1 + C=1 (2) and A=1 + C=1 (3). For example, I've been trying:

    tab A if A !=0 & B !=0 & C !=0


    I'm having difficulties getting the combinations with a loop (minimum code) and also tallying up all iterations to 5.

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(A B C D E)
    1 0 0 0 1
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    1 0 0 0 1
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    1 0 0 0 1
    1 0 0 0 0
    0 0 0 0 0
    1 0 0 0 0
    1 0 0 0 0
    1 0 0 0 0
    1 0 0 0 0
    0 0 0 0 0
    1 0 0 0 0
    1 0 0 0 0
    1 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    1 0 0 0 0
    1 0 0 0 1
    1 0 0 0 1
    0 0 0 0 0
    1 0 0 0 0
    0 0 0 0 0
    1 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    1 0 0 0 0
    0 0 0 0 0
    1 0 0 0 0
    1 0 0 0 0
    1 0 0 0 0
    0 0 0 0 0
    1 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    1 0 0 0 1
    1 0 0 0 1
    1 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 1
    0 0 0 0 0
    0 0 0 0 0
    1 0 0 0 0
    0 0 0 0 0
    1 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    1 0 0 0 0
    1 0 0 0 0
    1 0 0 0 0
    1 0 0 0 0
    end



  • #2
    I am not at all clear about what you want as your output or new variable. Here is a guess; if it is not correct, please clarify.
    Code:
    . egen group=group(A-E), label
    r; t=0.03 9:50:57
    
    . ta group
    
    group(A B C |
           D E) |      Freq.     Percent        Cum.
    ------------+-----------------------------------
      0 0 0 0 0 |         69       69.00       69.00
      0 0 0 0 1 |          1        1.00       70.00
      1 0 0 0 0 |         23       23.00       93.00
      1 0 0 0 1 |          7        7.00      100.00
    ------------+-----------------------------------
          Total |        100      100.00



    • #3
      Hello Rich,

      I had looked up the group() function, but that is not what I want to do. I want the frequency for each of the following combinations where all members equal 1 (for example, the frequency where A=B=C=1):
      AB
      AC
      AD
      AE
      ABC
      ABD
      ABCD
      ABDE
      and so forth, without any combination repeated.

      What I'm stuck on at the moment is how to write a loop that forms these combinations, and how to get Stata to report, all at once, the frequency with which the variables in each combination are all 1 together.

      I hope this makes it clearer.
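
      One way the request above might be sketched is to index every subset of the five variables by an integer bitmask and count the rows where all members of the subset equal 1. This is only an illustration, not a method proposed in the thread: the toy data are made up, and the local macro names (vars, last, subset, cond, size) are hypothetical.

      ```stata
      * Hypothetical toy data; replace with your own A-E indicators.
      clear
      input byte(A B C D E)
      1 1 0 0 1
      1 0 1 0 0
      1 1 1 0 0
      0 0 0 0 1
      end

      local vars A B C D E
      local k : word count `vars'
      local last = 2^`k' - 1

      * Each integer m from 1 to 2^k - 1 encodes one non-empty subset:
      * bit j of m is on when the j-th variable belongs to the subset.
      forvalues m = 1/`last' {
          local subset
          local cond
          local size 0
          forvalues j = 1/`k' {
              if mod(floor(`m' / 2^(`j' - 1)), 2) == 1 {
                  local v : word `j' of `vars'
                  local subset `subset'`v'
                  local cond `cond' & `v' == 1
                  local ++size
              }
          }
          if `size' >= 2 {
              * rows where every member of the subset is 1,
              * whatever the values of the remaining variables
              quietly count if 1 `cond'
              display "`subset'" _col(10) r(N)
          }
      }
      ```

      Note that these counts are inclusive: a row with A, B, and C all equal to 1 is counted under AB, AC, BC, and ABC. That differs from mutually exclusive pattern tabulations, where each row falls into exactly one pattern. The subsets are listed in bitmask order rather than by size.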




      • #4
        Does this help?

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte(A B C D E)
        1 0 0 0 1
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        1 0 0 0 1
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        1 0 0 0 1
        1 0 0 0 0
        0 0 0 0 0
        1 0 0 0 0
        1 0 0 0 0
        1 0 0 0 0
        1 0 0 0 0
        0 0 0 0 0
        1 0 0 0 0
        1 0 0 0 0
        1 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        1 0 0 0 0
        1 0 0 0 1
        1 0 0 0 1
        0 0 0 0 0
        1 0 0 0 0
        0 0 0 0 0
        1 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        1 0 0 0 0
        0 0 0 0 0
        1 0 0 0 0
        1 0 0 0 0
        1 0 0 0 0
        0 0 0 0 0
        1 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        1 0 0 0 1
        1 0 0 0 1
        1 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        0 0 0 0 1
        0 0 0 0 0
        0 0 0 0 0
        1 0 0 0 0
        0 0 0 0 0
        1 0 0 0 0
        0 0 0 0 0
        0 0 0 0 0
        1 0 0 0 0
        1 0 0 0 0
        1 0 0 0 0
        1 0 0 0 0
        end 
        
        gen which = "" 
        quietly foreach v of var A B C D E { 
            replace which = which + "`v'" if `v' == 1 
        }
        
        contract which, freq(count)
        gen length = length(which)
        sort length which 
        
        list which count, noobs 
        
          +---------------+
          | which   count |
          |---------------|
          |            69 |
          |     A      23 |
          |     E       1 |
          |    AE       7 |
          +---------------+



        • #5
          Could be silly, but this should serve the purpose:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte(A B C D E)
          1 0 0 0 1
          0 0 0 0 0
          0 0 0 0 1
          1 0 1 0 0
          0 1 0 1 0
          0 0 1 0 0
          0 0 0 0 0
          end
          
          gen temp = A*10000 + B*1000 + C*100 + D*10 + E
          gen str5 combo = string(temp, "%05.0f")
          
          tab combo if substr(combo,1,1) == "1"
          Last edited by Ken Chui; 31 Mar 2021, 08:52.



          • #6
            Ken Chui That's an alternative to egen, concat().
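
            For comparison, a minimal sketch of the egen, concat() route mentioned here, on a made-up three-row dataset (the variable name combo is just an illustrative choice):

            ```stata
            * Hypothetical toy data
            clear
            input byte(A B C D E)
            1 0 0 0 1
            0 0 0 0 0
            1 0 1 0 0
            end

            * concat() joins the values of the listed variables
            * into a single string variable, one character per indicator
            egen combo = concat(A B C D E)
            tab combo
            ```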



            • #7
              Nick Cox - The code worked, but I'm certain Stata is not picking up the right frequency numbers.

              Ken Chui - The code worked, but I'm not sure from the output (the frequencies are the same as with Nick's code) how to demarcate the combinations, which appear as below:


                    combo |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                    10000 |        145       45.74       45.74
                    10001 |         34       10.73       56.47
                    10011 |          8        2.52       58.99
                    10100 |         10        3.15       62.15
                    10101 |          4        1.26       63.41
                    10111 |          3        0.95       64.35
                    11000 |         73       23.03       87.38
                    11001 |         20        6.31       93.69
                    11011 |          4        1.26       94.95
                    11100 |         12        3.79       98.74
                    11101 |          3        0.95       99.68
                    11881 |          1        0.32      100.00
              ------------+-----------------------------------
                    Total |        317      100.00



              • #8
                The 1 or 0 in each position shows whether the corresponding variable is 1 or 0. So,

                10000 is the pattern for A1, B0, C0, D0, E0; there are 145 of them

                10001 is the pattern for A1, B0, C0, D0, E1; there are 34 of them
                10100 is the pattern for A1, B0, C1, D0, E0; there are 10 of them

                and so on. If you want to see which pattern has A1 and B1 with all else being 0, look for pattern "11000". I hope that's clear.
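
                To make the reading of patterns concrete, here is a small sketch that reuses the toy data and the combo variable from post #5. It contrasts an exact pattern match with a match on selected positions only; the choice of A and C is just an example.

                ```stata
                * Toy data and combo construction as in post #5
                clear
                input byte(A B C D E)
                1 0 0 0 1
                0 0 0 0 0
                0 0 0 0 1
                1 0 1 0 0
                0 1 0 1 0
                0 0 1 0 0
                0 0 0 0 0
                end

                gen temp = A*10000 + B*1000 + C*100 + D*10 + E
                gen str5 combo = string(temp, "%05.0f")

                * exactly A and C equal to 1, everything else 0
                count if combo == "10100"

                * A and C both 1, regardless of B, D, and E:
                * test positions 1 and 3 of the pattern string
                count if substr(combo, 1, 1) == "1" & substr(combo, 3, 1) == "1"
                ```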



                • #9
                  "The code worked but I'm certain Stata is not picking up the right frequency numbers."

                  Nothing to discuss without evidence or an explanation of faulty logic.

                  The mapping is from 10000, 10001, and so on to A, AE, and so on.
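
                  As a rough illustration of that mapping, the sketch below turns a pattern string into its letter label by appending a variable's name whenever its position holds a 1. The four pattern strings are typed in by hand as an example, and the names which and vars are illustrative.

                  ```stata
                  * A few hand-entered patterns (hypothetical example)
                  clear
                  input str5 combo
                  "10000"
                  "10001"
                  "10100"
                  "00000"
                  end

                  gen which = ""
                  local vars A B C D E
                  forvalues j = 1/5 {
                      * name of the variable stored in position j of the pattern
                      local v : word `j' of `vars'
                      quietly replace which = which + "`v'" if substr(combo, `j', 1) == "1"
                  }

                  list combo which, noobs
                  ```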
