Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Outputting spreadsheet of counts by group

    I have a somewhat odd task that I'm attempting to figure out the best way to handle. Imagine I have 5 variables, and I basically want to produce a matrix that counts the number of respondents/IDs that fall into each bucket.

    The first column is the combination of var1-var4. So '0011' would mean var1=0, var2=0, var3=1, var4=1. And then these are broken out by the 'freq' variable categories.

    And then all I'd want is to count the number of respondents that have each possible combination of var1-var4. So in the below table, 4 respondents had a 0 on all four variables (0000) and freq=never.

    I have no clue how to even start wrapping my head around how to do this. But I think the resulting table would look something like this:

    var group freq never freq once freq more than once
    0000 4 1 0
    0011 0 0 1
    0100 0 1 0
    1000 0 1 1
    1010 0 1 0
    1110 1 0 0
    2100 1 0 0




    Code:
    * Define the data with 'input'
    clear
    input id var1 var2 var3 var4 str15 freq
    1 0 0 0 0 "never"
    2 0 0 0 0 "never"
    3 0 0 0 0 "never"
    4 0 0 0 0 "once"
    5 0 0 1 1 "more than once"
    6 0 0 0 0 "never"
    7 0 1 0 0 "once"
    8 1 0 0 0 "once"
    9 1 0 0 0 "never"
    10 1 0 1 0 "once"
    11 1 1 1 0 "never"
    12 2 1 0 0 "never"
    end
    Last edited by Anne Todd; 06 Nov 2024, 15:20.

  • #2
    Code:
    contract var1-var4 freq, freq(count) zero
    replace freq = strtoname(freq)
    gen `c(obs_t)' obs_no = _n
    reshape wide count, i(obs_no) j(freq) string
    foreach v of varlist count* {
        replace `v' = 0 if missing(`v')
    }
    rename count* *
    drop obs_no
    
    list, noobs clean abbrev(16)

    Comment


    • #3
      Thanks Clyde Schechter . The -contract- technique is especially useful. If I do -drop if count==0- right after -contract-, that actually gets me pretty close to where I think I want to end up (see below).

      But I'm not exactly sure where your code leaves me--I end up with 72 rows and it's unclear to me how to collapse or otherwise wrangle it in a way that gets me close to the final table I show in my original post.

      Code:
       contract var1-var4 freq, freq(count) zero
      
      drop if count==0  
      
      list, noobs clean  
      
      
          var1   var2   var3   var4             freq   count  
             0      0      0      0            never       4  
             0      0      0      0             once       1  
             0      0      1      1   more_than_once       1  
             0      1      0      0             once       1  
             1      0      0      0            never       1  
             1      0      0      0             once       1  
             1      0      1      0             once       1  
             1      1      1      0            never       1  
             2      1      0      0            never       1

      Comment


      • #4
        Oh, it's easy:
        Code:
        contract var1-var4 freq, freq(count) zero
        replace freq = strtoname(freq)
        drop if count == 0
        gen `c(obs_t)' obs_no = _n
        reshape wide count, i(obs_no) j(freq) string
        foreach v of varlist count* {
        replace `v' = 0 if missing(`v')
        }
        rename count* *
        drop obs_no
        
        list, noobs clean abbrev(16)
        I actually thought you wanted all the rows with zero counts. But since you don't, the above is what you want.

        Comment


        • #5
          Thanks Clyde! My real data have a large N (likely resulting in hundreds of var1-var4 combinations) so I think I'll actually probably use some type of cutoff where I don't include rows if the the row total is less than 10 or something like that.

          Is it possible to transpose the resulting matrix such that it's organized like the table I made in my first post, where the columns are the different levels of freq?

          Comment


          • #6
            Is it possible to transpose the resulting matrix such that it's organized like the table I made in my first post, where the columns are the different levels of freq?
            What? The data set already is in that configuration. Do you mean you want to put the var1-var4 variables together into a single variable like the var group column of what you showed in #1? If so just insert the lines
            Code:
            egen var_group = concat(var1 var2 var3 var4)
            drop var1 var2 var3 var4
            order var_group, first
            after -drop obs_no-.

            Comment


            • #7
              Oh yes sorry, I think before these var1-var4 variables were combined I was looking at the resulting data set incorrectly. It's very clear now! Thank you.

              Comment

              Working...
              X