Outputting spreadsheet of counts by group

Anne Todd

Join Date: Dec 2018

Posts: 163
#1

Outputting spreadsheet of counts by group

06 Nov 2024, 14:10

I have a somewhat odd task that I'm attempting to figure out the best way to handle. Imagine I have 5 variables, and I basically want to produce a matrix that counts the number of respondents/IDs that fall into each bucket.

The first column is the combination of var1-var4. So '0011' would mean var1=0, var2=0, var3=1, var4=1. And then these are broken out by the 'freq' variable categories.

And then all I'd want is to count the number of respondents that have each possible combination of var1-var4. So in the below table, 4 respondents had a 0 on all four variables (0000) and freq=never.

I have no clue how to even start wrapping my head around how to do this. But I think the resulting table would look something like this:

var group freq never freq once freq more than once

0000 4 1 0

0011 0 0 1

0100 0 1 0

1000 0 1 1

1010 0 1 0

1110 1 0 0

2100 1 0 0

Code:

* Define the data with 'input' clear input id var1 var2 var3 var4 str15 freq 1 0 0 0 0 "never" 2 0 0 0 0 "never" 3 0 0 0 0 "never" 4 0 0 0 0 "once" 5 0 0 1 1 "more than once" 6 0 0 0 0 "never" 7 0 1 0 0 "once" 8 1 0 0 0 "once" 9 1 0 0 0 "never" 10 1 0 1 0 "once" 11 1 1 1 0 "never" 12 2 1 0 0 "never" end

Last edited by Anne Todd; 06 Nov 2024, 14:20.
Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

06 Nov 2024, 14:34

Code:

contract var1-var4 freq, freq(count) zero
replace freq = strtoname(freq)
gen `c(obs_t)' obs_no = _n
reshape wide count, i(obs_no) j(freq) string
foreach v of varlist count* {
    replace `v' = 0 if missing(`v')
}
rename count* *
drop obs_no

list, noobs clean abbrev(16)

Comment

Anne Todd

Join Date: Dec 2018
Posts: 163

06 Nov 2024, 18:45

Thanks Clyde Schechter . The -contract- technique is especially useful. If I do -drop if count==0- right after -contract-, that actually gets me pretty close to where I think I want to end up (see below).

But I'm not exactly sure where your code leaves me--I end up with 72 rows and it's unclear to me how to collapse or otherwise wrangle it in a way that gets me close to the final table I show in my original post.

Code:

 contract var1-var4 freq, freq(count) zero

drop if count==0  

list, noobs clean  


    var1   var2   var3   var4             freq   count  
       0      0      0      0            never       4  
       0      0      0      0             once       1  
       0      0      1      1   more_than_once       1  
       0      1      0      0             once       1  
       1      0      0      0            never       1  
       1      0      0      0             once       1  
       1      0      1      0             once       1  
       1      1      1      0            never       1  
       2      1      0      0            never       1

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

06 Nov 2024, 20:01

Oh, it's easy:

Code:

contract var1-var4 freq, freq(count) zero
replace freq = strtoname(freq)
drop if count == 0
gen `c(obs_t)' obs_no = _n
reshape wide count, i(obs_no) j(freq) string
foreach v of varlist count* {
replace `v' = 0 if missing(`v')
}
rename count* *
drop obs_no

list, noobs clean abbrev(16)

I actually thought you wanted all the rows with zero counts. But since you don't, the above is what you want.

Comment

Anne Todd

Join Date: Dec 2018

Posts: 163
#5

06 Nov 2024, 20:27

Thanks Clyde! My real data have a large N (likely resulting in hundreds of var1-var4 combinations) so I think I'll actually probably use some type of cutoff where I don't include rows if the the row total is less than 10 or something like that.

Is it possible to transpose the resulting matrix such that it's organized like the table I made in my first post, where the columns are the different levels of freq?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

06 Nov 2024, 22:05

Is it possible to transpose the resulting matrix such that it's organized like the table I made in my first post, where the columns are the different levels of freq?

What? The data set already is in that configuration. Do you mean you want to put the var1-var4 variables together into a single variable like the var group column of what you showed in #1? If so just insert the lines

Code:

egen var_group = concat(var1 var2 var3 var4) drop var1 var2 var3 var4 order var_group, first

after -drop obs_no-.
Comment
Anne Todd

Join Date: Dec 2018

Posts: 163
#7

07 Nov 2024, 06:29

Oh yes sorry, I think before these var1-var4 variables were combined I was looking at the resulting data set incorrectly. It's very clear now! Thank you.
Comment

var group	freq never	freq once	freq more than once
0000	4	1	0
0011	0	0	1
0100	0	1	0
1000	0	1	1
1010	0	1	0
1110	1	0	0
2100	1	0	0

Announcement