Summary statistics - number of observations

Jade Li

Join Date: Apr 2021
Posts: 98

11 Feb 2022, 06:47

Hi Everyone!

I would like to build a summary statistics table that reports the number of observations for each variable rather than only the total number of observations. I am using the following code:

Code:

cd "$My_tables"
putexcel set firstreport.xlsx, replace
putexcel A1= "Sum_stat"
putexcel A1:F1, border(bottom) merge hcenter
estpost tabstat  age gender income dummy dummy_fm dummy_m  labor output pop  , by(treatment) ///
statistics(mean sd n) columns(statistics) listwise
esttab ., main(mean) aux(sd) nostar unstack ///
/*noobs*/ nonote label   
return list
putexcel A2= matrix(r(StatTotal)'), names nformat(number_d2)

But what I get is this result:

Code:


----------------------------------------------------------
                              (1)                          
                                                           
                                0            1        Total
-----------------------------------------------------------
age                        28.18        46.57        28.36
                          (19.99)      (31.33)      (20.22)

gender                     2.556        5.056        2.581
                          (1.505)      (1.112)      (1.523)

income                    15.50        18.33        15.53
                          (1.768)      (1.698)      (1.790)

dummy                     10.51        10.94        10.52
                          (0.681)      (0.513)      (0.681)

dummy_fm                  3.823        6.275        3.848
                          (1.240)      (1.162)      (1.263)

dummy_m                    3.761        6.086        3.785
                          (1.210)      (1.147)      (1.232)

labor                      1.315        3.771        1.340
                          (1.309)      (1.833)      (1.338)

output                     0.283        0.259        0.282
                          (0.197)      (0.138)      (0.196)

pop                       3.405        3.647        3.408
                          (1.131)      (0.740)      (1.128)

-----------------------------------------------------------
Observations                 7080                          
-----------------------------------------------------------

. return list

The thing is that I do not need the Total column. However, I do need the number of observations used for each variable (age, gender, etc.), reported separately for column 0 (treatment = 0) and column 1 (treatment = 1).

Thank you in advance!
JL

Nate Tillern

Join Date: Jun 2017

Posts: 32
#2

11 Feb 2022, 07:37

Read here to learn about using the 'nmissing' and 'npresent' commands. I believe the latter is specifically what you are looking for.
Jade Li

Join Date: Apr 2021

Posts: 98
#3

11 Feb 2022, 08:00

Thank you, Nate. I will try it.
Jade Li

Join Date: Apr 2021

Posts: 98
#4

11 Feb 2022, 08:02

But do you know how to fit it into my code? I tried adding it to the code I listed above, but it did not work!
Nate Tillern

Join Date: Jun 2017

Posts: 32
#5

11 Feb 2022, 17:15

I would need to see your .dta file to tailor the code to it. Apparently npresent is somewhat outdated. There are other suggestions in this thread.

countvalues seems like an option. Some theoretical code would be:

Code:

countvalues, values(.)

This should list every variable together with its number of missing values. The number of non-missing observations for a variable is then the total number of observations minus that count. With so few variables, it is probably fastest to do that subtraction by hand.
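
If you would rather get the non-missing counts directly, per variable and per treatment group, here is a minimal sketch using the variable names from post #1. It is untested against your data. Options 1 and 2 use only built-in commands. Option 3 assumes the estout package is installed and that estpost stores the counts under the name count (check with ereturn list). Note that none of these use listwise: with listwise, every variable is computed on the same set of complete rows, so all the counts come out identical.

```stata
* Variable list taken from post #1 (assumed to exist in the data in memory)
local vars age gender income dummy dummy_fm dummy_m labor output pop

* Option 1: tabstat without listwise/casewise, so each variable keeps its own N;
* nototal suppresses the overall Total block
tabstat `vars', by(treatment) statistics(n) columns(statistics) nototal

* Option 2: explicit loop counting non-missing values per variable and group
* (assumes treatment is coded 0/1, as in the table above)
foreach v of local vars {
    forvalues t = 0/1 {
        quietly count if !missing(`v') & treatment == `t'
        display "`v'" _col(12) "treatment = `t'" _col(30) "N = " r(N)
    }
}

* Option 3 (assumption: estout installed): mean, sd and N side by side,
* one set of columns per treatment group, no Total column
estpost tabstat `vars', by(treatment) statistics(mean sd count) ///
    columns(statistics) nototal
esttab ., cells("mean(fmt(2)) sd(fmt(2)) count(fmt(0))") ///
    unstack noobs nonote nonumber label
```

Option 3 swaps main(mean) aux(sd) for cells(), because main() and aux() can show only two statistics, while cells() can place the count in a third column.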
