
  • Summary statistics - number of observations

    Hi Everyone!

    I would like a summary statistics table that reports the number of observations for each variable, rather than only the total number of observations. I am using the following code:
    Code:
    cd "$My_tables"
    putexcel set firstreport.xlsx, replace
    putexcel A1= "Sum_stat"
    putexcel A1:F1, border(bottom) merge hcenter
    estpost tabstat  age gender income dummy dummy_fm dummy_m  labor output pop  , by(treatment) ///
    statistics(mean sd n) columns(statistics) listwise
    esttab ., main(mean) aux(sd) nostar unstack ///
    /*noobs*/ nonote label   
    return list
    putexcel A2= matrix(r(StatTotal)'), names nformat(number_d2)
    But what I get is this result:

    Code:
    
    ----------------------------------------------------------
                                  (1)                          
                                                               
                                    0            1        Total
    -----------------------------------------------------------
    age                        28.18        46.57        28.36
                              (19.99)      (31.33)      (20.22)
    
    gender                     2.556        5.056        2.581
                              (1.505)      (1.112)      (1.523)
    
    income                    15.50        18.33        15.53
                              (1.768)      (1.698)      (1.790)
    
    dummy                     10.51        10.94        10.52
                              (0.681)      (0.513)      (0.681)
    
    dummy_fm                  3.823        6.275        3.848
                              (1.240)      (1.162)      (1.263)
    
    dummy_m                    3.761        6.086        3.785
                              (1.210)      (1.147)      (1.232)
    
    labor                      1.315        3.771        1.340
                              (1.309)      (1.833)      (1.338)
    
    output                     0.283        0.259        0.282
                              (0.197)      (0.138)      (0.196)
    
    pop                       3.405        3.647        3.408
                              (1.131)      (0.740)      (1.128)
    
    -----------------------------------------------------------
    Observations                 7080                          
    -----------------------------------------------------------
    
    . return list
    The thing is that I do not need the Total column. However, I do need the number of observations used for each variable (age, gender, etc.), reported separately for column 0 (treatment = 0) and column 1 (treatment = 1).

    Thank you in advance!
    JL

  • #2
    Read here to learn about using the 'nmissing' and 'npresent' commands. I believe the latter is specifically what you are looking for.



    • #3
      Thank you, Ante. I will try it.



      • #4
        But do you know how to fit it into my code? I tried adding it to the code I listed above, but it did not work!



        • #5
          I would need to see your .dta file to tailor the code to your data. Apparently npresent is somewhat outdated. There are other suggestions in this thread.

          countvalues seems like an option; some untested code would be:

          Code:
          countvalues, values(.)
          This should list all your variables, each with the number of its missing values. The total number of observations minus the number of missing values in a variable equals the number of nonmissing observations for that variable. With so few variables, it is probably fastest to do that subtraction manually.
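
          As a rough sketch of that subtraction using only built-in commands, the lines below count missing values per variable and subtract them from the total. The shipped auto dataset and its variables price, mpg, rep78 and foreign are stand-ins chosen for illustration (foreign plays the role of treatment); they are assumptions, not the poster's data. The final tabstat call shows one possible way to get per-variable counts by group without the Total column: the nototal option drops the overall column, and omitting casewise/listwise lets each variable keep its own N. The same idea may carry over to the estpost tabstat call in post #1, but that has not been tried here.

          ```stata
          * Stand-in data (an assumption for illustration): the shipped auto
          * dataset, where rep78 has some missing values and foreign acts as
          * the grouping variable in place of treatment.
          sysuse auto, clear

          * Nonmissing count per variable = total observations - missing values.
          foreach v of varlist price mpg rep78 {
              quietly count
              local total = r(N)
              quietly count if missing(`v')
              display "`v': " `total' - r(N) " nonmissing of " `total'
          }

          * The same counts split by group, with no Total column. Without
          * casewise (listwise), n is computed separately for each variable.
          tabstat price mpg rep78, by(foreign) statistics(mean sd n) ///
              columns(statistics) nototal
          ```

          Doing the subtraction in a loop avoids retyping it when the variable list changes; for a one-off check, the tabstat line alone may be enough.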
          Last edited by Nate Tillern; 11 Feb 2022, 17:17. Reason: typo

