Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How can I add the percentage of missing observations for each variable in the outputted summary table?

    Hello everyone,
    I would like to create a descriptive statistics table with general statistics like count, SD; min, max, etc. and output it using esttab or outtreg or any other outputting format. In addition to these statistics, I would like to add a new column that capture the percentage of missing observations for each variable. Is there a way to do this?

    The following is my current code:

    estpost tabstat price mpg weight length foreign, c(stat) stat(mean sd min max n)

    # delimit;
    esttab using "${outtex}",
    replace cells("mean sd min max count") nomtitle noobs
    title("Basic Summary Statistics)
    collabels("Mean" "SD" "Min" "Max" "N")
    addnote("Source: Stata's Automobile dataset.")
    coeflabels(price "Price" mpg "Miles per Gallon" weight "Weight (Lbs)" length "Length (Ft)" foreign "1 if Produced Abroad");

    Many thanks

  • #2
    If you have Stata 17 or newer, you can use the new collect suite of commands to build your table. In the following I use table (redesigned to use collections) to build the summary statistics, appending the percent of missing values, to produce a table with the indicated statistics of interest.
    Code:
    sysuse auto
    
    * put list of summary variables in macro so we can loop over them later
    unab vlist : price mpg weight length foreign rep78
    
    * reproduce -tabstat- statistics
    table,    statistic(mean `vlist') ///
        statistic(sd `vlist') ///
        statistic(min `vlist') ///
        statistic(max `vlist') ///
        statistic(count `vlist')
    
    * use shorter result labels
    collect label levels result ///
        sd "SD" ///
        min "Minimum" ///
        max "Maximum" ///
        count "Not missing" ///
        , modify
    
    * temporarily modify the variables to identify observations with missing
    * values and use FV notation to compute the percent of missing
    
    preserve
        foreach var of local vlist {
            replace `var' = missing(`var')
        }
        * use FV notation to compute the percentage of missing for each variable
        table, statistic(fvpercent 1.(`vlist')) append
    restore
    
    * recode to remove the FV notation
    foreach var of local vlist {
        collect recode var 1.`var' = `var'
    }
    
    * relabel the missing value percents
    collect label levels result fvpercent "% Missing", modify
    
    * arrange the results in columns
    collect layout (var) (result)
    
    * fix some decimal formats
    collect style cell result[mean sd], nformat(%12.3fc)
    collect preview
    Here is the resulting table.
    Code:
    -----------------------------------------------------------------------------------------
                       |       Mean          SD   Minimum   Maximum   Not missing   % Missing
    -------------------+---------------------------------------------------------------------
    Price              |  6,165.257   2,949.496      3291     15906            74        0.00
    Mileage (mpg)      |     21.297       5.786        12        41            74        0.00
    Weight (lbs.)      |  3,019.459     777.194      1760      4840            74        0.00
    Length (in.)       |    187.932      22.266       142       233            74        0.00
    Car origin         |      0.297       0.460         0         1            74        0.00
    Repair record 1978 |      3.406       0.990         1         5            69        6.76
    -----------------------------------------------------------------------------------------

    Comment


    • #3
      Dear Jeff,

      Many thanks for the amazing response! I have two questions, if you don't mind me asking.

      On some occasions when running the do-file, I often get an error that the requested statistic mean, SD, etc., is incorrectly specified; "varlist required," even though I have correctly specified the variables in the local macros. Is there a way to bypass this?

      Second, do you have recommendations for how to extract the result from collect preview into editable formats like RTF, CSV, or TeX?

      I appreciate your kind response.

      Comment


      • #4
        If you are selecting code in the do-file editor and pressing the "Do" button, you have to include the line that defines vlist -- otherwise the local macro will not be defined in the temporary do-file that is created to run your selected code. I recommend running the entire do-file instead of using line selections.

        See the documentation for collect export, click here.

        Comment

        Working...
        X