Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Summary Statistics

    Hi,
    I would like to do summary statistics (mean, st. dv., min, max, number of observations) for a single metric variable (income). I want to do it for the whole sample, and then for 2 binary dummy variables (full_time_employed and gender), so having 5 columns in total. Also, i need to export this to Word via estout. Really struggling with this at the moment. Anyone knows how to do this?
    Last edited by Thomas Yes; 28 Feb 2024, 11:00.

  • #2
    Hi Thomas, welcome to the forum. Suppose you have data that looks something like this:

    Code:
    clear
    set obs 100
    gen income = runiformint(5, 100) * 1000
    gen full_time_employed = runiformint(0, 1)
    gen gender = runiformint(0, 1)
    label define empl 1 "employed" 0 "unemployed"
    label values full_time_employed empl
    label define genl 1 "women" 0 "men"
    label values gender genl
    Probably the most straightforward way to do this is with the -table- command.

    Code:
    table(full_time_employed gender), statistic(mean income) statistic(sd income) ///
        statistic(min income) statistic(max income) statistic(count income)
    But that doesn't quite get you what you appear to be asking for because you don't want the multidimensional nesting structure.

    Code:
    -----------------------------------------------------------------------------------------------------------------
                       |      Mean   Standard deviation   Minimum value   Maximum value   Number of nonmissing values
    -------------------+---------------------------------------------------------------------------------------------
    full_time_employed |                                                                                             
      unemployed       |                                                                                             
        gender         |                                                                                             
          men          |  58538.46             31156.68            5000           99000                            26
          women        |  49692.31             26729.41            5000           90000                            26
          Total        |  54115.38             29086.52            5000           99000                            52
      employed         |                                                                                             
        gender         |                                                                                             
          men          |  51458.33             27741.01            5000           99000                            24
          women        |     51750             29266.76            9000           98000                            24
          Total        |  51604.17              28209.5            5000           99000                            48
      Total            |                                                                                             
        gender         |                                                                                             
          men          |     55140             29483.31            5000           99000                            50
          women        |     50680             27706.51            5000           98000                            50
          Total        |     52910             28551.95            5000           99000                           100
    -----------------------------------------------------------------------------------------------------------------
    I think this code gives you something more like what you are picturing:

    Code:
    gen employed = income if full_time_employed
    gen unemployed = income if !full_time_employed
    gen women = income if gender
    gen men = income if !gender
    table var result, statistic(mean employed unemployed women men income) ///
                      statistic(sd employed unemployed women men income) /// 
                      statistic(min employed unemployed women men income) ///
                      statistic(max employed unemployed women men income) ///
                      statistic(count employed unemployed women men income)
    Which gives the following table of average incomes:

    Code:
    ---------------------------------------------------------------------------------------------------------
               |      Mean   Standard deviation   Minimum value   Maximum value   Number of nonmissing values
    -----------+---------------------------------------------------------------------------------------------
    employed   |  51329.83             27748.62            5000          100000                           476
    unemployed |  52940.84             27404.25            5000          100000                           524
    women      |  51323.59             27465.68            5000          100000                           513
    men        |  53069.82             27672.53            5000          100000                           487
    income     |     52174             27566.64            5000          100000                         1,000
    ---------------------------------------------------------------------------------------------------------
    That second solution might be improved by manually creating your own tags and dimensions using -collect- rather than creating new variables as I do, but the above seamed a bit simpler and easier to understand for someone new to tables. You may want to change the income variable name or the label in the table to "total". In any case, you should probably export the table to word with -putdocx- rather than -estout-.

    Code:
    putdocx begin
    putdocx collect
    putdocx save results.docx, replace
    Finally, I would strongly recommend you read through the documentation for -table-, -collect-, and -putdox-. Automatically creating and exporting tables can get complicated. You probably want to familiarize yourself with the documentation.

    Comment


    • #3
      Daniel provides a solution facilitated by generating variables for each sample grouping.

      Here is an alternative that builds a separate collection for each grouping, then combines them to produce a single table. Using the data from Daniel's example:
      Code:
      * summary statistics for the entire sample
      table,  statistic(mean income)          ///
              statistic(sd income)            ///
              statistic(min income)           ///
              statistic(max income)           ///
              statistic(count income)         ///
              name(Total)
      collect addtags Total[_hide]
      
      * summary statistics within levels of full_time_employed
      table (full_time_employed),             ///
              statistic(mean income)          ///
              statistic(sd income)            ///
              statistic(min income)           ///
              statistic(max income)           ///
              statistic(count income)         ///
              nototals                        ///
              name(full_time_employed)
      
      * summary statistics within levels of gender
      table (gender), ///
              statistic(mean income)          ///
              statistic(sd income)            ///
              statistic(min income)           ///
              statistic(max income)           ///
              statistic(count income)         ///
              nototals                        ///
              name(gender)
      
      * combine these collections into a new collection named 'all'
      collect combine all = full_time_employed gender Total
      
      * add a title
      collect title "Table: Summary of Income"
      
      * shorten some result labels
      collect label levels result     ///
              sd "Std. dev."          ///
              min "Minimum"           ///
              max "Maximum"           ///
              count "N", modify
      
      * arrange the collected results into a table
      collect layout (full_time_employed gender Total) (result)
      Here is the resulting table.
      Code:
      Table: Summary of Income
      --------------------------------------------------------------------
                         |      Mean   Std. dev.   Minimum   Maximum     N
      -------------------+------------------------------------------------
      full_time_employed |
        unemployed       |     56880     28722.2      8000    100000    50
        employed         |     54480     28382.4      7000    100000    50
      gender             |
        men              |  58319.15    30107.85      8000    100000    47
        women            |  53339.62    26934.35      7000    100000    53
      Total              |     55680    28433.83      7000    100000   100
      --------------------------------------------------------------------

      Comment


      • #4
        Thanks, that helped a lot!

        Comment

        Working...
        X