Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Table mean SD of continuous variable over categorical variables

    Hi,

    I have to make a table that gives me the mean and standard deviation of a continuous variable (column) over categorical variables (row). I can't seem to find the right command, how do I do it?


    Thanks a lot, Annika

  • #2
    Annika:
    welcome to this forum.
    Do you mean something along the following lines?
    Code:
    . sysuse auto.dta
    (1978 automobile data)
    
    . tabstat price, stat(N mean sd p25 p50 p75 min max) by(foreign)
    
    Summary for variables: price
    Group variable: foreign (Car origin)
    
     foreign |         N      Mean        SD       p25       p50       p75       Min       Max
    ---------+--------------------------------------------------------------------------------
    Domestic |        52  6072.423  3097.104      4184    4782.5      6234      3291     15906
     Foreign |        22  6384.682  2621.915      4499      5759      7140      3748     12990
    ---------+--------------------------------------------------------------------------------
       Total |        74  6165.257  2949.496      4195    5006.5      6342      3291     15906
    ------------------------------------------------------------------------------------------
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Yes, exactly. Though I would really like to use more categorical variables. In your example I can only run one variable at a time. Right?

      Comment


      • #4
        Annika,
        no, you cannot use more categorical variables in the same code via the -by()- option .
        Code:
        . sysuse auto
        (1978 automobile data)
        
        . tabstat price, stat(N mean sd p25 p50 p75 min max) by(foreign rep78)
        by():  too many variables specified
        r(103);
        Last edited by Carlo Lazzaro; 10 May 2023, 05:14.
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Carlo,

          But that is what I want to do (if possible in Stata). I want to make a table that presents the mean and SD of one continuous variable to more categorical variables. And if possible, export the table to microsoft word or excel.

          /Annika

          Comment


          • #6
            Anniika:
            the only way that springs to my mind to accomplish what you're after is the following one:
            Code:
             sysuse auto.dta
            (1978 automobile data)
            
            . bysort foreign rep78: tabstat price, stat(N mean sd p25 p50 p75 min max)
            
            --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            -> foreign = Domestic, rep78 = 1
            
                Variable |         N      Mean        SD       p25       p50       p75       Min       Max
            -------------+--------------------------------------------------------------------------------
                   price |         2    4564.5  522.5519      4195    4564.5      4934      4195      4934
            ----------------------------------------------------------------------------------------------
            
            --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            -> foreign = Domestic, rep78 = 2
            
                Variable |         N      Mean        SD       p25       p50       p75       Min       Max
            -------------+--------------------------------------------------------------------------------
                   price |         8  5967.625  3579.357      4035      4638      6114      3667     14500
            ----------------------------------------------------------------------------------------------
            
            --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            -> foreign = Domestic, rep78 = 3
            
                Variable |         N      Mean        SD       p25       p50       p75       Min       Max
            -------------+--------------------------------------------------------------------------------
                   price |        27  6607.074  3661.267      4187      4749     10371      3291     15906
            ----------------------------------------------------------------------------------------------
            
            --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            -> foreign = Domestic, rep78 = 4
            
                Variable |         N      Mean        SD       p25       p50       p75       Min       Max
            -------------+--------------------------------------------------------------------------------
                   price |         9  5881.556  1592.019      4890      5705      6303      3829      8814
            ----------------------------------------------------------------------------------------------
            
            --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            -> foreign = Domestic, rep78 = 5
            
                Variable |         N      Mean        SD       p25       p50       p75       Min       Max
            -------------+--------------------------------------------------------------------------------
                   price |         2    4204.5  311.8341      3984    4204.5      4425      3984      4425
            ----------------------------------------------------------------------------------------------
            
            --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            -> foreign = Domestic, rep78 = .
            
                Variable |         N      Mean        SD       p25       p50       p75       Min       Max
            -------------+--------------------------------------------------------------------------------
                   price |         4    4790.5  1169.903    4111.5    4438.5    5469.5      3799      6486
            ----------------------------------------------------------------------------------------------
            
            --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            -> foreign = Foreign, rep78 = 3
            
                Variable |         N      Mean        SD       p25       p50       p75       Min       Max
            -------------+--------------------------------------------------------------------------------
                   price |         3  4828.667  1285.613      3895      4296      6295      3895      6295
            ----------------------------------------------------------------------------------------------
            
            --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            -> foreign = Foreign, rep78 = 4
            
                Variable |         N      Mean        SD       p25       p50       p75       Min       Max
            -------------+--------------------------------------------------------------------------------
                   price |         9  6261.444  1896.092      4697      6229      7140      3995      9735
            ----------------------------------------------------------------------------------------------
            
            --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            -> foreign = Foreign, rep78 = 5
            
                Variable |         N      Mean        SD       p25       p50       p75       Min       Max
            -------------+--------------------------------------------------------------------------------
                   price |         9  6292.667  2765.629      4589      5719      5899      3748     11995
            ----------------------------------------------------------------------------------------------
            
            --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            -> foreign = Foreign, rep78 = .
            
                Variable |         N      Mean        SD       p25       p50       p75       Min       Max
            -------------+--------------------------------------------------------------------------------
                   price |         1     12990         .     12990     12990     12990     12990     12990
            ----------------------------------------------------------------------------------------------
            
            .
            Exporting tables to microsoft word or excel cn be made via the -table- suite of commands available from Stata 17.
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment


            • #7
              If you are using Stata 17 or newer, the table command supports exporting to Word and Excel.

              It is not clear if you want a table composed of the mean and SD of a continuous variable for each level of each categorical variable separately, or for each level combination of the categorical variables.

              Here is an example that produces a table for the level combinations of 2 categorical variables. It takes one call to table; I include the frequency in the table for reasons I will explain in the next example.
              Code:
              . table (foreign rep78), ///
              >         statistic(frequency) ///
              >         statistic(mean mpg) ///
              >         statistic(sd mpg)
              
              -------------------------------------------------------------------
                                     |  Frequency       Mean   Standard deviation
              -----------------------+-------------------------------------------
              Car origin             |                                           
                Domestic             |                                           
                  Repair record 1978 |                                           
                    1                |          2         21             4.242641
                    2                |          8     19.125             3.758324
                    3                |         27         19             4.085622
                    4                |          9   18.44444             4.585605
                    5                |          2         32             2.828427
                    Total            |         48   19.54167             4.753312
                Foreign              |                                           
                  Repair record 1978 |                                           
                    3                |          3   23.33333             2.516611
                    4                |          9   24.88889             2.713137
                    5                |          9   26.33333             9.367497
                    Total            |         21   25.28571             6.309856
                Total                |                                           
                  Repair record 1978 |                                           
                    1                |          2         21             4.242641
                    2                |          8     19.125             3.758324
                    3                |         30   19.43333             4.141325
                    4                |         18   21.66667              4.93487
                    5                |         11   27.36364             8.732385
                    Total            |         69   21.28986             5.866408
              -------------------------------------------------------------------
              If this is the goal, then you can use collect style to make some format or other style changes, then collect export the table to Word or Excel.

              If you want a table composed of the mean SD of a continuous variable for each level of separate categorical variables, you will have to use table with each categorical variable separately, then combine the resulting collections. Since each categorical variable can have its own missing value pattern, you might want to show their frequencies (or use option missing). In the following I elected to show the frequencies with the mean and SD of variable mpg for categorical variables foreign and rep78, taking care to name each table collection so that I can then combine them and then use collect layout to stack the tables.
              Code:
              . table foreign, ///
              >         statistic(frequency) ///
              >         statistic(mean mpg) ///
              >         statistic(sd mpg) ///
              >         name(t1)
              
              -------------------------------------------------------
                         |  Frequency       Mean   Standard deviation
              -----------+-------------------------------------------
              Car origin |                                           
                Domestic |         52   19.82692             4.743297
                Foreign  |         22   24.77273             6.611187
                Total    |         74    21.2973             5.785503
              -------------------------------------------------------
              
              . table rep78, ///
              >         statistic(frequency) ///
              >         statistic(mean mpg) ///
              >         statistic(sd mpg) ///
              >         name(t2)
              
              ---------------------------------------------------------------
                                 |  Frequency       Mean   Standard deviation
              -------------------+-------------------------------------------
              Repair record 1978 |                                           
                1                |          2         21             4.242641
                2                |          8     19.125             3.758324
                3                |         30   19.43333             4.141325
                4                |         18   21.66667              4.93487
                5                |         11   27.36364             8.732385
                Total            |         69   21.28986             5.866408
              ---------------------------------------------------------------
              
              . 
              . collect combine all = t1 t2
              (current collection is all)
              
              . collect layout (foreign rep78) (result)
              
              Collection: all
                    Rows: foreign rep78
                 Columns: result
                 Table 1: 11 x 3
              
              ---------------------------------------------------------------
                                 |  Frequency       Mean   Standard deviation
              -------------------+-------------------------------------------
              Car origin         |                                           
                Domestic         |         52   19.82692             4.743297
                Foreign          |         22   24.77273             6.611187
                Total            |         74    21.2973             5.785503
              Repair record 1978 |                                           
                1                |          2         21             4.242641
                2                |          8     19.125             3.758324
                3                |         30   19.43333             4.141325
                4                |         18   21.66667              4.93487
                5                |         11   27.36364             8.732385
                Total            |         69   21.28986             5.866408
              ---------------------------------------------------------------
              As for the first example, you can use collect style to make some format or other style changes, then collect export this table to Word or Excel.

              Comment


              • #8
                Dear Jeff,

                Thanks for you replay. I'll try to explain again. I would like to make a table, that presents the mean and SD from one continuous variable to each level of different categorical variables. So the continuous variable is not changing and should be in the row of the table top, while the categorical variables (more than one variable) should be in the collums at the left side of the table.

                I'm not interested in the mean and SD from the categorical variable, only for the continuous variable. As I understand you examples, you show the mean and SD from the categorical variable and not from a continuous variable? If I misunderstod, sorry. In that case, what is the name of the continuous variable you use?

                Comment


                • #9
                  mpg is the continuous variable in #7.

                  Comment


                  • #10
                    Thanks a lot Nick! Now it makes sense to me.

                    Comment


                    • #11
                      I see how the row and column headers might suggest I was summarizing the wrong variables. Here is a redo of my second example, but I switch to using option missing instead of showing frequencies. I also change the label for the standard deviations to "SD". In the layout, I add the var dimension so that it is clear where the mean and SD values are computed from.
                      Code:
                      sysuse auto
                      
                      table foreign, ///
                              statistic(mean mpg) ///
                              statistic(sd mpg) ///
                              missing ///
                              name(t1)
                      
                      table rep78, ///
                              statistic(mean mpg) ///
                              statistic(sd mpg) ///
                              missing ///
                              name(t2)
                      
                      collect combine all = t1 t2
                      
                      collect label values result sd "SD", modify
                      
                      collect layout (foreign rep78) (var#result)
                      Here is the resulting table
                      Code:
                      -----------------------------------------
                                         |     Mileage (mpg)   
                                         |      Mean         SD
                      -------------------+---------------------
                      Car origin         |                     
                        Domestic         |  19.82692   4.743297
                        Foreign          |  24.77273   6.611187
                        Total            |   21.2973   5.785503
                      Repair record 1978 |                     
                        1                |        21   4.242641
                        2                |    19.125   3.758324
                        3                |  19.43333   4.141325
                        4                |  21.66667    4.93487
                        5                |  27.36364   8.732385
                        .                |      21.4    5.07937
                        Total            |   21.2973   5.785503
                      -----------------------------------------

                      Comment

                      Working...
                      X