Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using dtable to calculate row percentage

    Hi,
    I am exploring the dtable option to create summary tables, but it looks like the help file doesn't show how to get row percentages (instead of column-wise percentages). Does dtable have this option?

  • #2

    There is no option for dtable to produce row percentages.

    However, you can combine the collection from dtable for continuous variables with collections from one or more table commands each having row percentages of a factor variable.

    Here is an example using the auto data, using foreign as the by() variable, and adding row percetages for factor variable rep78.
    Code:
    sysuse auto
    
    * get summary statistics for continuous variables, and sample
    dtable mpg turn trunk displ, by(foreign)
    
    * note layout for the combined table later
    collect layout
    
    * compute frequencies and row percetages for your factor variables;
    * each factor variable needs a separate call to -table-
    table rep78 foreign, ///
        name(rep78) ///
        statistic(frequency) ///
        statistic(percent, across(foreign)) ///
        total(rep78)
    
    * combine collections
    collect combine both = DTable rep78
    
    * this composite already contains -frequency- and -percent-
    collect query composite _dtable_stats
    
    * fix the autolevels for dimension result
    collect query autolevels result
    collect style autolevels result _dtable_stats, clear
    
    * style your factor variable headers as you like
    collect style header rep78, title(label) level(label)
    
    * add factor variables to the row specficiation in the layout
    collect layout (var rep78) (foreign#result)
    Here is the resulting table.
    Code:
    . collect layout (var rep78) (foreign#result)
    
    Collection: both
          Rows: var rep78
       Columns: foreign#result
       Table 1: 11 x 3
    
    --------------------------------------------------------------------------------
                           |                        Car origin
                           |      Domestic            Foreign             Total
    -----------------------+--------------------------------------------------------
    N                      |        52 (70.3%)         22 (29.7%)        74 (100.0%)
    Mileage (mpg)          |    19.827 (4.743)     24.773 (6.611)     21.297 (5.786)
    Turn circle (ft.)      |    41.442 (3.968)     35.409 (1.501)     39.649 (4.399)
    Trunk space (cu. ft.)  |    14.750 (4.306)     11.409 (3.217)     13.757 (4.277)
    Displacement (cu. in.) |  233.712 (85.263)   111.227 (24.881)   197.297 (91.837)
    Repair record 1978     |
      1                    |        2 (100.0%)                            2 (100.0%)
      2                    |        8 (100.0%)                            8 (100.0%)
      3                    |        27 (90.0%)          3 (10.0%)        30 (100.0%)
      4                    |         9 (50.0%)          9 (50.0%)        18 (100.0%)
      5                    |         2 (18.2%)          9 (81.8%)        11 (100.0%)
    --------------------------------------------------------------------------------

    Comment


    • #3
      How can one create a Table 1 with survey data using the "svy" command using the "table" or "dtable" syntax? For example, when using a command like:

      svy: tab b08sex vlns, se ci obs format(%8.3g) percent row
      I have several categorical and continuous sociodemographic variables that I would like to cross-tabulate with a binary variable called vlns to generate descriptive statistics for Table 1. Additionally, I would like to include 95% confidence intervals (CI) in this table.

      Any guidance on how to achieve this would be appreciated. I am using Stata 18.

      Comment


      • #4
        This is very nice and saves time Jeff Pitblado (StataCorp). Is it possible to only include proportion and standard error with survey function (svy)? Thank you!

        Comment


        • #5
          If you have survey data and want to build a table that cross-tabs continuous and factor variables across the levels of a common factor variable (your binary variable), you probably should use subpopulation methods. dtable supports the svy option, but you want CIs for the cross-tabbed means and percents, so I think your best option is to take advantage of the fact that collect works with svy: mean and svy: proportion, even with options subpop() and over(). The trick here is knowing how the collected values are tagged.

          Here is an example I modified from another thread. In it I use svy: mean to collect CIs for subpopulation and population means, followed by postestimation command estat sd to collect the corresponding means and standard deviations. Then I loop over the factor variables to compute the subpopulation and population, frequencies and percents. Then it is a matter of constructing a composite result named ci for the CIs and another composite result named stats that I use to stack the various variable-specific results of interest. The remaining code is for style/formatting and arranging the collected items into a table.
          Code:
          sysuse auto
          * add some randomly generated sampling weights
          set seed 18
          gen double sampw = runiformint(10,50) + runiform()
          svyset [pw=sampw]
          
          * variable to help collect results for the total sample
          gen one = 1 if !missing(foreign)
          
          collect clear
          
          collect style use dtable, replace
          
          * local macro for continuous variables
          unab cvars : mpg turn trunk
          
          * local macros for factor variables
          fvunab fvars : i.rep78
          fvrevar `fvars' , list
          local fdims = r(varlist)
          
          * In this example, we want proportion across levels of -foreign- instead
          * of within, so we need to change how we call -svy: proportion-.
          
          * continuous variable statistics
          * collect CI limits -- subpopulation estimates
          collect : svy: mean `cvars', over(foreign)
          * collect mean and sd 
          collect : estat sd
          * collect CI limits -- population estimates
          collect, tags(foreign[Total]) : svy: mean `cvars', over(one)
          * collect mean and sd 
          collect, tags(foreign[Total]) : estat sd
          
          * factor variable statistics
          foreach fvar of local fdims {
              * collect fvar-level frequencies and percents across foreign levels
              collect percent=_r_b: svy: proportion foreign, percent over(`fvar')
              * collect fvar-level frequencies (and 100%) for whole sample
              collect percent=_r_b, tags(foreign[Total]): ///
                  svy: proportion one, percent over(`fvar')
          }
          
          * custom composite result for CIs
          collect composite define ci = _r_lb _r_ub, trim
          collect style cell result[ci], sformat("[%s]")
          
          * organize the results we want to see in the table
          collect composite define stats = mean sd freq percent, trim
          collect style autolevels result stats ci, clear
          
          * custom format for standard deviation estimates
          collect style cell result[sd], sformat("(%s)")
          
          * custom formats for the factor-variable percents and their CI limits
          collect style cell (`fdims')#result[percent _r_lb _r_ub], sformat("%s%%")
          
          * custom format for the factor-variable weighted frequencies
          collect style cell result[freq], nformat("%9.1fc")
          
          * customize header styles
          collect style header foreign `fdims', title(label) level(label)
          collect style header result, title(hide) level(hide)
          
          * arrange items into the table
          collect layout (colname_remainder[`cvars'] `fdims') (foreign#result)
          The resulting table is
          Code:
          ----------------------------------------------------------------------------------------------------------------------
                                                                           Car origin                                           
                                            Domestic                          Foreign                          Total            
          ----------------------------------------------------------------------------------------------------------------------
          Mileage (mpg)         20.133 (4.835)   [18.674 21.592] 24.503 (6.090)   [21.828 27.179] 21.413 (5.599) [20.031 22.795]
          Turn circle (ft.)     41.242 (4.003)   [40.035 42.448] 35.491 (1.271)   [34.967 36.014] 39.558 (4.338) [38.478 40.637]
          Trunk space (cu. ft.) 14.634 (4.459)   [13.241 16.027] 11.707 (3.292)   [10.204 13.211] 13.777 (4.367) [12.658 14.896]
          Repair record 1978                                                                                                    
            1                      44.8 100.0%                         0.0 0.0%                      44.8 100.0%                
            2                     288.1 100.0%                         0.0 0.0%                     288.1 100.0%                
            3                      750.1 89.4% [69.815% 96.844%]     89.0 10.6%  [3.156% 30.185%]   839.1 100.0%                
            4                      291.6 51.1% [27.288% 74.475%]    278.6 48.9% [25.525% 72.712%]   570.2 100.0%                
            5                       59.7 17.7%  [4.108% 52.075%]    276.6 82.3% [47.925% 95.892%]   336.3 100.0%                
          ----------------------------------------------------------------------------------------------------------------------

          Comment


          • #6
            This is fantastic Jeff Pitblado (StataCorp), in the case of all factor variables in the rows often time we do not want to present "N" just the percentage/proportion with 95% CI with all numbers in two decimal places. What adjustment we may need to do for that.

            Comment


            • #7
              If you do not want to report the weighted frequencies ("N") of the factor variables, then remove freq from the definition of composite result stats.
              Code:
              collect composite define stats = mean sd percent, trim
              Use command collect style cell to change the numeric format for the factor variable percent and corresponding CI limits.
              Code:
              collect style cell (`fdims')#result[percent _r_lb _r_ub], ///
                      sformat("%s%%") nformat(%9.2f)
              Here is how you can change to a common numeric format for all the results of interest.
              Code:
              collect style cell result[stats ci], nformat(%9.2f)
              The following is a modified version of the above example with these changes.
              Code:
              sysuse auto
              * add some randomly generated sampling weights
              set seed 18
              gen double sampw = runiformint(10,50) + runiform()
              svyset [pw=sampw]
              
              * variable to help collect results for the total sample
              gen one = 1 if !missing(foreign)
              
              collect clear
              
              collect style use dtable, replace
              
              * local macro for continuous variables
              unab cvars : mpg turn trunk
              
              * local macros for factor variables
              fvunab fvars : i.rep78
              fvrevar `fvars' , list
              local fdims = r(varlist)
              
              * In this example, we want proportion across levels of -foreign- instead
              * of within, so we need to change how we call -svy: proportion-.
              
              * continuous variable statistics
              * collect CI limits -- subpopulation estimates
              collect : svy: mean `cvars', over(foreign)
              * collect mean and sd 
              collect : estat sd
              * collect CI limits -- population estimates
              collect, tags(foreign[Total]) : svy: mean `cvars', over(one)
              * collect mean and sd 
              collect, tags(foreign[Total]) : estat sd
              
              * factor variable statistics
              foreach fvar of local fdims {
                  * collect fvar-level frequencies and percents across foreign levels
                  collect percent=_r_b: svy: proportion foreign, percent over(`fvar')
                  * collect fvar-level frequencies (and 100%) for whole sample
                  collect percent=_r_b, tags(foreign[Total]): ///
                      svy: proportion one, percent over(`fvar')
              }
              
              * custom composite result for CIs
              collect composite define ci = _r_lb _r_ub, trim
              collect style cell result[ci], sformat("[%s]")
              
              * organize the results we want to see in the table
              collect composite define stats = mean sd percent, trim
              collect style autolevels result stats ci, clear
              
              * custom format for standard deviation estimates
              collect style cell result[sd], sformat("(%s)")
              
              * custom formats for the factor-variable percents and their CI limits
              collect style cell (`fdims')#result[percent _r_lb _r_ub], ///
                  sformat("%s%%") nformat(%9.2f)
              
              * customize header styles
              collect style header foreign `fdims', title(label) level(label)
              collect style header result, title(hide) level(hide)
              
              * arrange items into the table
              collect layout (colname_remainder[`cvars'] `fdims') (foreign#result)
              
              * if you want a common numerical format for all reported items
              collect style cell result[stats ci], nformat(%9.2f)
              
              * replay table
              collect preview
              Here is the resulting table.
              Code:
              ----------------------------------------------------------------------------------------------------------
                                                                         Car origin                                     
                                              Domestic                      Foreign                      Total          
              ----------------------------------------------------------------------------------------------------------
              Mileage (mpg)         20.13 (4.84)   [18.67 21.59] 24.50 (6.09)   [21.83 27.18] 21.41 (5.60) [20.03 22.79]
              Turn circle (ft.)     41.24 (4.00)   [40.03 42.45] 35.49 (1.27)   [34.97 36.01] 39.56 (4.34) [38.48 40.64]
              Trunk space (cu. ft.) 14.63 (4.46)   [13.24 16.03] 11.71 (3.29)   [10.20 13.21] 13.78 (4.37) [12.66 14.90]
              Repair record 1978                                                                                        
                1                        100.00%                        0.00%                      100.00%              
                2                        100.00%                        0.00%                      100.00%              
                3                         89.39% [69.82% 96.84%]       10.61%  [3.16% 30.18%]      100.00%              
                4                         51.13% [27.29% 74.48%]       48.87% [25.52% 72.71%]      100.00%              
                5                         17.75%  [4.11% 52.08%]       82.25% [47.92% 95.89%]      100.00%              
              ----------------------------------------------------------------------------------------------------------

              Comment


              • #8
                It does work out perfectly Jeff Pitblado (StataCorp) However my goal was to only include proportion and standard error not proportion with 95% CI

                Comment


                • #9
                  You will need to declare more custom result names to exert finer control over which results to show for the continuous and factor variables.

                  In the following, I declare the mean CI limits mean_lb and mean_ub and compose them using composite result named mean_ci. For factor variables, I add percent_se to select the percent standard errors. The formatting of these results no longer needs the fdims list in the call to collect style cell.

                  Code:
                  sysuse auto
                  * add some randomly generated sampling weights
                  set seed 18
                  gen double sampw = runiformint(10,50) + runiform()
                  svyset [pw=sampw]
                  
                  * variable to help collect results for the total sample
                  gen one = 1 if !missing(foreign)
                  
                  collect clear
                  
                  collect style use dtable, replace
                  
                  * local macro for continuous variables
                  unab cvars : mpg turn trunk
                  
                  * local macros for factor variables
                  fvunab fvars : i.rep78
                  fvrevar `fvars' , list
                  local fdims = r(varlist)
                  
                  * In this example, we want proportion across levels of -foreign- instead
                  * of within, so we need to change how we call -svy: proportion-.
                  
                  * In the following, we declare custom names for the mean CI limits so we
                  * can avoid showing the percent CI limits.
                  
                  * continuous variable statistics
                  * collect mean CI limits -- subpopulation estimates
                  collect mean_lb=_r_lb mean_ub=_r_ub: svy: mean `cvars', over(foreign)
                  * collect mean and sd 
                  collect : estat sd
                  * collect mean CI limits -- population estimates
                  collect mean_lb=_r_lb mean_ub=_r_ub, tags(foreign[Total]) : ///
                          svy: mean `cvars', over(one)
                  * collect mean and sd 
                  collect, tags(foreign[Total]) : estat sd
                  
                  * For factor variables, we declare custom names for the percent SE
                  * estimates so we can avoid showing the mean SE estimates.
                  
                  * factor variable statistics
                  foreach fvar of local fdims {
                      * collect fvar-level frequencies and percents across foreign levels
                      collect percent=_r_b percent_se=_r_se: ///
                          svy: proportion foreign, percent over(`fvar')
                      * collect fvar-level frequencies (and 100%) for whole sample
                      collect percent=_r_b percent_se=_r_se, tags(foreign[Total]): ///
                          svy: proportion one, percent over(`fvar')
                  }
                  
                  * custom composite result for mean CIs
                  collect composite define mean_ci = mean_lb mean_ub, trim
                  collect style cell result[mean_ci], sformat("[%s]")
                  
                  * organize the results we want to see in the table
                  collect composite define stats = mean sd percent percent_se, trim
                  collect style autolevels result stats mean_ci, clear
                  
                  * custom format for standard deviation estimates
                  collect style cell result[sd], sformat("(%s)")
                  
                  * custom format for the factor-variable percents
                  collect style cell result[percent], sformat("%s%%")
                  
                  * custom format for standard errors of percents
                  collect style cell result[percent_se], sformat("(%s%%)")
                  
                  * customize header styles
                  collect style header foreign `fdims', title(label) level(label)
                  collect style header result, title(hide) level(hide)
                  
                  * common numerical format for all reported items
                  collect style cell result[stats mean_ci], nformat(%9.2f)
                  
                  * arrange items into the table
                  collect layout (colname_remainder[`cvars'] `fdims') (foreign#result)
                  Here is the resulting table.
                  Code:
                  ---------------------------------------------------------------------------------------------------------------
                                                                                Car origin                                       
                                                   Domestic                      Foreign                        Total            
                  ---------------------------------------------------------------------------------------------------------------
                  Mileage (mpg)            20.13 (4.84) [18.67 21.59]    24.50 (6.09) [21.83 27.18]    21.41 (5.60) [20.03 22.79]
                  Turn circle (ft.)        41.24 (4.00) [40.03 42.45]    35.49 (1.27) [34.97 36.01]    39.56 (4.34) [38.48 40.64]
                  Trunk space (cu. ft.)    14.63 (4.46) [13.24 16.03]    11.71 (3.29) [10.20 13.21]    13.78 (4.37) [12.66 14.90]
                  Repair record 1978                                                                                             
                    1                   100.00% (0.00%)                 0.00% (0.00%)               100.00% (0.00%)              
                    2                   100.00% (0.00%)                 0.00% (0.00%)               100.00% (0.00%)              
                    3                    89.39% (6.14%)                10.61% (6.14%)               100.00% (0.00%)              
                    4                   51.13% (12.84%)               48.87% (12.84%)               100.00% (0.00%)              
                    5                   17.75% (11.83%)               82.25% (11.83%)               100.00% (0.00%)              
                  ---------------------------------------------------------------------------------------------------------------

                  Comment

                  Working...
                  X