Using dtable to calculate row percentage

Sonnen Blume

Join Date: Aug 2018

Posts: 342
#1

Using dtable to calculate row percentage

06 Aug 2023, 06:27

Hi,
I am exploring the dtable option to create summary tables, but it looks like the help file doesn't show how to get row percentages (instead of column-wise percentages). Does dtable have this option?
Tags: None

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 684

06 Aug 2023, 12:15

There is no option for dtable to produce row percentages.

However, you can combine the collection from dtable for continuous variables with collections from one or more table commands each having row percentages of a factor variable.

Here is an example using the auto data, using foreign as the by() variable, and adding row percetages for factor variable rep78.

Code:

sysuse auto

* get summary statistics for continuous variables, and sample
dtable mpg turn trunk displ, by(foreign)

* note layout for the combined table later
collect layout

* compute frequencies and row percetages for your factor variables;
* each factor variable needs a separate call to -table-
table rep78 foreign, ///
    name(rep78) ///
    statistic(frequency) ///
    statistic(percent, across(foreign)) ///
    total(rep78)

* combine collections
collect combine both = DTable rep78

* this composite already contains -frequency- and -percent-
collect query composite _dtable_stats

* fix the autolevels for dimension result
collect query autolevels result
collect style autolevels result _dtable_stats, clear

* style your factor variable headers as you like
collect style header rep78, title(label) level(label)

* add factor variables to the row specficiation in the layout
collect layout (var rep78) (foreign#result)

Here is the resulting table.

Code:

. collect layout (var rep78) (foreign#result)

Collection: both
      Rows: var rep78
   Columns: foreign#result
   Table 1: 11 x 3

--------------------------------------------------------------------------------
                       |                        Car origin
                       |      Domestic            Foreign             Total
-----------------------+--------------------------------------------------------
N                      |        52 (70.3%)         22 (29.7%)        74 (100.0%)
Mileage (mpg)          |    19.827 (4.743)     24.773 (6.611)     21.297 (5.786)
Turn circle (ft.)      |    41.442 (3.968)     35.409 (1.501)     39.649 (4.399)
Trunk space (cu. ft.)  |    14.750 (4.306)     11.409 (3.217)     13.757 (4.277)
Displacement (cu. in.) |  233.712 (85.263)   111.227 (24.881)   197.297 (91.837)
Repair record 1978     |
  1                    |        2 (100.0%)                            2 (100.0%)
  2                    |        8 (100.0%)                            8 (100.0%)
  3                    |        27 (90.0%)          3 (10.0%)        30 (100.0%)
  4                    |         9 (50.0%)          9 (50.0%)        18 (100.0%)
  5                    |         2 (18.2%)          9 (81.8%)        11 (100.0%)
--------------------------------------------------------------------------------

Comment

Steven Azizi

Join Date: Mar 2016

Posts: 12
#3

07 Sep 2024, 06:27

How can one create a Table 1 with survey data using the "svy" command using the "table" or "dtable" syntax? For example, when using a command like:

svy: tab b08sex vlns, se ci obs format(%8.3g) percent row
I have several categorical and continuous sociodemographic variables that I would like to cross-tabulate with a binary variable called vlns to generate descriptive statistics for Table 1. Additionally, I would like to include 95% confidence intervals (CI) in this table.

Any guidance on how to achieve this would be appreciated. I am using Stata 18.
1 like
Comment
Baker Chowdhury

Join Date: Jul 2017

Posts: 20
#4

04 Dec 2024, 10:30

This is very nice and saves time Jeff Pitblado (StataCorp). Is it possible to only include proportion and standard error with survey function (svy)? Thank you!
Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 684

04 Dec 2024, 17:49

If you have survey data and want to build a table that cross-tabs continuous and factor variables across the levels of a common factor variable (your binary variable), you probably should use subpopulation methods. dtable supports the svy option, but you want CIs for the cross-tabbed means and percents, so I think your best option is to take advantage of the fact that collect works with svy: mean and svy: proportion, even with options subpop() and over(). The trick here is knowing how the collected values are tagged.

Here is an example I modified from another thread. In it I use svy: mean to collect CIs for subpopulation and population means, followed by postestimation command estat sd to collect the corresponding means and standard deviations. Then I loop over the factor variables to compute the subpopulation and population, frequencies and percents. Then it is a matter of constructing a composite result named ci for the CIs and another composite result named stats that I use to stack the various variable-specific results of interest. The remaining code is for style/formatting and arranging the collected items into a table.

Code:

sysuse auto
* add some randomly generated sampling weights
set seed 18
gen double sampw = runiformint(10,50) + runiform()
svyset [pw=sampw]

* variable to help collect results for the total sample
gen one = 1 if !missing(foreign)

collect clear

collect style use dtable, replace

* local macro for continuous variables
unab cvars : mpg turn trunk

* local macros for factor variables
fvunab fvars : i.rep78
fvrevar `fvars' , list
local fdims = r(varlist)

* In this example, we want proportion across levels of -foreign- instead
* of within, so we need to change how we call -svy: proportion-.

* continuous variable statistics
* collect CI limits -- subpopulation estimates
collect : svy: mean `cvars', over(foreign)
* collect mean and sd 
collect : estat sd
* collect CI limits -- population estimates
collect, tags(foreign[Total]) : svy: mean `cvars', over(one)
* collect mean and sd 
collect, tags(foreign[Total]) : estat sd

* factor variable statistics
foreach fvar of local fdims {
    * collect fvar-level frequencies and percents across foreign levels
    collect percent=_r_b: svy: proportion foreign, percent over(`fvar')
    * collect fvar-level frequencies (and 100%) for whole sample
    collect percent=_r_b, tags(foreign[Total]): ///
        svy: proportion one, percent over(`fvar')
}

* custom composite result for CIs
collect composite define ci = _r_lb _r_ub, trim
collect style cell result[ci], sformat("[%s]")

* organize the results we want to see in the table
collect composite define stats = mean sd freq percent, trim
collect style autolevels result stats ci, clear

* custom format for standard deviation estimates
collect style cell result[sd], sformat("(%s)")

* custom formats for the factor-variable percents and their CI limits
collect style cell (`fdims')#result[percent _r_lb _r_ub], sformat("%s%%")

* custom format for the factor-variable weighted frequencies
collect style cell result[freq], nformat("%9.1fc")

* customize header styles
collect style header foreign `fdims', title(label) level(label)
collect style header result, title(hide) level(hide)

* arrange items into the table
collect layout (colname_remainder[`cvars'] `fdims') (foreign#result)

The resulting table is

Code:

----------------------------------------------------------------------------------------------------------------------
                                                                 Car origin                                           
                                  Domestic                          Foreign                          Total            
----------------------------------------------------------------------------------------------------------------------
Mileage (mpg)         20.133 (4.835)   [18.674 21.592] 24.503 (6.090)   [21.828 27.179] 21.413 (5.599) [20.031 22.795]
Turn circle (ft.)     41.242 (4.003)   [40.035 42.448] 35.491 (1.271)   [34.967 36.014] 39.558 (4.338) [38.478 40.637]
Trunk space (cu. ft.) 14.634 (4.459)   [13.241 16.027] 11.707 (3.292)   [10.204 13.211] 13.777 (4.367) [12.658 14.896]
Repair record 1978                                                                                                    
  1                      44.8 100.0%                         0.0 0.0%                      44.8 100.0%                
  2                     288.1 100.0%                         0.0 0.0%                     288.1 100.0%                
  3                      750.1 89.4% [69.815% 96.844%]     89.0 10.6%  [3.156% 30.185%]   839.1 100.0%                
  4                      291.6 51.1% [27.288% 74.475%]    278.6 48.9% [25.525% 72.712%]   570.2 100.0%                
  5                       59.7 17.7%  [4.108% 52.075%]    276.6 82.3% [47.925% 95.892%]   336.3 100.0%                
----------------------------------------------------------------------------------------------------------------------

Comment

Baker Chowdhury

Join Date: Jul 2017

Posts: 20
#6

05 Dec 2024, 11:10

This is fantastic Jeff Pitblado (StataCorp), in the case of all factor variables in the rows often time we do not want to present "N" just the percentage/proportion with 95% CI with all numbers in two decimal places. What adjustment we may need to do for that.
Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 684

05 Dec 2024, 13:59

If you do not want to report the weighted frequencies ("N") of the factor variables, then remove freq from the definition of composite result stats.

Code:

collect composite define stats = mean sd percent, trim

Use command collect style cell to change the numeric format for the factor variable percent and corresponding CI limits.

Code:

collect style cell (`fdims')#result[percent _r_lb _r_ub], ///
        sformat("%s%%") nformat(%9.2f)

Here is how you can change to a common numeric format for all the results of interest.

Code:

collect style cell result[stats ci], nformat(%9.2f)

The following is a modified version of the above example with these changes.

Code:

sysuse auto
* add some randomly generated sampling weights
set seed 18
gen double sampw = runiformint(10,50) + runiform()
svyset [pw=sampw]

* variable to help collect results for the total sample
gen one = 1 if !missing(foreign)

collect clear

collect style use dtable, replace

* local macro for continuous variables
unab cvars : mpg turn trunk

* local macros for factor variables
fvunab fvars : i.rep78
fvrevar `fvars' , list
local fdims = r(varlist)

* In this example, we want proportion across levels of -foreign- instead
* of within, so we need to change how we call -svy: proportion-.

* continuous variable statistics
* collect CI limits -- subpopulation estimates
collect : svy: mean `cvars', over(foreign)
* collect mean and sd 
collect : estat sd
* collect CI limits -- population estimates
collect, tags(foreign[Total]) : svy: mean `cvars', over(one)
* collect mean and sd 
collect, tags(foreign[Total]) : estat sd

* factor variable statistics
foreach fvar of local fdims {
    * collect fvar-level frequencies and percents across foreign levels
    collect percent=_r_b: svy: proportion foreign, percent over(`fvar')
    * collect fvar-level frequencies (and 100%) for whole sample
    collect percent=_r_b, tags(foreign[Total]): ///
        svy: proportion one, percent over(`fvar')
}

* custom composite result for CIs
collect composite define ci = _r_lb _r_ub, trim
collect style cell result[ci], sformat("[%s]")

* organize the results we want to see in the table
collect composite define stats = mean sd percent, trim
collect style autolevels result stats ci, clear

* custom format for standard deviation estimates
collect style cell result[sd], sformat("(%s)")

* custom formats for the factor-variable percents and their CI limits
collect style cell (`fdims')#result[percent _r_lb _r_ub], ///
    sformat("%s%%") nformat(%9.2f)

* customize header styles
collect style header foreign `fdims', title(label) level(label)
collect style header result, title(hide) level(hide)

* arrange items into the table
collect layout (colname_remainder[`cvars'] `fdims') (foreign#result)

* if you want a common numerical format for all reported items
collect style cell result[stats ci], nformat(%9.2f)

* replay table
collect preview

Here is the resulting table.

Code:

----------------------------------------------------------------------------------------------------------
                                                           Car origin                                     
                                Domestic                      Foreign                      Total          
----------------------------------------------------------------------------------------------------------
Mileage (mpg)         20.13 (4.84)   [18.67 21.59] 24.50 (6.09)   [21.83 27.18] 21.41 (5.60) [20.03 22.79]
Turn circle (ft.)     41.24 (4.00)   [40.03 42.45] 35.49 (1.27)   [34.97 36.01] 39.56 (4.34) [38.48 40.64]
Trunk space (cu. ft.) 14.63 (4.46)   [13.24 16.03] 11.71 (3.29)   [10.20 13.21] 13.78 (4.37) [12.66 14.90]
Repair record 1978                                                                                        
  1                        100.00%                        0.00%                      100.00%              
  2                        100.00%                        0.00%                      100.00%              
  3                         89.39% [69.82% 96.84%]       10.61%  [3.16% 30.18%]      100.00%              
  4                         51.13% [27.29% 74.48%]       48.87% [25.52% 72.71%]      100.00%              
  5                         17.75%  [4.11% 52.08%]       82.25% [47.92% 95.89%]      100.00%              
----------------------------------------------------------------------------------------------------------

Comment

Baker Chowdhury

Join Date: Jul 2017

Posts: 20
#8

05 Dec 2024, 14:33

It does work out perfectly Jeff Pitblado (StataCorp) However my goal was to only include proportion and standard error not proportion with 95% CI
Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 684

05 Dec 2024, 15:08

You will need to declare more custom result names to exert finer control over which results to show for the continuous and factor variables.

In the following, I declare the mean CI limits mean_lb and mean_ub and compose them using composite result named mean_ci. For factor variables, I add percent_se to select the percent standard errors. The formatting of these results no longer needs the fdims list in the call to collect style cell.

Code:

sysuse auto
* add some randomly generated sampling weights
set seed 18
gen double sampw = runiformint(10,50) + runiform()
svyset [pw=sampw]

* variable to help collect results for the total sample
gen one = 1 if !missing(foreign)

collect clear

collect style use dtable, replace

* local macro for continuous variables
unab cvars : mpg turn trunk

* local macros for factor variables
fvunab fvars : i.rep78
fvrevar `fvars' , list
local fdims = r(varlist)

* In this example, we want proportion across levels of -foreign- instead
* of within, so we need to change how we call -svy: proportion-.

* In the following, we declare custom names for the mean CI limits so we
* can avoid showing the percent CI limits.

* continuous variable statistics
* collect mean CI limits -- subpopulation estimates
collect mean_lb=_r_lb mean_ub=_r_ub: svy: mean `cvars', over(foreign)
* collect mean and sd 
collect : estat sd
* collect mean CI limits -- population estimates
collect mean_lb=_r_lb mean_ub=_r_ub, tags(foreign[Total]) : ///
        svy: mean `cvars', over(one)
* collect mean and sd 
collect, tags(foreign[Total]) : estat sd

* For factor variables, we declare custom names for the percent SE
* estimates so we can avoid showing the mean SE estimates.

* factor variable statistics
foreach fvar of local fdims {
    * collect fvar-level frequencies and percents across foreign levels
    collect percent=_r_b percent_se=_r_se: ///
        svy: proportion foreign, percent over(`fvar')
    * collect fvar-level frequencies (and 100%) for whole sample
    collect percent=_r_b percent_se=_r_se, tags(foreign[Total]): ///
        svy: proportion one, percent over(`fvar')
}

* custom composite result for mean CIs
collect composite define mean_ci = mean_lb mean_ub, trim
collect style cell result[mean_ci], sformat("[%s]")

* organize the results we want to see in the table
collect composite define stats = mean sd percent percent_se, trim
collect style autolevels result stats mean_ci, clear

* custom format for standard deviation estimates
collect style cell result[sd], sformat("(%s)")

* custom format for the factor-variable percents
collect style cell result[percent], sformat("%s%%")

* custom format for standard errors of percents
collect style cell result[percent_se], sformat("(%s%%)")

* customize header styles
collect style header foreign `fdims', title(label) level(label)
collect style header result, title(hide) level(hide)

* common numerical format for all reported items
collect style cell result[stats mean_ci], nformat(%9.2f)

* arrange items into the table
collect layout (colname_remainder[`cvars'] `fdims') (foreign#result)

Here is the resulting table.

Code:

---------------------------------------------------------------------------------------------------------------
                                                              Car origin                                       
                                 Domestic                      Foreign                        Total            
---------------------------------------------------------------------------------------------------------------
Mileage (mpg)            20.13 (4.84) [18.67 21.59]    24.50 (6.09) [21.83 27.18]    21.41 (5.60) [20.03 22.79]
Turn circle (ft.)        41.24 (4.00) [40.03 42.45]    35.49 (1.27) [34.97 36.01]    39.56 (4.34) [38.48 40.64]
Trunk space (cu. ft.)    14.63 (4.46) [13.24 16.03]    11.71 (3.29) [10.20 13.21]    13.78 (4.37) [12.66 14.90]
Repair record 1978                                                                                             
  1                   100.00% (0.00%)                 0.00% (0.00%)               100.00% (0.00%)              
  2                   100.00% (0.00%)                 0.00% (0.00%)               100.00% (0.00%)              
  3                    89.39% (6.14%)                10.61% (6.14%)               100.00% (0.00%)              
  4                   51.13% (12.84%)               48.87% (12.84%)               100.00% (0.00%)              
  5                   17.75% (11.83%)               82.25% (11.83%)               100.00% (0.00%)              
---------------------------------------------------------------------------------------------------------------

Announcement