dtable and Specifying Missing Values Be Included in Percent Totals

Kevin Blaine

Join Date: Jul 2017

Posts: 66
#1

dtable and Specifying Missing Values Be Included in Percent Totals

17 May 2023, 14:20

Hi there -- for dtable, is it possible to specify missing values be included in percent totals, similar to tab [var], mi?

I have a one-way summary table wherein most values are missing due to skip logic, but I need the count of those missing to be included in total percentage calculations. I know I can use replace to substitute "0" for all the missing values, but I was hoping I could keep the integrity of the data by invoking an option in dtable to include missing values.

Thanks,
Kevin
Tags: None

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 686

18 May 2023, 09:02

Missing values cannot be supported with factor variables notation.

However, I believe you can combine collections from table and dtable to construct the table you describe.

Here is an example using the auto data, where variable rep78 has missing values.

Code:

. sysuse auto
(1978 automobile data)

. 
. * use -dtable- for factor variables without missing values
. dtable i.for mpg turn trunk

------------------------------------
                          Summary   
------------------------------------
N                                 74
Car origin                          
  Domestic                52 (70.3%)
  Foreign                 22 (29.7%)
Mileage (mpg)         21.297 (5.786)
Turn circle (ft.)     39.649 (4.399)
Trunk space (cu. ft.) 13.757 (4.277)
------------------------------------

. * look at the autolevels for later
. collect query autolevels result

Automatic dimension levels
Collection: DTable
 Dimension: result
    Levels: _dtable_stats

. sreturn list

macros:
             s(levels) : "_dtable_stats"
            s(dimname) : "result"
         s(collection) : "DTable"

. local auto_result = s(levels)

. 
. * use -table- for each variable with missing values
. table rep78, missing ///
>         statistic(frequency) ///
>         statistic(percent) ///
>         nototal ///
>         name(rep78Table)

-----------------------------------------
                   |  Frequency   Percent
-------------------+---------------------
Repair record 1978 |                     
  1                |          2      2.70
  2                |          8     10.81
  3                |         30     40.54
  4                |         18     24.32
  5                |         11     14.86
  .                |          5      6.76
-----------------------------------------

. * recode the result levels to match the ones used by -dtable-
. collect recode result frequency=fvfrequency
(6 items recoded in collection rep78Table)

. collect recode result percent=fvpercent
(6 items recoded in collection rep78Table)

. 
. * combine the tables, use DTable first to make sure its style takes
. * precedence
. collect combine both = DTable rep78Table
(current collection is both)

. * fix the header style to show the factor variable's label
. collect style header rep78 foreign, title(label)

. * restore -dtable-'s autolevels for dimension result
. collect style autolevels result `auto_result', clear

. * update the layout for the combined collection
. collect layout (var rep78) (result)

Collection: both
      Rows: var rep78
   Columns: result
   Table 1: 14 x 1

---------------------------------------
                      |      Summary   
----------------------+----------------
N                     |              74
Car origin            |                
  Domestic            |      52 (70.3%)
  Foreign             |      22 (29.7%)
Mileage (mpg)         |  21.297 (5.786)
Turn circle (ft.)     |  39.649 (4.399)
Trunk space (cu. ft.) |  13.757 (4.277)
Repair record 1978    |                
  1                   |        2 (2.7%)
  2                   |       8 (10.8%)
  3                   |      30 (40.5%)
  4                   |      18 (24.3%)
  5                   |      11 (14.9%)
  .                   |        5 (6.8%)
---------------------------------------

. 
. * take more control of the variable order
. collect layout (var[_N] var[i.foreign] rep78 var[mpg turn trunk]) (result)

Collection: both
      Rows: var[_N] var[i.foreign] rep78 var[mpg turn trunk]
   Columns: result
   Table 1: 14 x 1

---------------------------------------
                      |      Summary   
----------------------+----------------
N                     |              74
Car origin            |                
  Domestic            |      52 (70.3%)
  Foreign             |      22 (29.7%)
Repair record 1978    |                
  1                   |        2 (2.7%)
  2                   |       8 (10.8%)
  3                   |      30 (40.5%)
  4                   |      18 (24.3%)
  5                   |      11 (14.9%)
  .                   |        5 (6.8%)
Mileage (mpg)         |  21.297 (5.786)
Turn circle (ft.)     |  39.649 (4.399)
Trunk space (cu. ft.) |  13.757 (4.277)
---------------------------------------

. 
. * factor variables can be specified outside of dimension -var-
. collect layout (var[_N] foreign rep78 var[mpg turn trunk]) (result)

Collection: both
      Rows: var[_N] foreign rep78 var[mpg turn trunk]
   Columns: result
   Table 1: 14 x 1

---------------------------------------
                      |      Summary   
----------------------+----------------
N                     |              74
Car origin            |                
  Domestic            |      52 (70.3%)
  Foreign             |      22 (29.7%)
Repair record 1978    |                
  1                   |        2 (2.7%)
  2                   |       8 (10.8%)
  3                   |      30 (40.5%)
  4                   |      18 (24.3%)
  5                   |      11 (14.9%)
  .                   |        5 (6.8%)
Mileage (mpg)         |  21.297 (5.786)
Turn circle (ft.)     |  39.649 (4.399)
Trunk space (cu. ft.) |  13.757 (4.277)
---------------------------------------

Comment

Kevin Blaine

Join Date: Jul 2017

Posts: 66
#3

18 May 2023, 10:49

Thanks Jeff for the incredibly detailed reply. Much appreciated! This certainly answers part of my question, and fresh eyes this morning reveal that I wasn't as clear as I could be in my initial question. Let me elaborate and clarify using your rep78 example:

I'm interested in all the variables present in the final version of your table, except I actually don't want to look at all the repair records; I only am interested in Repair Record #1. So, this let's say I code appropriately (and correctly!) to get to the table below:

Code:

--------------------------------------- | Summary ----------------------+---------------- N | 74 Car origin | Domestic | 52 (70.3%) Foreign | 22 (29.7%) Repair Record #1 | 2 (2.7%) Mileage (mpg) | 21.297 (5.786) Turn circle (ft.) | 39.649 (4.399) Trunk space (cu. ft.) | 13.757 (4.277) ---------------------------------------

Now, while that 2 (2.7%) is accurate in terms of 2/74, I'm not interested in reporting the percentage as it pertains to ALL cases, but rather only for those where a repair record is present. What I would like to do (and the root of my other question) is to manipulate dtable or table to be able to report "2 (2.9%)", which is 2/69 (i.e. The number of cases minus the missing values for rep78. My final table would show:

Code:

--------------------------------------- | Summary ----------------------+---------------- N | 74 Car origin | Domestic | 52 (70.3%) Foreign | 22 (29.7%) Repair Record #1 | 2 (2.9%) Mileage (mpg) | 21.297 (5.786) Turn circle (ft.) | 39.649 (4.399) Trunk space (cu. ft.) | 13.757 (4.277) ---------------------------------------

Is this even possible?

Thanks,
Kevin

Last edited by Kevin Blaine; 18 May 2023, 10:59.
Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 686

18 May 2023, 12:19

Yes, very possible. You can do this with dtable without combining its collection with other collections from table, but you will need collect to make a small change to get the label in your example.

Here is an example using the auto data where I specify the level for rep78 using 1. instead of all levels via the i. operator.

Code:

. auto
(1978 automobile data)

. dtable i.for 1.rep mpg turn trunk

------------------------------------
                          Summary   
------------------------------------
N                                 74
Car origin                          
  Domestic                52 (70.3%)
  Foreign                 22 (29.7%)
Repair record 1978                  
  1                         2 (2.9%)
Mileage (mpg)         21.297 (5.786)
Turn circle (ft.)     39.649 (4.399)
Trunk space (cu. ft.) 13.757 (4.277)
------------------------------------

. 
. collect label levels var 1.rep78 "Repair Record #1", modify

. collect layout

Collection: DTable
      Rows: var
   Columns: result
   Table 1: 8 x 1

------------------------------------
                          Summary   
------------------------------------
N                                 74
Car origin                          
  Domestic                52 (70.3%)
  Foreign                 22 (29.7%)
Repair Record #1            2 (2.9%)
Mileage (mpg)         21.297 (5.786)
Turn circle (ft.)     39.649 (4.399)
Trunk space (cu. ft.) 13.757 (4.277)
------------------------------------

dtable uses all the non-missing values for each variable separately. If you wanted your statistics based on the same observations across all the variables, you would need to use option listwise -- which is the default behavior is most other Stata commands, but not for dtable.

You can specify more than one level: 1.rep78 3.rep78 5.rep78.
In this varlist context there is also support for a more concise specification: i(1 3 5).rep78.

Announcement