Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • dtable and Specifying Missing Values Be Included in Percent Totals

    Hi there -- for dtable, is it possible to specify missing values be included in percent totals, similar to tab [var], mi?

    I have a one-way summary table wherein most values are missing due to skip logic, but I need the count of those missing to be included in total percentage calculations. I know I can use replace to substitute "0" for all the missing values, but I was hoping I could keep the integrity of the data by invoking an option in dtable to include missing values.

    Thanks,
    Kevin

  • #2
    Missing values cannot be supported with factor variables notation.

    However, I believe you can combine collections from table and dtable to construct the table you describe.

    Here is an example using the auto data, where variable rep78 has missing values.
    Code:
    . sysuse auto
    (1978 automobile data)
    
    . 
    . * use -dtable- for factor variables without missing values
    . dtable i.for mpg turn trunk
    
    ------------------------------------
                              Summary   
    ------------------------------------
    N                                 74
    Car origin                          
      Domestic                52 (70.3%)
      Foreign                 22 (29.7%)
    Mileage (mpg)         21.297 (5.786)
    Turn circle (ft.)     39.649 (4.399)
    Trunk space (cu. ft.) 13.757 (4.277)
    ------------------------------------
    
    . * look at the autolevels for later
    . collect query autolevels result
    
    Automatic dimension levels
    Collection: DTable
     Dimension: result
        Levels: _dtable_stats
    
    . sreturn list
    
    macros:
                 s(levels) : "_dtable_stats"
                s(dimname) : "result"
             s(collection) : "DTable"
    
    . local auto_result = s(levels)
    
    . 
    . * use -table- for each variable with missing values
    . table rep78, missing ///
    >         statistic(frequency) ///
    >         statistic(percent) ///
    >         nototal ///
    >         name(rep78Table)
    
    -----------------------------------------
                       |  Frequency   Percent
    -------------------+---------------------
    Repair record 1978 |                     
      1                |          2      2.70
      2                |          8     10.81
      3                |         30     40.54
      4                |         18     24.32
      5                |         11     14.86
      .                |          5      6.76
    -----------------------------------------
    
    . * recode the result levels to match the ones used by -dtable-
    . collect recode result frequency=fvfrequency
    (6 items recoded in collection rep78Table)
    
    . collect recode result percent=fvpercent
    (6 items recoded in collection rep78Table)
    
    . 
    . * combine the tables, use DTable first to make sure its style takes
    . * precedence
    . collect combine both = DTable rep78Table
    (current collection is both)
    
    . * fix the header style to show the factor variable's label
    . collect style header rep78 foreign, title(label)
    
    . * restore -dtable-'s autolevels for dimension result
    . collect style autolevels result `auto_result', clear
    
    . * update the layout for the combined collection
    . collect layout (var rep78) (result)
    
    Collection: both
          Rows: var rep78
       Columns: result
       Table 1: 14 x 1
    
    ---------------------------------------
                          |      Summary   
    ----------------------+----------------
    N                     |              74
    Car origin            |                
      Domestic            |      52 (70.3%)
      Foreign             |      22 (29.7%)
    Mileage (mpg)         |  21.297 (5.786)
    Turn circle (ft.)     |  39.649 (4.399)
    Trunk space (cu. ft.) |  13.757 (4.277)
    Repair record 1978    |                
      1                   |        2 (2.7%)
      2                   |       8 (10.8%)
      3                   |      30 (40.5%)
      4                   |      18 (24.3%)
      5                   |      11 (14.9%)
      .                   |        5 (6.8%)
    ---------------------------------------
    
    . 
    . * take more control of the variable order
    . collect layout (var[_N] var[i.foreign] rep78 var[mpg turn trunk]) (result)
    
    Collection: both
          Rows: var[_N] var[i.foreign] rep78 var[mpg turn trunk]
       Columns: result
       Table 1: 14 x 1
    
    ---------------------------------------
                          |      Summary   
    ----------------------+----------------
    N                     |              74
    Car origin            |                
      Domestic            |      52 (70.3%)
      Foreign             |      22 (29.7%)
    Repair record 1978    |                
      1                   |        2 (2.7%)
      2                   |       8 (10.8%)
      3                   |      30 (40.5%)
      4                   |      18 (24.3%)
      5                   |      11 (14.9%)
      .                   |        5 (6.8%)
    Mileage (mpg)         |  21.297 (5.786)
    Turn circle (ft.)     |  39.649 (4.399)
    Trunk space (cu. ft.) |  13.757 (4.277)
    ---------------------------------------
    
    . 
    . * factor variables can be specified outside of dimension -var-
    . collect layout (var[_N] foreign rep78 var[mpg turn trunk]) (result)
    
    Collection: both
          Rows: var[_N] foreign rep78 var[mpg turn trunk]
       Columns: result
       Table 1: 14 x 1
    
    ---------------------------------------
                          |      Summary   
    ----------------------+----------------
    N                     |              74
    Car origin            |                
      Domestic            |      52 (70.3%)
      Foreign             |      22 (29.7%)
    Repair record 1978    |                
      1                   |        2 (2.7%)
      2                   |       8 (10.8%)
      3                   |      30 (40.5%)
      4                   |      18 (24.3%)
      5                   |      11 (14.9%)
      .                   |        5 (6.8%)
    Mileage (mpg)         |  21.297 (5.786)
    Turn circle (ft.)     |  39.649 (4.399)
    Trunk space (cu. ft.) |  13.757 (4.277)
    ---------------------------------------

    Comment


    • #3
      Thanks Jeff for the incredibly detailed reply. Much appreciated! This certainly answers part of my question, and fresh eyes this morning reveal that I wasn't as clear as I could be in my initial question. Let me elaborate and clarify using your rep78 example:

      I'm interested in all the variables present in the final version of your table, except I actually don't want to look at all the repair records; I only am interested in Repair Record #1. So, this let's say I code appropriately (and correctly!) to get to the table below:

      Code:
        
      ---------------------------------------
                            |      Summary  
      ----------------------+----------------
      N                     |              74
      Car origin            |                
        Domestic            |      52 (70.3%)
        Foreign             |      22 (29.7%)
      Repair Record #1      |        2 (2.7%)        
      Mileage (mpg)         |  21.297 (5.786)
      Turn circle (ft.)     |  39.649 (4.399)
      Trunk space (cu. ft.) |  13.757 (4.277)
      ---------------------------------------

      Now, while that 2 (2.7%) is accurate in terms of 2/74, I'm not interested in reporting the percentage as it pertains to ALL cases, but rather only for those where a repair record is present. What I would like to do (and the root of my other question) is to manipulate dtable or table to be able to report "2 (2.9%)", which is 2/69 (i.e. The number of cases minus the missing values for rep78. My final table would show:

      Code:
        
      ---------------------------------------
                            |      Summary  
      ----------------------+----------------
      N                     |              74
      Car origin            |                
        Domestic            |      52 (70.3%)
        Foreign             |      22 (29.7%)
      Repair Record #1      |        2 (2.9%)        
      Mileage (mpg)         |  21.297 (5.786)
      Turn circle (ft.)     |  39.649 (4.399)
      Trunk space (cu. ft.) |  13.757 (4.277)
      ---------------------------------------

      Is this even possible?

      Thanks,
      Kevin
      Last edited by Kevin Blaine; 18 May 2023, 10:59.

      Comment


      • #4
        Yes, very possible. You can do this with dtable without combining its collection with other collections from table, but you will need collect to make a small change to get the label in your example.

        Here is an example using the auto data where I specify the level for rep78 using 1. instead of all levels via the i. operator.
        Code:
        . auto
        (1978 automobile data)
        
        . dtable i.for 1.rep mpg turn trunk
        
        ------------------------------------
                                  Summary   
        ------------------------------------
        N                                 74
        Car origin                          
          Domestic                52 (70.3%)
          Foreign                 22 (29.7%)
        Repair record 1978                  
          1                         2 (2.9%)
        Mileage (mpg)         21.297 (5.786)
        Turn circle (ft.)     39.649 (4.399)
        Trunk space (cu. ft.) 13.757 (4.277)
        ------------------------------------
        
        . 
        . collect label levels var 1.rep78 "Repair Record #1", modify
        
        . collect layout
        
        Collection: DTable
              Rows: var
           Columns: result
           Table 1: 8 x 1
        
        ------------------------------------
                                  Summary   
        ------------------------------------
        N                                 74
        Car origin                          
          Domestic                52 (70.3%)
          Foreign                 22 (29.7%)
        Repair Record #1            2 (2.9%)
        Mileage (mpg)         21.297 (5.786)
        Turn circle (ft.)     39.649 (4.399)
        Trunk space (cu. ft.) 13.757 (4.277)
        ------------------------------------
        dtable uses all the non-missing values for each variable separately. If you wanted your statistics based on the same observations across all the variables, you would need to use option listwise -- which is the default behavior is most other Stata commands, but not for dtable.

        You can specify more than one level: 1.rep78 3.rep78 5.rep78.
        In this varlist context there is also support for a more concise specification: i(1 3 5).rep78.

        Comment

        Working...
        X