Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • function dtable missing data



    Hello everyone,

    I just switched to STATA 18 and I'm exploring the "dtable" function for descriptive tables. I have two questions: is it possible to count missing data, and how can I display the parameters of quantitative variables in rows? I'm sharing my code and a copy-paste of what I would like to achieve:




    dtable Age i.sex, ///
    sample(, statistics(freq) place(seplabels)) ///
    continuous(, statistic(mean sd p25 med p75 min max)) ///
    factor(, statistic(fvfrequency fvpercent)) ///
    sformat("(N=%s)" frequency) ///
    note(Total sample: N = 19) ///
    nformat(%7.2f mean sd) ///



    Gender [n (%)]
    n/missing 154/0 146/0 300/0
    Male 99 (64.3) 83 (56.8) 182 (60.7)
    Female 55 (35.7) 63 (43.2) 118 (39.3)
    Age (years)
    n/missing 154/0 146/0 300/0
    Mean (SD) 56.9 (7.48) 56.4 (6.96) 56.7 (7.22)
    Median 58.0 58.0 58.0
    Q1; Q3 52.0; 63.0 52.0; 62.0 52.0; 62.0
    Min; Max 31; 69 30; 68 30; 69


    Thank you in advance for your response.

  • #2
    dtable does not count missing values in variables directly.
    However, you can build your table with a little data management,
    multiple calls to dtable, and some collect management.

    The example table from John has columns that appear to represent groups
    and a total, but the call to dtable in John's code is missing the
    by() option.

    Without an example dataset from John, I'll use something publicly
    available that has similar variables. Also, I'll adding a by()
    option to the calls to dtable so the final table looks similar to
    John's example.

    Here is my attempt to reproduce John's table.
    Code:
    webuse nhanes2l
    
    * collect counts of non-missing values in the variables of interest; do
    * not use 'i.' notation here
    dtable sex age, ///
        by(rural) ///
        continuous(, statistic(count)) ///
        name(CTable)
    * recode the factor variables with a custom level (outside what is
    * observed) so that we can place it at the end of the list
    collect recode var sex = 9999.sex
    collect style header var[9999.sex], level(hide)
    
    * collect counts of missing values in the variables of interest
    preserve
    replace sex = missing(sex)
    replace age = missing(age)
    dtable sex age, ///
        by(rural) ///
        continuous(, statistic(total)) ///
        name(MTable)
    * same as above
    collect recode var sex = 9999.sex
    collect style header var[9999.sex], level(hide)
    * recode our result to -missing- (a more descriptive result level)
    collect recode result total = missing
    * format result -missing- like -count-
    collect style cell result[missing], nformat(%21.0fc)
    restore
    
    * collect all other statistics
    dtable i.sex age, ///
        by(rural) ///
        sample(, statistic(frequency)) ///
        continuous(, statistic(mean sd q1 median q2 min max))
    
    * combine collections
    collect dir
    collect combine all = DTable CTable MTable
    
    * define some composite results that will be arranged in the rows
    
    * factor variable statistics row
    collect composite define ///
        rowf =    frequency ///
            fvfrequency fvpercent ///
        , trim
    * no label here, will be using the factor level
    
    * continuous variable statistics row
    collect composite define ///
        rowc =    mean sd ///
        , trim
    collect label levels result rowc "Mean (SD)"
    
    * quartile row
    collect composite define ///
        qrow = q1 q1 ///
        , trim delimiter("; ")
    collect label levels result qrow "Q1; Q3"
    
    * range row
    collect composite define ///
        rrow = min max ///
        , trim delimiter("; ")
    collect label levels result rrow "Min; Max"
    
    * n/missing row
    collect composite define ///
        nmiss = count missing ///
        , trim delimiter("/")
    collect label levels result nmiss "n/missing"
    
    * augment label on the factor variable
    collect label levels var ///
        sex "`:var label sex' [n (%)]" ///
        , modify
    
    collect layout
    
    * fix the default format to show 1 decimal digit instead of 3
    collect style cell, basestyle nformat("%21.1f")
    
    * use autolevels to fix the order of the results
    collect style autolevels result rowf rowc median qrow rrow nmiss, clear
    
    * -dtable- hides all result levels by default,
    * show the result level labels we want to see
    collect style header result[rowc median qrow rrow nmiss], level(label)
    
    * hide the -by()- variable's variable label
    collect style header rural, title(hide)
    
    * arrange the table
    collect layout (var#result) (rural)
    Here is the table so far.
    Code:
    -----------------------------------------------------
                    Urban         Rural         Total
    -----------------------------------------------------
    N                   6,548         3,803        10,351
    Sex [n (%)]
      Male      3,023 (46.2%) 1,892 (49.8%) 4,915 (47.5%)
      Female    3,525 (53.8%) 1,911 (50.2%) 5,436 (52.5%)
      n/missing       6,548/0       3,803/0      10,351/0
    Age (years)
      Mean (SD)   46.8 (17.5)   48.9 (16.7)   47.6 (17.2)
      Median             48.0          51.0          49.0
      Q1; Q3       30.0; 30.0    33.0; 33.0    31.0; 31.0
      Min; Max     20.0; 74.0    20.0; 74.0    20.0; 74.0
      n/missing       6,548/0       3,803/0      10,351/0
    -----------------------------------------------------
    To make the n/missing row show up first we will have to change the order
    of the levels of the factor variables in var and change the order
    of the results. Use collect style autolevels in both cases ...
    Code:
    * change order of levels of factor variables in -var-
    collect levels var
    collect style autolevels var _N 9999.sex 1.sex 2.sex age, clear
    
    * change order of results, only affects continuous variables
    collect style autolevels result nmiss rowf rowc median qrow rrow, clear
    
    collect preview
    Here is the final table.
    Code:
    -----------------------------------------------------
                    Urban         Rural         Total
    -----------------------------------------------------
    N                   6,548         3,803        10,351
    Sex [n (%)]
      n/missing       6,548/0       3,803/0      10,351/0
      Male      3,023 (46.2%) 1,892 (49.8%) 4,915 (47.5%)
      Female    3,525 (53.8%) 1,911 (50.2%) 5,436 (52.5%)
    Age (years)
      n/missing       6,548/0       3,803/0      10,351/0
      Mean (SD)   46.8 (17.5)   48.9 (16.7)   47.6 (17.2)
      Median             48.0          51.0          49.0
      Q1; Q3       30.0; 30.0    33.0; 33.0    31.0; 31.0
      Min; Max     20.0; 74.0    20.0; 74.0    20.0; 74.0
    -----------------------------------------------------

    Comment


    • #3
      Thank you very much, Jeff, it's exactly what I wanted. I'm sorry again to bother you, but I just have one last question. I adapted your code for single-group data, and for the factor variables, the final result puts the frequency on another line, and I couldn't figure out how to remove that.

      Code:
      * collect counts of non-missing values in the variables of interest; do
      * not use 'i.' notation here
      dtable sex age, ///
          continuous(, statistic(count)) ///
          name(CTable)
      * recode the factor variables with a custom level (outside what is
      * observed) so that we can place it at the end of the list
      collect recode var sex = 9999.sex
      collect style header var[9999.sex], level(hide)
      
      * collect counts of missing values in the variables of interest
      preserve
      replace sex = missing(sex)
      replace age = missing(age)
      dtable sex age, ///
          continuous(, statistic(total)) ///
          name(MTable)
      * same as above
      collect recode var sex = 9999.sex
      collect style header var[9999.sex], level(hide)
      * recode our result to -missing- (a more descriptive result level)
      collect recode result total = missing
      * format result -missing- like -count-
      collect style cell result[missing], nformat(%21.0fc)
      restore
      
      * collect all other statistics
      dtable i.sex age, ///
          sample(, statistic(frequency)) ///
          continuous(, statistic(mean sd q1 median q3 min max))
      
      * combine collections
      collect dir
      collect combine all = DTable CTable MTable
      
      * define some composite results that will be arranged in the rows
      
      * factor variable statistics row
      collect composite define ///
          rowf =    frequency ///
              fvfrequency fvpercent ///
          , trim
      * no label here, will be using the factor level
      
      * continuous variable statistics row
      collect composite define ///
          rowc =    mean sd ///
          , trim
      collect label levels result rowc "Mean (SD)"
      
      * quartile row
      collect composite define ///
          qrow = q1 q3 ///
          , trim delimiter("; ")
      collect label levels result qrow "Q1; Q3"
      
      * range row
      collect composite define ///
          rrow = min max ///
          , trim delimiter("; ")
      collect label levels result rrow "Min; Max"
      
      * n/missing row
      collect composite define ///
          nmiss = count missing ///
          , trim delimiter("/")
      collect label levels result nmiss "n/missing"
      
      collect layout
      
      * fix the default format to show 1 decimal digit instead of 3
      collect style cell, basestyle nformat("%21.1f")
      
      * use autolevels to fix the order of the results
      collect style autolevels result rowf rowc median qrow rrow nmiss, clear
      
      * -dtable- hides all result levels by default,
      * show the result level labels we want to see
      collect style header result[rowc median qrow rrow nmiss], level(label)
      
      * arrange the table
      collect layout (var#result)


      and here is the table :


      Code:
                                  
      Sexe                                       
      Masculin                                       
      Frequency    Factor    variable    frequency    (Factor    variable    percent%)    14 (73.7%)
      Féminin                                       
      Frequency    Factor    variable    frequency    (Factor    variable    percent%)    5 (26.3%)
      n/missing                            19/0
      Age                                       
      Mean (SD)                            55.0 (17.6)
      Median                            58.5
      Q1; Q3                            41.0; 71.0
      Min; Max                            25.0; 75.0
      n/missing                            18/1
      Thank you again for your help and availability

      Comment


      • #4
        Change
        Code:
        * -dtable- hides all result levels by default,
        * show the result level labels we want to see
        collect style header result[rowc median qrow rrow nmiss], level(label)
        to
        Code:
        * hide result levels by default
        collect style header result, level(hide)
        
        * show the result level labels we want to see
        collect style header result[rowc median qrow rrow nmiss], level(label)

        Comment


        • #5
          Thank you very much, it's perfect.
          Have a nice day!

          Comment


          • #6
            Hello,
            Thank you very much for this code. It's perfect for descriptive tables. I wanted to use it and add bivariate tests, so I tried to perform a Student's t-test for the quantitative variable and an exact Fisher test for the qualitative variable, also removing the total column. So, I wrote this code:



            Code:
            clear all
            
            webuse nhanes2l
            
            * collect counts of non-missing values in the variables of interest; do
            * not use 'i.' notation here
            dtable sex age, ///
                by(rural) ///
                continuous(, statistic(count)) ///
                name(CTable)
            * recode the factor variables with a custom level (outside what is
            * observed) so that we can place it at the end of the list
            collect recode var sex = 9999.sex
            collect style header var[9999.sex], level(hide)
            
            * collect counts of missing values in the variables of interest
            preserve
            replace sex = missing(sex)
            replace age = missing(age)
            dtable sex age, ///
                by(rural) ///
                continuous(, statistic(total)) ///
                name(MTable)
            * same as above
            collect recode var sex = 9999.sex
            collect style header var[9999.sex], level(hide)
            * recode our result to -missing- (a more descriptive result level)
            collect recode result total = missing
            * format result -missing- like -count-
            collect style cell result[missing], nformat(%21.0fc)
            restore
            
            * collect all other statistics
            dtable i.sex age, ///
                by(rural, test nototal) ///
                sample(, statistic(frequency)) ///
                continuous(, statistic(mean sd q1 median q2 min max) test(regress)) ///
                factor (, statistics(fvfreq fvpercent) test(fisher))
            
            * combine collections
            collect dir
            collect combine all = DTable CTable MTable
            
            * define some composite results that will be arranged in the rows
            
            * factor variable statistics row
            collect composite define ///
                rowf =    frequency ///
                    fvfrequency fvpercent _dtable_test ///
                , trim
            * no label here, will be using the factor level
            
            * continuous variable statistics row
            collect composite define ///
                rowc =    mean sd ///
                , trim
            collect label levels result rowc "Mean (SD)"
            
            * quartile row
            collect composite define ///
                qrow = q1 q1 ///
                , trim delimiter("; ")
            collect label levels result qrow "Q1; Q3"
            
            * range row
            collect composite define ///
                rrow = min max ///
                , trim delimiter("; ")
            collect label levels result rrow "Min; Max"
            
            * n/missing row
            collect composite define ///
                nmiss = count missing ///
                , trim delimiter("/")
            collect label levels result nmiss "n/missing"
            
            * augment label on the factor variable
            collect label levels var ///
                sex "`:var label sex' [n (%)]" ///
                , modify
            
            collect layout
            
            * fix the default format to show 1 decimal digit instead of 3
            collect style cell, basestyle nformat("%21.1f")
            
            * use autolevels to fix the order of the results
            collect style autolevels result rowf rowc median qrow rrow nmiss, clear
            
            * -dtable- hides all result levels by default,
            * show the result level labels we want to see
            collect style header result[rowc median qrow rrow nmiss], level(label)
            
            * hide the -by()- variable's variable label
            collect style header rural, title(hide)
            
            * arrange the table
            collect layout (var#result) (rural)
            
            * change order of levels of factor variables in -var-
            collect levels var
            collect style autolevels var _N 9999.sex 1.sex 2.sex age, clear
            
            * change order of results, only affects continuous variables
            collect style autolevels result nmiss rowf rowc median qrow rrow, clear
            
            collect preview

            ChatGPT And it gave me this:




            Code:
            -------------------------------------------------------
                            Urban         Rural      Test    Total
            -------------------------------------------------------
            N                   6,548         3,803                
            Sex [n (%)]                                            
              n/missing       6,548/0       3,803/0        10,351/0
              Male      3,023 (46.2%) 1,892 (49.8%) <0.001         
              Female    3,525 (53.8%) 1,911 (50.2%)                
            Age (years)                                            
              n/missing       6,548/0       3,803/0        10,351/0
                                                    <0.001         
              Mean (SD)   46.8 (17.5)   48.9 (16.7)                
              Median             48.0          51.0                
              Q1; Q3       30.0; 30.0    33.0; 33.0                
              Min; Max     20.0; 74.0    20.0; 74.0                
            -------------------------------------------------------
            I'm having difficulty removing the total column and placing the p-value in the "test" column at the level of n/missing. Additionally, I haven't been able to place the N (N=X) below each group.
            Thanks in advance for your response, and have a great day.



            Comment


            • #7
              My edits are indicated with EDIT and highlighted in blue.
              Code:
              clear all
              
              webuse nhanes2l
              
              * collect counts of non-missing values in the variables of interest; do
              * not use 'i.' notation here
              dtable sex age, ///
                  by(rural, nototal) /// <-- EDIT
                  continuous(, statistic(count)) ///
                  name(CTable)
              * recode the factor variables with a custom level (outside what is
              * observed) so that we can place it at the end of the list
              collect recode var sex = 9999.sex
              collect style header var[9999.sex], level(hide)
              
              * collect counts of missing values in the variables of interest
              preserve
              replace sex = missing(sex)
              replace age = missing(age)
              dtable sex age, ///
                  by(rural, nototal) /// <-- EDIT
                  continuous(, statistic(total)) ///
                  name(MTable)
              * same as above
              collect recode var sex = 9999.sex
              collect style header var[9999.sex], level(hide)
              * recode our result to -missing- (a more descriptive result level)
              collect recode result total = missing
              * format result -missing- like -count-
              collect style cell result[missing], nformat(%21.0fc)
              restore
              
              * collect all other statistics
              dtable i.sex age, ///
                  by(rural, test nototal) ///
                  sample(, statistic(frequency)) ///
                  continuous(, statistic(mean sd q1 median q2 min max) test(regress)) ///
                  factor (, statistics(fvfreq fvpercent) test(fisher))
              
              * combine collections
              collect dir
              collect combine all = DTable CTable MTable
              
              * EDIT: use the custom factor level to tag the test statistic
              collect recode sex 1 = 9999, ///
                  fortags(result[fisher])
              collect recode var 1.sex = 9999.sex, ///
                  fortags(result[fisher])
              
              * define some composite results that will be arranged in the rows
              
              * factor variable statistics row
              collect composite define ///
                  rowf =    frequency ///
                      fvfrequency fvpercent /// EDIT: remove _dtable_test
                  , trim
              * no label here, will be using the factor level
              
              * continuous variable statistics row
              collect composite define ///
                  rowc =    mean sd ///
                  , trim
              collect label levels result rowc "Mean (SD)"
              
              * quartile row
              collect composite define ///
                  qrow = q1 q1 ///
                  , trim delimiter("; ")
              collect label levels result qrow "Q1; Q3"
              
              * range row
              collect composite define ///
                  rrow = min max ///
                  , trim delimiter("; ")
              collect label levels result rrow "Min; Max"
              
              * n/missing row
              collect composite define ///
                  nmiss = count missing fisher regress /// <-- EDIT
                  , trim delimiter("/")
              collect label levels result nmiss "n/missing"
              
              * augment label on the factor variable
              collect label levels var ///
                  sex "`:var label sex' [n (%)]" ///
                  , modify
              
              collect layout
              
              * fix the default format to show 1 decimal digit instead of 3
              collect style cell, basestyle nformat("%21.1f")
              
              * use autolevels to fix the order of the results
              collect style autolevels result rowf rowc median qrow rrow nmiss, clear
              
              * -dtable- hides all result levels by default,
              * show the result level labels we want to see
              collect style header result[rowc median qrow rrow nmiss], level(label)
              
              * hide the -by()- variable's variable label
              collect style header rural, title(hide)
              
              * arrange the table
              collect layout (var#result) (rural)
              
              * change order of levels of factor variables in -var-
              collect levels var
              collect style autolevels var _N 9999.sex 1.sex 2.sex age, clear
              
              * change order of results, only affects continuous variables
              collect style autolevels result nmiss rowf rowc median qrow rrow, clear
              
              collect preview
              Here is the resulting table.
              Code:
              ----------------------------------------------
                              Urban         Rural      Test
              ----------------------------------------------
              N                   6,548         3,803
              Sex [n (%)]
                n/missing       6,548/0       3,803/0 <0.001
                Male      3,023 (46.2%) 1,892 (49.8%)
                Female    3,525 (53.8%) 1,911 (50.2%)
              Age (years)
                n/missing       6,548/0       3,803/0 <0.001
                Mean (SD)   46.8 (17.5)   48.9 (16.7)
                Median             48.0          51.0
                Q1; Q3       30.0; 30.0    33.0; 33.0
                Min; Max     20.0; 74.0    20.0; 74.0
              ----------------------------------------------

              Comment


              • #8
                Hello,

                Sorry, I was away and just saw your response, thank you very much, it's perfect. Just one more thing, I tried to place N just below "Urban" and "Rural" in the form (N=XXX) by adding place(seplabels) in the sample option, but it doesn't appear and I haven't found how to do it. I don't think it's a big deal, but I've tried in several ways and can't get it to work. Thanks in advance.



                Comment


                • #9
                  My edits are indicated by EDIT and highlighted in blue.
                  Code:
                  clear all
                  
                  webuse nhanes2l
                  
                  * collect counts of non-missing values in the variables of interest; do
                  * not use 'i.' notation here
                  dtable sex age, ///
                      by(rural, nototal) ///
                      sample(, place(seplabels)) /// <-- EDIT
                      continuous(, statistic(count)) ///
                      name(CTable)
                  * recode the factor variables with a custom level (outside what is
                  * observed) so that we can place it at the end of the list
                  collect recode var sex = 9999.sex
                  collect style header var[9999.sex], level(hide)
                  
                  * collect counts of missing values in the variables of interest
                  preserve
                  replace sex = missing(sex)
                  replace age = missing(age)
                  dtable sex age, ///
                      by(rural, nototal) ///
                      sample(, place(seplabels)) /// <-- EDIT
                      continuous(, statistic(total)) ///
                      name(MTable)
                  * same as above
                  collect recode var sex = 9999.sex
                  collect style header var[9999.sex], level(hide)
                  * recode our result to -missing- (a more descriptive result level)
                  collect recode result total = missing
                  * format result -missing- like -count-
                  collect style cell result[missing], nformat(%21.0fc)
                  restore
                  
                  * collect all other statistics
                  dtable i.sex age, ///
                      by(rural, test nototal) ///
                      sample(, statistic(frequency) place(seplabels)) /// <-- EDIT
                      sformat("(N=%s)" frequency) ///
                      continuous(, statistic(mean sd q1 median q2 min max) test(regress)) ///
                      factor(, statistics(fvfreq fvpercent) test(fisher))
                  
                  * combine collections
                  collect dir
                  collect combine all = DTable CTable MTable
                  
                  * use the custom factor level to tag the test statistic
                  collect recode sex 1 = 9999, ///
                      fortags(result[fisher])
                  collect recode var 1.sex = 9999.sex, ///
                      fortags(result[fisher])
                  
                  * define some composite results that will be arranged in the rows
                  
                  * factor variable statistics row
                  collect composite define ///
                      rowf =    frequency ///
                          fvfrequency fvpercent /// remove _dtable_test
                      , trim
                  * no label here, will be using the factor level
                  
                  * continuous variable statistics row
                  collect composite define ///
                      rowc =    mean sd ///
                      , trim
                  collect label levels result rowc "Mean (SD)"
                  
                  * quartile row
                  collect composite define ///
                      qrow = q1 q1 ///
                      , trim delimiter("; ")
                  collect label levels result qrow "Q1; Q3"
                  
                  * range row
                  collect composite define ///
                      rrow = min max ///
                      , trim delimiter("; ")
                  collect label levels result rrow "Min; Max"
                  
                  * n/missing row
                  collect composite define ///
                      nmiss = count missing fisher regress ///
                      , trim delimiter("/")
                  collect label levels result nmiss "n/missing"
                  
                  * augment label on the factor variable
                  collect label levels var ///
                      sex "`:var label sex' [n (%)]" ///
                      , modify
                  
                  collect layout
                  
                  * fix the default format to show 1 decimal digit instead of 3
                  collect style cell, basestyle nformat("%21.1f")
                  
                  * use autolevels to fix the order of the results
                  collect style autolevels result rowf rowc median qrow rrow nmiss, clear
                  
                  * -dtable- hides all result levels by default,
                  * show the result level labels we want to see
                  collect style header result[rowc median qrow rrow nmiss], level(label)
                  
                  * hide the -by()- variable's variable label
                  collect style header rural, title(hide)
                  
                  * arrange the table
                  * EDIT: add undocumented dimension that puts the sample stats in the column header
                  collect layout (var#result) (rural#_dtable_sample_dim)
                  
                  * change order of levels of factor variables in -var-
                  collect levels var
                  collect style autolevels var _N 9999.sex 1.sex 2.sex age, clear
                  
                  * change order of results, only affects continuous variables
                  collect style autolevels result nmiss rowf rowc median qrow rrow, clear
                  
                  collect preview
                  Here is the final table.
                  Code:
                  ----------------------------------------------
                                  Urban         Rural      Test 
                                (N=6,548)     (N=3,803)         
                  ----------------------------------------------
                  Sex [n (%)]                                   
                    n/missing       6,548/0       3,803/0 <0.001
                    Male      3,023 (46.2%) 1,892 (49.8%)       
                    Female    3,525 (53.8%) 1,911 (50.2%)       
                  Age (years)                                   
                    n/missing       6,548/0       3,803/0 <0.001
                    Mean (SD)   46.8 (17.5)   48.9 (16.7)       
                    Median             48.0          51.0       
                    Q1; Q3       30.0; 30.0    33.0; 33.0       
                    Min; Max     20.0; 74.0    20.0; 74.0       
                  ----------------------------------------------

                  Comment

                  Working...
                  X