Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Formatting binary and categorical variables and p-values using table

    I'm trying to build a table using the table command in Stata 17.0. I've set up code to display means and SDs for continuous variables (cont), and N's and %'s for categorical (cat) and binary variables (bin). This code I have works well for outputting continuous variables and their p-values across groups but the formatting is a bit off for categorical variables and binary variables.

    As pictured below, the categorical variable header is showing up in two rows, and the p-value is on the first row. I'd like to only have the header appear on one row and the p-value along with it.

    For binary variables, I'd like to only see the "success" level, so for this example, only b=1. Currently they're set up the same as the categorical variables.

    As a final note, I'm trying to set the code up to be flexible, where if I wanted to add/subtract variables all I'd need to do is add/subtract them from local macros. Thanks for the help!

    Code:
    clear
    set more off
    collect clear
    
    /* simulating data */
    set seed 82
    set obs 1000
    g female = runiformint(0,1)
    label define female 0 "Male" 1 "Female"
    label values female female
    g cont = female + runiformint(1,7) + runiform() if runiform() > .05
    g cat = 1
    replace cat = 2 if cont > 2
    replace cat = 3 if cont > 5
    g bin = cont > 4
    
    
    /* putting the variables in macros */
    local stratvar "female"
    local catvars "cat"
    local binvars "bin"
    local contvars "cont"
    
    table (var) (`stratvar' result), ///
        name(stats) ///
        statistic(mean `contvars') ///
        statistic(sd `contvars') ///
        statistic(fvfrequency `catvars' `binvars') ///
        statistic(fvpercent `catvars' `binvars')
    
    /* generating p-values */
    collect create fits
    foreach var in `contvars' `catvars' `binvars' {
        collect p = Ftail(e(df_m), e(df_r), e(F)) ///
        , tag(var[`var']) ///
        : anova `var' `stratvar'
    }
    
    /* formatting */
    collect layout (var) (result)
    collect combine full = stats fits
    collect recode result mean = column1 ///
                          sd = column2 ///
                          fvfrequency = column1 ///
                          fvpercent = column2 ///
                          total = column1 ///
                          percent = column2
    collect layout (var) (`stratvar'#result[column1 column2] result[p])
    collect style cell var[`catvars' `binvars']#result[column1], nformat(%6.0fc)
    collect style cell var[`catvars' `binvars']#result[column2], nformat(%6.1f) sformat("%s%%")
    collect style cell var[`contvars']#result[column1 column2], nformat(%6.1f)
    collect style cell var[`contvars']#result[column2], sformat("(%s)")
    collect style cell result[p], nformat(%6.3f)
    collect label levels result p "p-value", modify
    collect style header `stratvar', title(hide)
    collect style header result[column1 column2], level(hide)
    collect style row stack, nobinder spacer
    collect style cell border_block, border(right, pattern(nil))
    collect preview
    
    -------------------------------------------------------
              Male         Female        Total      p-value
    -------------------------------------------------------
    cont  4.5   (2.1)   5.5   (2.0)   5.0   (2.1)     0.000
    cat                                               0.000
                                                           
    cat                                                    
      1    65   13.0%     0    0.0%    65    6.5%          
      2   208   41.5%   206   41.3%   414   41.4%          
      3   228   45.5%   293   58.7%   521   52.1%          
                                                           
    bin                                               0.000
                                                           
    bin                                                    
      0   217   43.3%   138   27.7%   355   35.5%          
      1   284   56.7%   361   72.3%   645   64.5%          
    -------------------------------------------------------

  • #2
    I got this to work by treating the categorical and binary variables differently.
    I use tabulate to get the p-values, and treat these variables as dimensions in the layout and styles.
    This does not require any changes to the table command because collect automatically adds factor variables in dimension var to the item tags as their own dimensions.

    The following is a modified version of the above:
    Code:
    clear
    set more off
    collect clear
    
    /* simulating data */
    set seed 82
    set obs 1000
    g female = runiformint(0,1)
    label define female 0 "Male" 1 "Female"
    label values female female
    g cont = female + runiformint(1,7) + runiform() if runiform() > .05
    g cat = 1
    replace cat = 2 if cont > 2
    replace cat = 3 if cont > 5
    g bin = cont > 4
    
    
    /* putting the variables in macros */
    local stratvar "female"
    local catvars "cat"
    local binvars "bin"
    local contvars "cont"
    
    table (var) (`stratvar' result), ///
        name(stats) ///
        statistic(mean `contvars') ///
        statistic(sd `contvars') ///
        statistic(fvfrequency `catvars' `binvars') ///
        statistic(fvpercent `catvars' `binvars')
    
    /* generating p-values */
    collect create fits
    foreach var in `contvars' {
        collect p = Ftail(e(df_m), e(df_r), e(F)) ///
        , tag(var[`var']) ///
        : anova `var' `stratvar'
    }
    
    foreach var in `catvars' `binvars' {
        tabulate `var' `stratvar', chi2
        collect get p = (r(p)), tag(`var'[_hide])
        collect style header `var', title(label)
    }
    
    /* formatting */
    collect style autolevels var `contvars', clear
    collect layout (var `catvars' `binvars') (result)
    collect combine full = stats fits
    foreach var in `catvars' {
        collect levels `var'
        collect style autolevels `var' _hide `s(levels)', clear
    }
    foreach var in `binvars' {
        collect style autolevels `var' _hide 1, clear
    }
    collect recode result mean = column1 ///
                          sd = column2 ///
                          fvfrequency = column1 ///
                          fvpercent = column2 ///
                          total = column1 ///
                          percent = column2
    collect layout (var `catvars' `binvars') (`stratvar'#result[column1 column2] result[p])
    collect style cell (`catvars' `binvars')#result[column1], nformat(%6.0fc)
    collect style cell (`catvars' `binvars')#result[column2], nformat(%6.1f) sformat("%s%%")
    collect style cell var[`contvars']#result[column1 column2], nformat(%6.1f)
    collect style cell var[`contvars']#result[column2], sformat("(%s)")
    collect style cell result[p], nformat(%6.3f)
    collect label levels result p "p-value", modify
    collect style header `stratvar', title(hide)
    collect style header result[column1 column2], level(hide)
    collect style row stack, nobinder spacer
    collect style cell border_block, border(right, pattern(nil))
    collect preview
    Here is the resulting table
    Code:
    -------------------------------------------------------
              Male         Female        Total      p-value
    -------------------------------------------------------
    cont  4.5   (2.1)   5.5   (2.0)   5.0   (2.1)     0.000
                                                           
    cat                                               0.000
      1    65   13.0%     0    0.0%    65    6.5%          
      2   208   41.5%   206   41.3%   414   41.4%          
      3   228   45.5%   293   58.7%   521   52.1%          
                                                           
    bin                                               0.000
      1   284   56.7%   361   72.3%   645   64.5%          
    -------------------------------------------------------

    Comment


    • #3
      Wow, thanks so much Jeff, this works great! One more question: Is there any way to get the statistics for the binary variable on the same row as the variable header?

      Comment


      • #4
        Yes, but you have to tag the binary variables differently.
        In the following I brake the binary variables out into their own loop for computing the p-value, apply a different tag and header style, and set their auto levels to 1.
        Code:
        clear
        set more off
        collect clear
        
        /* simulating data */
        set seed 82
        set obs 1000
        g female = runiformint(0,1)
        label define female 0 "Male" 1 "Female"
        label values female female
        g cont = female + runiformint(1,7) + runiform() if runiform() > .05
        g cat = 1
        replace cat = 2 if cont > 2
        replace cat = 3 if cont > 5
        g bin = cont > 4
        
        
        /* putting the variables in macros */
        local stratvar "female"
        local catvars "cat"
        local binvars "bin"
        local contvars "cont"
        
        table (var) (`stratvar' result), ///
            name(stats) ///
            statistic(mean `contvars') ///
            statistic(sd `contvars') ///
            statistic(fvfrequency `catvars' `binvars') ///
            statistic(fvpercent `catvars' `binvars')
        
        /* generating p-values */
        collect create fits
        foreach var in `contvars' {
            collect p = Ftail(e(df_m), e(df_r), e(F)) ///
            , tag(var[`var']) ///
            : anova `var' `stratvar'
        }
        
        foreach var in `catvars' {
            tabulate `var' `stratvar', chi2
            collect get p = (r(p)), tag(`var'[_hide])
            collect style header `var', title(label)
        }
        
        foreach var in `binvars' {
            tabulate `var' `stratvar', chi2
            collect get p = (r(p)), tag(var[1.`var'])
            collect style header `var'[1], title(label) level(hide)
        }
        
        /* formatting */
        collect style autolevels var `contvars', clear
        collect layout (var `catvars' `binvars') (result)
        collect combine full = stats fits
        foreach var in `catvars' {
            collect levels `var'
            collect style autolevels `var' _hide `s(levels)', clear
        }
        foreach var in `binvars' {
            collect style autolevels `var' 1, clear
        }
        collect recode result mean = column1 ///
                              sd = column2 ///
                              fvfrequency = column1 ///
                              fvpercent = column2 ///
                              total = column1 ///
                              percent = column2
        collect layout (var `catvars' `binvars') (`stratvar'#result[column1 column2] result[p])
        collect style cell (`catvars' `binvars')#result[column1], nformat(%6.0fc)
        collect style cell (`catvars' `binvars')#result[column2], nformat(%6.1f) sformat("%s%%")
        collect style cell var[`contvars']#result[column1 column2], nformat(%6.1f)
        collect style cell var[`contvars']#result[column2], sformat("(%s)")
        collect style cell result[p], nformat(%6.3f)
        collect label levels result p "p-value", modify
        collect style header `stratvar', title(hide)
        collect style header result[column1 column2], level(hide)
        collect style row stack, nobinder spacer
        collect style cell border_block, border(right, pattern(nil))
        collect preview
        Here is the resulting table
        Code:
        -------------------------------------------------------
                  Male         Female        Total      p-value
        -------------------------------------------------------
        cont  4.5   (2.1)   5.5   (2.0)   5.0   (2.1)     0.000
                                                               
        cat                                               0.000
          1    65   13.0%     0    0.0%    65    6.5%          
          2   208   41.5%   206   41.3%   414   41.4%          
          3   228   45.5%   293   58.7%   521   52.1%          
                                                               
        bin   284   56.7%   361   72.3%   645   64.5%     0.000
        -------------------------------------------------------

        Comment


        • #5
          Perfect. Thanks so much, Jeff.

          Comment


          • #6
            Not sure if I should make a separate post for this, but I'm also trying to get a count for each of the variables on the same row as the header and p-value. This would be a nonmissings count, and it would also be split by the stratification variable. I've tried using the showcounts option but it puts the count on each level of the categorical variable and I haven't been able to fix that.
            Last edited by Matthew Wallace; 11 Jul 2023, 20:09.

            Comment


            • #7
              Try collecting the sample size where you are collecting the p-values.
              In the following, I highlight where I made changes from the above example.
              Code:
              clear
              set more off
              collect clear
              
              /* simulating data */
              set seed 82
              set obs 1000
              g female = runiformint(0,1)
              label define female 0 "Male" 1 "Female"
              label values female female
              g cont = female + runiformint(1,7) + runiform() if runiform() > .05
              g cat = 1
              replace cat = 2 if cont > 2
              replace cat = 3 if cont > 5
              g bin = cont > 4
              
              
              /* putting the variables in macros */
              local stratvar "female"
              local catvars "cat"
              local binvars "bin"
              local contvars "cont"
              
              table (var) (`stratvar' result), ///
                  name(stats) ///
                  statistic(mean `contvars') ///
                  statistic(sd `contvars') ///
                  statistic(fvfrequency `catvars' `binvars') ///
                  statistic(fvpercent `catvars' `binvars')
              
              /* generating p-values */
              collect create fits
              foreach var in `contvars' {
                  collect nobs = (e(N)) p = Ftail(e(df_m), e(df_r), e(F)) ///
                  , tag(var[`var']) ///
                  : anova `var' `stratvar'
              }
              
              foreach var in `catvars' {
                  tabulate `var' `stratvar', chi2
                  collect get nobs = (r(N)) p = (r(p)), tag(`var'[_hide])
                  collect style header `var', title(label)
              }
              
              foreach var in `binvars' {
                  tabulate `var' `stratvar', chi2
                  collect get nobs = (r(N)) p = (r(p)), tag(var[1.`var'])
                  collect style header `var'[1], title(label) level(hide)
              }
              
              /* formatting */
              collect style autolevels var `contvars', clear
              collect layout (var `catvars' `binvars') (result)
              collect combine full = stats fits
              foreach var in `catvars' {
                  collect levels `var'
                  collect style autolevels `var' _hide `s(levels)', clear
              }
              foreach var in `binvars' {
                  collect style autolevels `var' 1, clear
              }
              collect recode result mean = column1 ///
                                    sd = column2 ///
                                    fvfrequency = column1 ///
                                    fvpercent = column2 ///
                                    total = column1 ///
                                    percent = column2
              collect layout (var `catvars' `binvars') (`stratvar'#result[column1 column2] result[p nobs])
              collect style cell (`catvars' `binvars')#result[column1], nformat(%6.0fc)
              collect style cell (`catvars' `binvars')#result[column2], nformat(%6.1f) sformat("%s%%")
              collect style cell var[`contvars']#result[column1 column2], nformat(%6.1f)
              collect style cell var[`contvars']#result[column2], sformat("(%s)")
              collect style cell result[p], nformat(%6.3f)
              collect label levels result p "p-value", modify
              collect style cell result[nobs], nformat(%18.0fc)
              collect label levels result nobs "N", modify
              collect style header `stratvar', title(hide)
              collect style header result[column1 column2], level(hide)
              collect style row stack, nobinder spacer
              collect style cell border_block, border(right, pattern(nil))
              collect preview
              Here is the resulting table.
              Code:
              ---------------------------------------------------------------
                        Male         Female        Total      p-value       N
              ---------------------------------------------------------------
              cont  4.5   (2.1)   5.5   (2.0)   5.0   (2.1)     0.000     956
                                                                             
              cat                                               0.000   1,000
                1    65   13.0%     0    0.0%    65    6.5%                  
                2   208   41.5%   206   41.3%   414   41.4%                  
                3   228   45.5%   293   58.7%   521   52.1%                  
                                                                             
              bin   284   56.7%   361   72.3%   645   64.5%     0.000   1,000
              ---------------------------------------------------------------

              Comment


              • #8
                Got it, thanks so much. One last question and I'll be all set. I'd like to add headers to some of the binary variables (think check-all-that apply-questions). I've been trying to do this using collect addtags, but I can't get it to work. Since we only have one binary variable in this example, adding a header to just that one would be fine, assuming the same code could be applied to additional binary variables later on. Thanks again!

                Comment


                • #9
                  If I'm following what you want, it is a matter of letting the "special" binary variables show a custom title and relabel their 1 level with the variable label/name (or something else).
                  In the following I add a new binary variable named nib to a special binary variable list and add some logic to treat the regular and special binary variables differently.
                  My labelling here is nothing special, but I think you get the idea that you can code up the labelling to suit your needs, maybe even having more than one grouping for your binary variables, each getting their own labelling.
                  Code:
                  clear
                  set more off
                  collect clear
                  
                  /* simulating data */
                  set seed 82
                  set obs 1000
                  g female = runiformint(0,1)
                  label define female 0 "Male" 1 "Female"
                  label values female female
                  g cont = female + runiformint(1,7) + runiform() if runiform() > .05
                  g cat = 1
                  replace cat = 2 if cont > 2
                  replace cat = 3 if cont > 5
                  g bin = cont > 4
                  g nib = inrange(cont, 2, 4)
                  
                  
                  /* putting the variables in macros */
                  local stratvar "female"
                  local catvars "cat"
                  local regular_binvars "bin"
                  local special_binvars "nib"
                  local binvars "`regular_binvars' `special_binvars'"
                  local contvars "cont"
                  
                  table (var) (`stratvar' result), ///
                      name(stats) ///
                      statistic(mean `contvars') ///
                      statistic(sd `contvars') ///
                      statistic(fvfrequency `catvars' `binvars') ///
                      statistic(fvpercent `catvars' `binvars')
                  
                  /* generating p-values */
                  collect create fits
                  foreach var in `contvars' {
                      collect nobs = (e(N)) p = Ftail(e(df_m), e(df_r), e(F)) ///
                      , tag(var[`var']) ///
                      : anova `var' `stratvar'
                  }
                  
                  foreach var in `catvars' {
                      tabulate `var' `stratvar', chi2
                      collect get nobs = (r(N)) p = (r(p)), tag(`var'[_hide])
                      collect style header `var', title(label)
                  }
                  
                  foreach var in `binvars' {
                      tabulate `var' `stratvar', chi2
                      collect get nobs = (r(N)) p = (r(p)), tag(var[1.`var'])
                      if `:list var in regular_binvars' {
                          collect style header `var'[1], title(label) level(hide)
                      }
                  }
                  
                  /* formatting */
                  collect style autolevels var `contvars', clear
                  collect layout (var `catvars' `binvars') (result)
                  collect combine full = stats fits
                  foreach var in `catvars' {
                      collect levels `var'
                      collect style autolevels `var' _hide `s(levels)', clear
                  }
                  foreach var in `binvars' {
                      collect style autolevels `var' 1, clear
                      if `:list var in special_binvars' {
                          local lab : var lab `var'
                          if `"`lab'"' == "" {
                                  local lab `var'
                          }
                          collect label levels `var' 1 `"`lab'"', modify
                          collect label dim `var' "Special `var'", modify
                      }
                  }
                  collect recode result mean = column1 ///
                                        sd = column2 ///
                                        fvfrequency = column1 ///
                                        fvpercent = column2 ///
                                        total = column1 ///
                                        percent = column2
                  collect layout (var `catvars' `binvars') (`stratvar'#result[column1 column2] result[p nobs])
                  collect style cell (`catvars' `binvars')#result[column1], nformat(%6.0fc)
                  collect style cell (`catvars' `binvars')#result[column2], nformat(%6.1f) sformat("%s%%")
                  collect style cell var[`contvars']#result[column1 column2], nformat(%6.1f)
                  collect style cell var[`contvars']#result[column2], sformat("(%s)")
                  collect style cell result[p], nformat(%6.3f)
                  collect label levels result p "p-value", modify
                  collect style cell result[nobs], nformat(%18.0fc)
                  collect label levels result nobs "N", modify
                  collect style header `stratvar', title(hide)
                  collect style header result[column1 column2], level(hide)
                  collect style row stack, nobinder spacer
                  collect style cell border_block, border(right, pattern(nil))
                  collect preview
                  Here is the resulting table.
                  Code:
                  ----------------------------------------------------------------------
                                   Male         Female        Total      p-value       N
                  ----------------------------------------------------------------------
                  cont         4.5   (2.1)   5.5   (2.0)   5.0   (2.1)     0.000     956
                                                                                        
                  cat                                                      0.000   1,000
                    1           65   13.0%     0    0.0%    65    6.5%                  
                    2          208   41.5%   206   41.3%   414   41.4%                  
                    3          228   45.5%   293   58.7%   521   52.1%                  
                                                                                        
                  bin          284   56.7%   361   72.3%   645   64.5%     0.000   1,000
                                                                                        
                  Special mid                                                           
                    mid        152   30.3%   138   27.7%   290   29.0%     0.350   1,000
                  ----------------------------------------------------------------------

                  Comment


                  • #10
                    Yes, this is very close to what I want. How would I go about, in this case, adding more binvars under the "Special mid" label? I've added a couple to the table below (bin1-bin2) for a visual. Could these be listed with the regular_binvars or would they need their own category? I figure this would require reordering the variables in the layout as well.
                    Attached Files
                    Last edited by Matthew Wallace; 14 Jul 2023, 10:57.

                    Comment


                    • #11
                      Thanks for the example table. It helped me better understand what you meant by check-all-that-apply questions.

                      I can envision encountering many such check-all-that-apply questions, so I needed to change my approach a little.

                      The following code block comes from an ado-file I wrote named mw_table.ado, based on the example code so far, but with changes to accommodate multiple grouped binary variables. I added checks to prevent a variable from being used in more than one place. I also replaced the collect recode result command with collect composite definitions for column 1 and 2. This allowed me to eliminate some of the collect style commands since composite results pick up the numeric and string formats of their elements.
                      Code:
                      *! version 1.0.0  14jul2023
                      program mw_table
                          version 17
                      
                          syntax ,            ///
                              by(varname)        ///
                          [                ///
                              BINary(varlist)        ///
                              CATegorical(varlist)    ///
                              CONTinuous(varlist)    ///
                              GROUPed(string asis)    ///
                              *            ///
                          ]
                      
                          // Do not allow variables to be specified in more than one
                          // option.
                      
                          local duplist binary categorical continuous
                          local k_duplist : list sizeof duplist
                      
                          forval i = 1/`k_duplist' {
                              local opt1 : word `i' of `duplist'
                              forval j = `=`i'+1'/`k_duplist' {
                                  local opt2 : word `j' of `duplist'
                                  local both : list `opt1' & `opt2'
                                  CheckDupVars "`both'" `opt1'() `opt2'()
                              }
                          }
                      
                          // Parse -grouped()- options.
                          // Check that grouped variables are not specified in the other
                          // options.
                          // Check that grouped names are not specified in
                          // the other options.
                          // Remaining options go to -table-.
                      
                          local gid 0
                          while `:length local grouped' {
                              local ++gid
                              ParseGroupOption `grouped'
                              local group`gid'vars = s(varlist)
                              local group`gid'name = s(name)
                              local group`gid'label = s(label)
                              forval i = 1/`k_duplist' {
                                  local opt1 : word `i' of `duplist'
                                  local both : list `opt1' & group`gid'vars
                                  CheckDupVars "`both'" `opt1'() group()
                              }
                              local both : list allgroupvars & group`gid'vars
                              CheckDupVars "`both'" group() group()
                              CheckDupNames `group`gid'name' `allgroupnames'
                              local allgroupvars `allgroupvars' `group`gid'vars'
                              local allgroupnames `allgroupnames' `group`gid'name'
                              local 0 `", `options'"'
                              syntax [, GROUPed(string asis) * ]
                          }
                          local k_grouped = `gid'
                      
                          // Check that grouped names are not also being used as
                          // variables.
                      
                          local both : list allgroupnames & binary
                          CheckNameVarConflict "`both'" binary()
                          local both : list allgroupnames & categorical
                          CheckNameVarConflict "`both'" categorical()
                          local both : list allgroupnames & continuous
                          CheckNameVarConflict "`both'" continuous()
                          local both : list allgroupnames & allgroupvars
                          CheckNameVarConflict "`both'" group()
                      
                          // Build the call to -table-.
                      
                          if `:list sizeof continuous' {
                              local CVopts    statistic(mean `continuous') ///
                                      statistic(sd `continuous')
                          }
                          local fvlist `binary' `categorical' `allgroupvars'
                          if `:list sizeof fvlist' {
                              local FVopts    statistic(fvfrequency `fvlist') ///
                                      statistic(fvpercent `fvlist')
                          }
                      
                          quietly table () (`by' result), `CVopts' `FVopts' `options'
                      
                          if `:list sizeof continuous' {
                              // -anova- needs a numercial by variable.
                              local bytype : type `by'
                              if substr("`bytype'",1,3) == "str" {
                                  tempvar numby
                                  encode `by', generate(`numby')
                              }
                              else {
                                  local numby `by'
                              }
                      
                              // continuous variables layout specification
                              local contspec var
                              collect style autolevels var `continuous', clear
                          }
                      
                          foreach x of local continuous {
                              quietly ///
                              collect nobs=(e(N)) p=Ftail(e(df_m),e(df_r),e(F)) ///
                                  , tag(var[`x']) ///
                                  : anova `x' `numby'
                          }
                      
                          foreach x of local categorical {
                              quietly tabulate `x' `by', chi2
                              collect get nobs=(r(N)) p=(r(p)), tag(`x'[_hide])
                              collect style header `x', title(label)
                              quietly collect levelsof `x'
                              collect style autolevels `x' _hide `s(levels)', clear
                          }
                      
                          foreach x of local binary {
                              quietly tabulate `x' `by', chi2
                              collect get nobs=(r(N)) p=(r(p)), tag(var[1.`x'])
                              collect style header `x'[1], title(label) level(hide)
                              collect style autolevels `x' 1, clear
                          }
                      
                          foreach x of local allgroupvars {
                              quietly tabulate `x' `by', chi2
                              collect get nobs=(r(N)) p=(r(p)), tag(var[1.`x'])
                              collect style autolevels `x' 1, clear
                          }
                      
                          forval i = 1/`k_grouped' {
                              local vars : copy local group`i'vars
                              local name : copy local group`i'name
                              local label : copy local group`i'label
                              local j 0
                              foreach x of local vars {
                                  local ++j
                                  local lab : variable label `x'
                                  if `"`lab'"' == "" {
                                      local lab `x'
                                  }
                                  quietly collect remap `x'[1] = `name'[`j']
                                  collect label levels `name' `j' `"`lab'"', modify
                              }
                              collect label dim `name' `"`label'"', modify
                          }
                      
                          // Define some composites for the columns. This allows us to
                          // let -table- handle result formats.
                      
                          collect composite define col1 = mean fvfrequency
                          collect composite define col2 = sd fvpercent
                      
                          // p-value styles
                      
                          collect style cell result[p], nformat(%6.2f)
                          collect label levels result p "p-value", modify
                      
                          // sample size styles
                      
                          collect style cell result[nobs], nformat(%18.0fc)
                          collect label levels result nobs "N", modify
                      
                          // header styles
                      
                          collect style header `by', title(hide)
                          collect style header result[col1 col2], level(hide)
                          collect style row stack, nobinder spacer
                      
                          // border style
                      
                          collect style cell border_block, border(right, pattern(nil))
                      
                          collect layout ///
                              (`contspec' `categorical' `allgroupnames' `binary') ///
                              (`by'#result[col1 col2] result[p nobs])
                      end
                      
                      program ParseGroupOption, sclass
                          syntax varlist , name(name) [label(string)]
                          sreturn local varlist `"`varlist'"'
                          sreturn local name `"`name'"'
                          sreturn local label `"`label'"'
                      end
                      
                      program CheckDupVars
                          args vars opt1 opt2
                      
                          local k : list sizeof vars
                          if `k' == 0 {
                              exit
                          }
                          if `k' > 1 {
                              local s s
                          }
                          if "`opt1'" == "`opt2'" {
                              di as err ///
                              "variable`s' duplicated in separate {bf:`opt1'} options"
                          }
                          else {
                              di as err ///
                              "variable`s' duplicated in options {bf:`opt1'} and {bf:`opt2'}"
                          }
                          di as err "{p}offending variable`s': {bf:`vars'}{p_end}"
                          exit 198
                      end
                      
                      program CheckDupNames
                          gettoken first rest : 0
                          if `:list posof "`first'" in rest' == 0 {
                              exit
                          }
                          di as err "name {bf:`first'} used in more than one {bf:group()} option"
                          exit 198
                      end
                      
                      program CheckNameVarConflict
                          args found opt
                      
                          local k : list sizeof found
                          if `k' == 0 {
                              exit
                          }
                          gettoken first : found
                          di as err "{p}"
                          di as err "{bf:grouped()} suboption {bf:name(`first')} is not allowed;{break}"
                          di as err "variable {bf:`first'} was specified in option {bf:`opt'}"
                          di as err "{p_end}"
                          exit 198
                      end
                      
                      exit
                      Here is a working example (also based on the above example, but with added variables), using the above program, that illustrates the variable and name checks and each of the variable types in action.
                      Code:
                      * play.do
                      
                      clear all
                      
                      set seed 82
                      set obs 1000
                      generate female = runiformint(0,1)
                      label define female 0 "Male" 1 "Female"
                      label values female female
                      // cont vars
                      generate cont1 = female + runiformint(1,7) + runiform() if runiform() > .05
                      generate cont2 = female + rnormal()*3
                      generate cont3 = rnormal()
                      // cat vars
                      generate cat1 = 1 + (cont1>2) + (cont1>5)
                      generate cat2 = runiformint(1,4)
                      generate cat3 = runiformint(11,15)
                      // bin vars
                      generate bin1 = cont1 > 4
                      generate bin2 = runiformint(0,1)
                      // grouped bin vars
                      generate nib1 = inrange(cont1, 2, 4)
                      generate nib2 = inrange(cont1, 1, 3)
                      generate nib3 = inrange(cont1, 3, 5)
                      // grouped bin vars
                      generate bib1 = !inrange(cont1, 2, 4)
                      generate bib2 = !inrange(cont1, 1, 3)
                      generate bib3 = !inrange(cont1, 3, 5)
                      
                      // check error messages
                      
                      rcof "noisily mw_table" == 198
                      rcof "noisily mw_table dude" == 101
                      rcof "noisily mw_table if 1" == 101
                      rcof "noisily mw_table in 1/20" == 101
                      rcof "noisily mw_table, by(dude)" == 111
                      rcof "noisily mw_table, by(female) bin(bin?) cat(bin?)" == 198
                      rcof "noisily mw_table, by(female) bin(bin?) cont(bin?)" == 198
                      rcof "noisily mw_table, by(female) cat(bin?) cont(bin?)" == 198
                      rcof "noisily mw_table, by(female) bin(bin?) group(bin?, name(bins))" == 198
                      rcof "noisily mw_table, by(female) cat(bin?) group(bin?, name(bins))" == 198
                      rcof "noisily mw_table, by(female) cont(bin?) group(bin?, name(bins))" == 198
                      rcof "noisily mw_table, by(female) group(bin1 cat1, name(mix)) group(bin?, name(bins))" == 198
                      rcof "noisily mw_table, by(female) group(bin?, name(mix)) group(bin?, name(bins))" == 198
                      rcof "noisily mw_table, by(female) group(nib?, name(cat1)) cat(cat?)" == 198
                      
                      // working example
                      
                      mw_table, ///
                          by(female) ///
                          binary(bin?) ///
                          categorical(cat?) ///
                          continuous(cont?) ///
                          grouped(nib?, name(nibs) label(Group1 indicators)) ///
                          grouped(bib?, name(bibs) label(Group2 indicators)) ///
                          nformat(%6.1f mean sd) ///
                          nformat(%6.1f fvpercent percent) ///
                          sformat("%s%%" fvpercent percent) ///
                          sformat("(%s)" sd) ///
                          name(mytable)
                      
                      * end: play.do
                      Here is the resulting table.
                      Code:
                      ------------------------------------------------------------------------------
                                             Male          Female         Total      p-value       N
                      ------------------------------------------------------------------------------
                      cont1               4.5   (2.1)    5.5   (2.0)   5.0   (2.1)      0.00     956
                      cont2              -0.1   (3.0)    0.9   (3.1)   0.4   (3.1)      0.00   1,000
                      cont3               0.1   (1.0)   -0.0   (1.1)   0.0   (1.0)      0.20   1,000
                                                                                                    
                      cat1                                                              0.00   1,000
                        1                  65   13.0%      0    0.0%    65    6.5%                  
                        2                 208   41.5%    206   41.3%   414   41.4%                  
                        3                 228   45.5%    293   58.7%   521   52.1%                  
                                                                                                    
                      cat2                                                              0.05   1,000
                        1                 143   28.5%    129   25.9%   272   27.2%                  
                        2                  99   19.8%    129   25.9%   228   22.8%                  
                        3                 113   22.6%    122   24.4%   235   23.5%                  
                        4                 146   29.1%    119   23.8%   265   26.5%                  
                                                                                                    
                      cat3                                                              0.24   1,000
                        11                104   20.8%    109   21.8%   213   21.3%                  
                        12                 88   17.6%    106   21.2%   194   19.4%                  
                        13                 98   19.6%     96   19.2%   194   19.4%                  
                        14                 97   19.4%    101   20.2%   198   19.8%                  
                        15                114   22.8%     87   17.4%   201   20.1%                  
                                                                                                    
                      Group1 indicators                                                            
                        nib1              152   30.3%    138   27.7%   290   29.0%      0.35   1,000
                        nib2              143   28.5%     72   14.4%   215   21.5%      0.00   1,000
                        nib3              130   25.9%    134   26.9%   264   26.4%      0.75   1,000
                                                                                                    
                      Group2 indicators                                                            
                        bib1              349   69.7%    361   72.3%   710   71.0%      0.35   1,000
                        bib2              358   71.5%    427   85.6%   785   78.5%      0.00   1,000
                        bib3              371   74.1%    365   73.1%   736   73.6%      0.75   1,000
                                                                                                    
                      bin1                284   56.7%    361   72.3%   645   64.5%      0.00   1,000
                                                                                                    
                      bin2                257   51.3%    239   47.9%   496   49.6%      0.28   1,000
                      ------------------------------------------------------------------------------
                      Last edited by Jeff Pitblado (StataCorp); 15 Jul 2023, 01:03.

                      Comment


                      • #12
                        This works wonderfully. I was worried my request would greatly expand the code needed to create the table, but this is all easy to understand. One thing I'd like to know is if the total column can be moved and/or labeled differently? I'd like to move it before the male/female split, but I can't find where you specify it in the program. Thanks again for all your help with this.

                        Comment


                        • #13
                          In the following, I added support for options first and label() in option by().
                          I also added a clue when there is a syntax error in a specified grouped() option.
                          Code:
                           *! 1.0.0  14jul2023
                          program mw_table
                              version 17
                          
                              syntax ,            ///
                                  by(string asis)        ///
                              [                ///
                                  BINary(varlist)        ///
                                  CATegorical(varlist)    ///
                                  CONTinuous(varlist)    ///
                                  GROUPed(string asis)    ///
                                  *            ///
                              ]
                          
                              capture noisily ParseByOption `by'
                              if c(rc) {
                                  di as err "in option {bf:by()}"
                                  exit c(rc)
                              }
                              local by = s(by)
                              local byfirst "`s(first)'"
                              local bylabel "`s(label)'"
                          
                              // Do not allow variables to be specified in more than one
                              // option.
                          
                              local duplist binary categorical continuous
                              local k_duplist : list sizeof duplist
                          
                              forval i = 1/`k_duplist' {
                                  local opt1 : word `i' of `duplist'
                                  forval j = `=`i'+1'/`k_duplist' {
                                      local opt2 : word `j' of `duplist'
                                      local both : list `opt1' & `opt2'
                                      CheckDupVars "`both'" `opt1'() `opt2'()
                                  }
                              }
                          
                              // Parse -grouped()- options.
                              // Check that grouped variables are not specified in the other
                              // options.
                              // Check that grouped names are not specified in
                              // the other options.
                              // Remaining options go to -table-.
                          
                              local gid 0
                              while `:length local grouped' {
                                  local ++gid
                                  capture noisily ParseGroupOption `grouped'
                                  if c(rc) {
                                      di as err "in option {bf:grouped()}"
                                      exit c(rc)
                                  }
                                  local group`gid'vars = s(varlist)
                                  local group`gid'name = s(name)
                                  local group`gid'label = s(label)
                                  forval i = 1/`k_duplist' {
                                      local opt1 : word `i' of `duplist'
                                      local both : list `opt1' & group`gid'vars
                                      CheckDupVars "`both'" `opt1'() group()
                                  }
                                  local both : list allgroupvars & group`gid'vars
                                  CheckDupVars "`both'" group() group()
                                  CheckDupNames `group`gid'name' `allgroupnames'
                                  local allgroupvars `allgroupvars' `group`gid'vars'
                                  local allgroupnames `allgroupnames' `group`gid'name'
                                  local 0 `", `options'"'
                                  syntax [, GROUPed(string asis) * ]
                              }
                              local k_grouped = `gid'
                          
                              // Check that grouped names are not also being used as
                              // variables.
                          
                              local both : list allgroupnames & binary
                              CheckNameVarConflict "`both'" binary()
                              local both : list allgroupnames & categorical
                              CheckNameVarConflict "`both'" categorical()
                              local both : list allgroupnames & continuous
                              CheckNameVarConflict "`both'" continuous()
                              local both : list allgroupnames & allgroupvars
                              CheckNameVarConflict "`both'" group()
                          
                              // Build the call to -table-.
                          
                              if `:list sizeof continuous' {
                                  local CVopts    statistic(mean `continuous') ///
                                          statistic(sd `continuous')
                              }
                              local fvlist `binary' `categorical' `allgroupvars'
                              if `:list sizeof fvlist' {
                                  local FVopts    statistic(fvfrequency `fvlist') ///
                                          statistic(fvpercent `fvlist')
                              }
                          
                              quietly table () (`by' result), `CVopts' `FVopts' `options'
                          
                              if `:list sizeof continuous' {
                                  // -anova- needs a numercial by variable.
                                  local bytype : type `by'
                                  if substr("`bytype'",1,3) == "str" {
                                      tempvar numby
                                      encode `by', generate(`numby')
                                  }
                                  else {
                                      local numby `by'
                                  }
                          
                                  // continuous variables layout specification
                                  local contspec var
                                  collect style autolevels var `continuous', clear
                              }
                          
                              foreach x of local continuous {
                                  quietly ///
                                  collect nobs=(e(N)) p=Ftail(e(df_m),e(df_r),e(F)) ///
                                      , tag(var[`x']) ///
                                      : anova `x' `numby'
                              }
                          
                              foreach x of local categorical {
                                  quietly tabulate `x' `by', chi2
                                  collect get nobs=(r(N)) p=(r(p)), tag(`x'[_hide])
                                  collect style header `x', title(label)
                                  quietly collect levelsof `x'
                                  collect style autolevels `x' _hide `s(levels)', clear
                              }
                          
                              foreach x of local binary {
                                  quietly tabulate `x' `by', chi2
                                  collect get nobs=(r(N)) p=(r(p)), tag(var[1.`x'])
                                  collect style header `x'[1], title(label) level(hide)
                                  collect style autolevels `x' 1, clear
                              }
                          
                              foreach x of local allgroupvars {
                                  quietly tabulate `x' `by', chi2
                                  collect get nobs=(r(N)) p=(r(p)), tag(var[1.`x'])
                                  collect style autolevels `x' 1, clear
                              }
                          
                              forval i = 1/`k_grouped' {
                                  local vars : copy local group`i'vars
                                  local name : copy local group`i'name
                                  local label : copy local group`i'label
                                  local j 0
                                  foreach x of local vars {
                                      local ++j
                                      local lab : variable label `x'
                                      if `"`lab'"' == "" {
                                          local lab `x'
                                      }
                                      quietly collect remap `x'[1] = `name'[`j']
                                      collect label levels `name' `j' `"`lab'"', modify
                                  }
                                  collect label dim `name' `"`label'"', modify
                              }
                          
                              // Define some composites for the columns. This allows us to
                              // let -table- handle result formats.
                          
                              collect composite define col1 = mean fvfrequency
                              collect composite define col2 = sd fvpercent
                          
                              // p-value styles
                          
                              collect style cell result[p], nformat(%6.2f)
                              collect label levels result p "p-value", modify
                          
                              // sample size styles
                          
                              collect style cell result[nobs], nformat(%18.0fc)
                              collect label levels result nobs "N", modify
                          
                              // header styles
                          
                              collect style header `by', title(hide)
                              collect style header result[col1 col2], level(hide)
                              collect style row stack, nobinder spacer
                          
                              // border style
                          
                              collect style cell border_block, border(right, pattern(nil))
                          
                              // handle by Total styles
                          
                              if "`byfirst'" != "" {
                                  quietly collect levels `by'
                                  collect style autolevels `by' .m `s(levels)', clear
                              }
                              if `"`bylabel'"' != "" {
                                  collect label levels `by' .m `"`bylabel'"', modify
                              }
                          
                              collect layout ///
                                  (`contspec' `categorical' `allgroupnames' `binary') ///
                                  (`by'#result[col1 col2] result[p nobs])
                          end
                          
                          program ParseByOption, sclass
                              syntax varname [, first label(string)]
                              sreturn local by "`varlist'"
                              sreturn local first `"`first'"'
                              sreturn local label `"`label'"'
                          end
                          
                          program ParseGroupOption, sclass
                              syntax varlist , name(name) [label(string)]
                              sreturn local varlist `"`varlist'"'
                              sreturn local name `"`name'"'
                              sreturn local label `"`label'"'
                          end
                          
                          program CheckDupVars
                              args vars opt1 opt2
                          
                              local k : list sizeof vars
                              if `k' == 0 {
                                  exit
                              }
                              if `k' > 1 {
                                  local s s
                              }
                              if "`opt1'" == "`opt2'" {
                                  di as err ///
                                  "variable`s' duplicated in separate {bf:`opt1'} options"
                              }
                              else {
                                  di as err ///
                                  "variable`s' duplicated in options {bf:`opt1'} and {bf:`opt2'}"
                              }
                              di as err "{p}offending variable`s': {bf:`vars'}{p_end}"
                              exit 198
                          end
                          
                          program CheckDupNames
                              gettoken first rest : 0
                              if `:list posof "`first'" in rest' == 0 {
                                  exit
                              }
                              di as err "name {bf:`first'} used in more than one {bf:group()} option"
                              exit 198
                          end
                          
                          program CheckNameVarConflict
                              args found opt
                          
                              local k : list sizeof found
                              if `k' == 0 {
                                  exit
                              }
                              gettoken first : found
                              di as err "{p}"
                              di as err "{bf:grouped()} suboption {bf:name(`first')} is not allowed;{break}"
                              di as err "variable {bf:`first'} was specified in option {bf:`opt'}"
                              di as err "{p_end}"
                              exit 198
                          end
                          
                          exit
                          Here are these new sub-options in action using the above example.
                          Code:
                          mw_table, ///
                                  by(female, first label(Overall)) ///
                                  binary(bin?) ///
                                  categorical(cat?) ///
                                  continuous(cont?) ///
                                  grouped(nib?, name(nibs) label(Group1 indicators)) ///
                                  grouped(bib?, name(bibs) label(Group2 indicators)) ///
                                  nformat(%6.1f mean sd) ///
                                  nformat(%6.1f fvpercent percent) ///
                                  sformat("%s%%" fvpercent percent) ///
                                  sformat("(%s)" sd) ///
                                  name(mytable)
                          Here is the resulting table.
                          Code:
                          ------------------------------------------------------------------------------
                                               Overall         Male          Female      p-value       N
                          ------------------------------------------------------------------------------
                          cont1              5.0   (2.1)    4.5   (2.1)    5.5   (2.0)      0.00     956
                          cont2              0.4   (3.1)   -0.1   (3.0)    0.9   (3.1)      0.00   1,000
                          cont3              0.0   (1.0)    0.1   (1.0)   -0.0   (1.1)      0.20   1,000
                                                                                                        
                          cat1                                                              0.00   1,000
                            1                 65    6.5%     65   13.0%      0    0.0%                  
                            2                414   41.4%    208   41.5%    206   41.3%                  
                            3                521   52.1%    228   45.5%    293   58.7%                  
                                                                                                        
                          cat2                                                              0.05   1,000
                            1                272   27.2%    143   28.5%    129   25.9%                  
                            2                228   22.8%     99   19.8%    129   25.9%                  
                            3                235   23.5%    113   22.6%    122   24.4%                  
                            4                265   26.5%    146   29.1%    119   23.8%                  
                                                                                                        
                          cat3                                                              0.24   1,000
                            11               213   21.3%    104   20.8%    109   21.8%                  
                            12               194   19.4%     88   17.6%    106   21.2%                  
                            13               194   19.4%     98   19.6%     96   19.2%                  
                            14               198   19.8%     97   19.4%    101   20.2%                  
                            15               201   20.1%    114   22.8%     87   17.4%                  
                                                                                                        
                          Group1 indicators                                                             
                            nib1             290   29.0%    152   30.3%    138   27.7%      0.35   1,000
                            nib2             215   21.5%    143   28.5%     72   14.4%      0.00   1,000
                            nib3             264   26.4%    130   25.9%    134   26.9%      0.75   1,000
                                                                                                        
                          Group2 indicators                                                             
                            bib1             710   71.0%    349   69.7%    361   72.3%      0.35   1,000
                            bib2             785   78.5%    358   71.5%    427   85.6%      0.00   1,000
                            bib3             736   73.6%    371   74.1%    365   73.1%      0.75   1,000
                                                                                                        
                          bin1               645   64.5%    284   56.7%    361   72.3%      0.00   1,000
                                                                                                        
                          bin2               496   49.6%    257   51.3%    239   47.9%      0.28   1,000
                          ------------------------------------------------------------------------------

                          Comment


                          • #14
                            Perfect. Thanks so much, Jeff. I've been trying to create a table 1 just like this for a long time in Stata and am very glad to finally have one.

                            Comment


                            • #15
                              Back again with two more requests.

                              1) How would I go about displaying levels of categorical variables with 0 observations? I'd want it to look something like pictured below, where I've added a level to cat2 labelled "3.5" with 0 observations in that level:
                              Click image for larger version

Name:	capture2.PNG
Views:	1
Size:	4.0 KB
ID:	1721452




                              2) Possibly a bit trickier, but is there a way I can add an additional row for the continuous variables for statistics like the range and IQR? I know I can add those statistics in this loop as such (and use collect composite to combine the min and max into a range):

                              Code:
                              if `:list sizeof continuous' {
                                      local CVopts    statistic(mean `continuous') ///
                                              statistic(sd `continuous') ///
                                              statistic(min `continuous') ///
                                              statistic(max `continuous') ///
                                              statistic(iqr `continuous')
                                  }
                              but I'm unsure if I can get them on their own row as pictured below:
                              Click image for larger version

Name:	capture3.PNG
Views:	1
Size:	6.3 KB
ID:	1721446

                              Last edited by Matthew Wallace; 21 Jul 2023, 20:56.

                              Comment

                              Working...
                              X