Formatting binary and categorical variables and p-values using table

Matthew Wallace

Join Date: Jul 2023
Posts: 16

Formatting binary and categorical variables and p-values using table

03 Jul 2023, 17:44

I'm trying to build a table using the table command in Stata 17.0. I've set up code to display means and SDs for continuous variables (cont), and N's and %'s for categorical (cat) and binary variables (bin). This code I have works well for outputting continuous variables and their p-values across groups but the formatting is a bit off for categorical variables and binary variables.

As pictured below, the categorical variable header is showing up in two rows, and the p-value is on the first row. I'd like to only have the header appear on one row and the p-value along with it.

For binary variables, I'd like to only see the "success" level, so for this example, only b=1. Currently they're set up the same as the categorical variables.

As a final note, I'm trying to set the code up to be flexible, where if I wanted to add/subtract variables all I'd need to do is add/subtract them from local macros. Thanks for the help!

Code:

clear
set more off
collect clear

/* simulating data */
set seed 82
set obs 1000
g female = runiformint(0,1)
label define female 0 "Male" 1 "Female"
label values female female
g cont = female + runiformint(1,7) + runiform() if runiform() > .05
g cat = 1
replace cat = 2 if cont > 2
replace cat = 3 if cont > 5
g bin = cont > 4


/* putting the variables in macros */
local stratvar "female"
local catvars "cat"
local binvars "bin"
local contvars "cont"

table (var) (`stratvar' result), ///
    name(stats) ///
    statistic(mean `contvars') ///
    statistic(sd `contvars') ///
    statistic(fvfrequency `catvars' `binvars') ///
    statistic(fvpercent `catvars' `binvars')

/* generating p-values */
collect create fits
foreach var in `contvars' `catvars' `binvars' {
    collect p = Ftail(e(df_m), e(df_r), e(F)) ///
    , tag(var[`var']) ///
    : anova `var' `stratvar'
}

/* formatting */
collect layout (var) (result)
collect combine full = stats fits
collect recode result mean = column1 ///
                      sd = column2 ///
                      fvfrequency = column1 ///
                      fvpercent = column2 ///
                      total = column1 ///
                      percent = column2
collect layout (var) (`stratvar'#result[column1 column2] result[p])
collect style cell var[`catvars' `binvars']#result[column1], nformat(%6.0fc)
collect style cell var[`catvars' `binvars']#result[column2], nformat(%6.1f) sformat("%s%%")
collect style cell var[`contvars']#result[column1 column2], nformat(%6.1f)
collect style cell var[`contvars']#result[column2], sformat("(%s)")
collect style cell result[p], nformat(%6.3f)
collect label levels result p "p-value", modify
collect style header `stratvar', title(hide)
collect style header result[column1 column2], level(hide)
collect style row stack, nobinder spacer
collect style cell border_block, border(right, pattern(nil))
collect preview

-------------------------------------------------------
          Male         Female        Total      p-value
-------------------------------------------------------
cont  4.5   (2.1)   5.5   (2.0)   5.0   (2.1)     0.000
cat                                               0.000
                                                       
cat                                                    
  1    65   13.0%     0    0.0%    65    6.5%          
  2   208   41.5%   206   41.3%   414   41.4%          
  3   228   45.5%   293   58.7%   521   52.1%          
                                                       
bin                                               0.000
                                                       
bin                                                    
  0   217   43.3%   138   27.7%   355   35.5%          
  1   284   56.7%   361   72.3%   645   64.5%          
-------------------------------------------------------

Tags: binary, categorical, p-value, table

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 658

03 Jul 2023, 22:22

I got this to work by treating the categorical and binary variables differently.
I use tabulate to get the p-values, and treat these variables as dimensions in the layout and styles.
This does not require any changes to the table command because collect automatically adds factor variables in dimension var to the item tags as their own dimensions.

The following is a modified version of the above:

Code:

clear
set more off
collect clear

/* simulating data */
set seed 82
set obs 1000
g female = runiformint(0,1)
label define female 0 "Male" 1 "Female"
label values female female
g cont = female + runiformint(1,7) + runiform() if runiform() > .05
g cat = 1
replace cat = 2 if cont > 2
replace cat = 3 if cont > 5
g bin = cont > 4


/* putting the variables in macros */
local stratvar "female"
local catvars "cat"
local binvars "bin"
local contvars "cont"

table (var) (`stratvar' result), ///
    name(stats) ///
    statistic(mean `contvars') ///
    statistic(sd `contvars') ///
    statistic(fvfrequency `catvars' `binvars') ///
    statistic(fvpercent `catvars' `binvars')

/* generating p-values */
collect create fits
foreach var in `contvars' {
    collect p = Ftail(e(df_m), e(df_r), e(F)) ///
    , tag(var[`var']) ///
    : anova `var' `stratvar'
}

foreach var in `catvars' `binvars' {
    tabulate `var' `stratvar', chi2
    collect get p = (r(p)), tag(`var'[_hide])
    collect style header `var', title(label)
}

/* formatting */
collect style autolevels var `contvars', clear
collect layout (var `catvars' `binvars') (result)
collect combine full = stats fits
foreach var in `catvars' {
    collect levels `var'
    collect style autolevels `var' _hide `s(levels)', clear
}
foreach var in `binvars' {
    collect style autolevels `var' _hide 1, clear
}
collect recode result mean = column1 ///
                      sd = column2 ///
                      fvfrequency = column1 ///
                      fvpercent = column2 ///
                      total = column1 ///
                      percent = column2
collect layout (var `catvars' `binvars') (`stratvar'#result[column1 column2] result[p])
collect style cell (`catvars' `binvars')#result[column1], nformat(%6.0fc)
collect style cell (`catvars' `binvars')#result[column2], nformat(%6.1f) sformat("%s%%")
collect style cell var[`contvars']#result[column1 column2], nformat(%6.1f)
collect style cell var[`contvars']#result[column2], sformat("(%s)")
collect style cell result[p], nformat(%6.3f)
collect label levels result p "p-value", modify
collect style header `stratvar', title(hide)
collect style header result[column1 column2], level(hide)
collect style row stack, nobinder spacer
collect style cell border_block, border(right, pattern(nil))
collect preview

Here is the resulting table

Code:

-------------------------------------------------------
          Male         Female        Total      p-value
-------------------------------------------------------
cont  4.5   (2.1)   5.5   (2.0)   5.0   (2.1)     0.000
                                                       
cat                                               0.000
  1    65   13.0%     0    0.0%    65    6.5%          
  2   208   41.5%   206   41.3%   414   41.4%          
  3   228   45.5%   293   58.7%   521   52.1%          
                                                       
bin                                               0.000
  1   284   56.7%   361   72.3%   645   64.5%          
-------------------------------------------------------

Comment

Matthew Wallace

Join Date: Jul 2023

Posts: 16
#3

03 Jul 2023, 23:52

Wow, thanks so much Jeff, this works great! One more question: Is there any way to get the statistics for the binary variable on the same row as the variable header?
Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 658

04 Jul 2023, 09:42

Yes, but you have to tag the binary variables differently.
In the following I brake the binary variables out into their own loop for computing the p-value, apply a different tag and header style, and set their auto levels to 1.

Code:

clear
set more off
collect clear

/* simulating data */
set seed 82
set obs 1000
g female = runiformint(0,1)
label define female 0 "Male" 1 "Female"
label values female female
g cont = female + runiformint(1,7) + runiform() if runiform() > .05
g cat = 1
replace cat = 2 if cont > 2
replace cat = 3 if cont > 5
g bin = cont > 4


/* putting the variables in macros */
local stratvar "female"
local catvars "cat"
local binvars "bin"
local contvars "cont"

table (var) (`stratvar' result), ///
    name(stats) ///
    statistic(mean `contvars') ///
    statistic(sd `contvars') ///
    statistic(fvfrequency `catvars' `binvars') ///
    statistic(fvpercent `catvars' `binvars')

/* generating p-values */
collect create fits
foreach var in `contvars' {
    collect p = Ftail(e(df_m), e(df_r), e(F)) ///
    , tag(var[`var']) ///
    : anova `var' `stratvar'
}

foreach var in `catvars' {
    tabulate `var' `stratvar', chi2
    collect get p = (r(p)), tag(`var'[_hide])
    collect style header `var', title(label)
}

foreach var in `binvars' {
    tabulate `var' `stratvar', chi2
    collect get p = (r(p)), tag(var[1.`var'])
    collect style header `var'[1], title(label) level(hide)
}

/* formatting */
collect style autolevels var `contvars', clear
collect layout (var `catvars' `binvars') (result)
collect combine full = stats fits
foreach var in `catvars' {
    collect levels `var'
    collect style autolevels `var' _hide `s(levels)', clear
}
foreach var in `binvars' {
    collect style autolevels `var' 1, clear
}
collect recode result mean = column1 ///
                      sd = column2 ///
                      fvfrequency = column1 ///
                      fvpercent = column2 ///
                      total = column1 ///
                      percent = column2
collect layout (var `catvars' `binvars') (`stratvar'#result[column1 column2] result[p])
collect style cell (`catvars' `binvars')#result[column1], nformat(%6.0fc)
collect style cell (`catvars' `binvars')#result[column2], nformat(%6.1f) sformat("%s%%")
collect style cell var[`contvars']#result[column1 column2], nformat(%6.1f)
collect style cell var[`contvars']#result[column2], sformat("(%s)")
collect style cell result[p], nformat(%6.3f)
collect label levels result p "p-value", modify
collect style header `stratvar', title(hide)
collect style header result[column1 column2], level(hide)
collect style row stack, nobinder spacer
collect style cell border_block, border(right, pattern(nil))
collect preview

Here is the resulting table

Code:

-------------------------------------------------------
          Male         Female        Total      p-value
-------------------------------------------------------
cont  4.5   (2.1)   5.5   (2.0)   5.0   (2.1)     0.000
                                                       
cat                                               0.000
  1    65   13.0%     0    0.0%    65    6.5%          
  2   208   41.5%   206   41.3%   414   41.4%          
  3   228   45.5%   293   58.7%   521   52.1%          
                                                       
bin   284   56.7%   361   72.3%   645   64.5%     0.000
-------------------------------------------------------

Comment

Matthew Wallace

Join Date: Jul 2023

Posts: 16
#5

04 Jul 2023, 13:27

Perfect. Thanks so much, Jeff.
Comment
Matthew Wallace

Join Date: Jul 2023

Posts: 16
#6

11 Jul 2023, 19:58

Not sure if I should make a separate post for this, but I'm also trying to get a count for each of the variables on the same row as the header and p-value. This would be a nonmissings count, and it would also be split by the stratification variable. I've tried using the showcounts option but it puts the count on each level of the categorical variable and I haven't been able to fix that.

Last edited by Matthew Wallace; 11 Jul 2023, 20:09.
Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 658

13 Jul 2023, 10:03

Try collecting the sample size where you are collecting the p-values.
In the following, I highlight where I made changes from the above example.

Code:

clear
set more off
collect clear

/* simulating data */
set seed 82
set obs 1000
g female = runiformint(0,1)
label define female 0 "Male" 1 "Female"
label values female female
g cont = female + runiformint(1,7) + runiform() if runiform() > .05
g cat = 1
replace cat = 2 if cont > 2
replace cat = 3 if cont > 5
g bin = cont > 4


/* putting the variables in macros */
local stratvar "female"
local catvars "cat"
local binvars "bin"
local contvars "cont"

table (var) (`stratvar' result), ///
    name(stats) ///
    statistic(mean `contvars') ///
    statistic(sd `contvars') ///
    statistic(fvfrequency `catvars' `binvars') ///
    statistic(fvpercent `catvars' `binvars')

/* generating p-values */
collect create fits
foreach var in `contvars' {
    collect nobs = (e(N)) p = Ftail(e(df_m), e(df_r), e(F)) ///
    , tag(var[`var']) ///
    : anova `var' `stratvar'
}

foreach var in `catvars' {
    tabulate `var' `stratvar', chi2
    collect get nobs = (r(N)) p = (r(p)), tag(`var'[_hide])
    collect style header `var', title(label)
}

foreach var in `binvars' {
    tabulate `var' `stratvar', chi2
    collect get nobs = (r(N)) p = (r(p)), tag(var[1.`var'])
    collect style header `var'[1], title(label) level(hide)
}

/* formatting */
collect style autolevels var `contvars', clear
collect layout (var `catvars' `binvars') (result)
collect combine full = stats fits
foreach var in `catvars' {
    collect levels `var'
    collect style autolevels `var' _hide `s(levels)', clear
}
foreach var in `binvars' {
    collect style autolevels `var' 1, clear
}
collect recode result mean = column1 ///
                      sd = column2 ///
                      fvfrequency = column1 ///
                      fvpercent = column2 ///
                      total = column1 ///
                      percent = column2
collect layout (var `catvars' `binvars') (`stratvar'#result[column1 column2] result[p nobs])
collect style cell (`catvars' `binvars')#result[column1], nformat(%6.0fc)
collect style cell (`catvars' `binvars')#result[column2], nformat(%6.1f) sformat("%s%%")
collect style cell var[`contvars']#result[column1 column2], nformat(%6.1f)
collect style cell var[`contvars']#result[column2], sformat("(%s)")
collect style cell result[p], nformat(%6.3f)
collect label levels result p "p-value", modify
collect style cell result[nobs], nformat(%18.0fc)
collect label levels result nobs "N", modify
collect style header `stratvar', title(hide)
collect style header result[column1 column2], level(hide)
collect style row stack, nobinder spacer
collect style cell border_block, border(right, pattern(nil))
collect preview

Here is the resulting table.

Code:

---------------------------------------------------------------
          Male         Female        Total      p-value       N
---------------------------------------------------------------
cont  4.5   (2.1)   5.5   (2.0)   5.0   (2.1)     0.000     956
                                                               
cat                                               0.000   1,000
  1    65   13.0%     0    0.0%    65    6.5%                  
  2   208   41.5%   206   41.3%   414   41.4%                  
  3   228   45.5%   293   58.7%   521   52.1%                  
                                                               
bin   284   56.7%   361   72.3%   645   64.5%     0.000   1,000
---------------------------------------------------------------

Comment

Matthew Wallace

Join Date: Jul 2023

Posts: 16
#8

13 Jul 2023, 19:07

Got it, thanks so much. One last question and I'll be all set. I'd like to add headers to some of the binary variables (think check-all-that apply-questions). I've been trying to do this using collect addtags, but I can't get it to work. Since we only have one binary variable in this example, adding a header to just that one would be fine, assuming the same code could be applied to additional binary variables later on. Thanks again!
Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 658

13 Jul 2023, 20:14

If I'm following what you want, it is a matter of letting the "special" binary variables show a custom title and relabel their 1 level with the variable label/name (or something else).
In the following I add a new binary variable named nib to a special binary variable list and add some logic to treat the regular and special binary variables differently.
My labelling here is nothing special, but I think you get the idea that you can code up the labelling to suit your needs, maybe even having more than one grouping for your binary variables, each getting their own labelling.

Code:

clear
set more off
collect clear

/* simulating data */
set seed 82
set obs 1000
g female = runiformint(0,1)
label define female 0 "Male" 1 "Female"
label values female female
g cont = female + runiformint(1,7) + runiform() if runiform() > .05
g cat = 1
replace cat = 2 if cont > 2
replace cat = 3 if cont > 5
g bin = cont > 4
g nib = inrange(cont, 2, 4)


/* putting the variables in macros */
local stratvar "female"
local catvars "cat"
local regular_binvars "bin"
local special_binvars "nib"
local binvars "`regular_binvars' `special_binvars'"
local contvars "cont"

table (var) (`stratvar' result), ///
    name(stats) ///
    statistic(mean `contvars') ///
    statistic(sd `contvars') ///
    statistic(fvfrequency `catvars' `binvars') ///
    statistic(fvpercent `catvars' `binvars')

/* generating p-values */
collect create fits
foreach var in `contvars' {
    collect nobs = (e(N)) p = Ftail(e(df_m), e(df_r), e(F)) ///
    , tag(var[`var']) ///
    : anova `var' `stratvar'
}

foreach var in `catvars' {
    tabulate `var' `stratvar', chi2
    collect get nobs = (r(N)) p = (r(p)), tag(`var'[_hide])
    collect style header `var', title(label)
}

foreach var in `binvars' {
    tabulate `var' `stratvar', chi2
    collect get nobs = (r(N)) p = (r(p)), tag(var[1.`var'])
    if `:list var in regular_binvars' {
        collect style header `var'[1], title(label) level(hide)
    }
}

/* formatting */
collect style autolevels var `contvars', clear
collect layout (var `catvars' `binvars') (result)
collect combine full = stats fits
foreach var in `catvars' {
    collect levels `var'
    collect style autolevels `var' _hide `s(levels)', clear
}
foreach var in `binvars' {
    collect style autolevels `var' 1, clear
    if `:list var in special_binvars' {
        local lab : var lab `var'
        if `"`lab'"' == "" {
                local lab `var'
        }
        collect label levels `var' 1 `"`lab'"', modify
        collect label dim `var' "Special `var'", modify
    }
}
collect recode result mean = column1 ///
                      sd = column2 ///
                      fvfrequency = column1 ///
                      fvpercent = column2 ///
                      total = column1 ///
                      percent = column2
collect layout (var `catvars' `binvars') (`stratvar'#result[column1 column2] result[p nobs])
collect style cell (`catvars' `binvars')#result[column1], nformat(%6.0fc)
collect style cell (`catvars' `binvars')#result[column2], nformat(%6.1f) sformat("%s%%")
collect style cell var[`contvars']#result[column1 column2], nformat(%6.1f)
collect style cell var[`contvars']#result[column2], sformat("(%s)")
collect style cell result[p], nformat(%6.3f)
collect label levels result p "p-value", modify
collect style cell result[nobs], nformat(%18.0fc)
collect label levels result nobs "N", modify
collect style header `stratvar', title(hide)
collect style header result[column1 column2], level(hide)
collect style row stack, nobinder spacer
collect style cell border_block, border(right, pattern(nil))
collect preview

Here is the resulting table.

Code:

----------------------------------------------------------------------
                 Male         Female        Total      p-value       N
----------------------------------------------------------------------
cont         4.5   (2.1)   5.5   (2.0)   5.0   (2.1)     0.000     956
                                                                      
cat                                                      0.000   1,000
  1           65   13.0%     0    0.0%    65    6.5%                  
  2          208   41.5%   206   41.3%   414   41.4%                  
  3          228   45.5%   293   58.7%   521   52.1%                  
                                                                      
bin          284   56.7%   361   72.3%   645   64.5%     0.000   1,000
                                                                      
Special mid                                                           
  mid        152   30.3%   138   27.7%   290   29.0%     0.350   1,000
----------------------------------------------------------------------

Comment

Matthew Wallace

Join Date: Jul 2023

Posts: 16
#10

14 Jul 2023, 10:53

Yes, this is very close to what I want. How would I go about, in this case, adding more binvars under the "Special mid" label? I've added a couple to the table below (bin1-bin2) for a visual. Could these be listed with the regular_binvars or would they need their own category? I figure this would require reordering the variables in the layout as well.

Attached Files

Last edited by Matthew Wallace; 14 Jul 2023, 10:57.
Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 658

#11

15 Jul 2023, 00:57

Thanks for the example table. It helped me better understand what you meant by check-all-that-apply questions.

I can envision encountering many such check-all-that-apply questions, so I needed to change my approach a little.

The following code block comes from an ado-file I wrote named mw_table.ado, based on the example code so far, but with changes to accommodate multiple grouped binary variables. I added checks to prevent a variable from being used in more than one place. I also replaced the collect recode result command with collect composite definitions for column 1 and 2. This allowed me to eliminate some of the collect style commands since composite results pick up the numeric and string formats of their elements.

Code:

*! version 1.0.0  14jul2023
program mw_table
    version 17

    syntax ,            ///
        by(varname)        ///
    [                ///
        BINary(varlist)        ///
        CATegorical(varlist)    ///
        CONTinuous(varlist)    ///
        GROUPed(string asis)    ///
        *            ///
    ]

    // Do not allow variables to be specified in more than one
    // option.

    local duplist binary categorical continuous
    local k_duplist : list sizeof duplist

    forval i = 1/`k_duplist' {
        local opt1 : word `i' of `duplist'
        forval j = `=`i'+1'/`k_duplist' {
            local opt2 : word `j' of `duplist'
            local both : list `opt1' & `opt2'
            CheckDupVars "`both'" `opt1'() `opt2'()
        }
    }

    // Parse -grouped()- options.
    // Check that grouped variables are not specified in the other
    // options.
    // Check that grouped names are not specified in
    // the other options.
    // Remaining options go to -table-.

    local gid 0
    while `:length local grouped' {
        local ++gid
        ParseGroupOption `grouped'
        local group`gid'vars = s(varlist)
        local group`gid'name = s(name)
        local group`gid'label = s(label)
        forval i = 1/`k_duplist' {
            local opt1 : word `i' of `duplist'
            local both : list `opt1' & group`gid'vars
            CheckDupVars "`both'" `opt1'() group()
        }
        local both : list allgroupvars & group`gid'vars
        CheckDupVars "`both'" group() group()
        CheckDupNames `group`gid'name' `allgroupnames'
        local allgroupvars `allgroupvars' `group`gid'vars'
        local allgroupnames `allgroupnames' `group`gid'name'
        local 0 `", `options'"'
        syntax [, GROUPed(string asis) * ]
    }
    local k_grouped = `gid'

    // Check that grouped names are not also being used as
    // variables.

    local both : list allgroupnames & binary
    CheckNameVarConflict "`both'" binary()
    local both : list allgroupnames & categorical
    CheckNameVarConflict "`both'" categorical()
    local both : list allgroupnames & continuous
    CheckNameVarConflict "`both'" continuous()
    local both : list allgroupnames & allgroupvars
    CheckNameVarConflict "`both'" group()

    // Build the call to -table-.

    if `:list sizeof continuous' {
        local CVopts    statistic(mean `continuous') ///
                statistic(sd `continuous')
    }
    local fvlist `binary' `categorical' `allgroupvars'
    if `:list sizeof fvlist' {
        local FVopts    statistic(fvfrequency `fvlist') ///
                statistic(fvpercent `fvlist')
    }

    quietly table () (`by' result), `CVopts' `FVopts' `options'

    if `:list sizeof continuous' {
        // -anova- needs a numercial by variable.
        local bytype : type `by'
        if substr("`bytype'",1,3) == "str" {
            tempvar numby
            encode `by', generate(`numby')
        }
        else {
            local numby `by'
        }

        // continuous variables layout specification
        local contspec var
        collect style autolevels var `continuous', clear
    }

    foreach x of local continuous {
        quietly ///
        collect nobs=(e(N)) p=Ftail(e(df_m),e(df_r),e(F)) ///
            , tag(var[`x']) ///
            : anova `x' `numby'
    }

    foreach x of local categorical {
        quietly tabulate `x' `by', chi2
        collect get nobs=(r(N)) p=(r(p)), tag(`x'[_hide])
        collect style header `x', title(label)
        quietly collect levelsof `x'
        collect style autolevels `x' _hide `s(levels)', clear
    }

    foreach x of local binary {
        quietly tabulate `x' `by', chi2
        collect get nobs=(r(N)) p=(r(p)), tag(var[1.`x'])
        collect style header `x'[1], title(label) level(hide)
        collect style autolevels `x' 1, clear
    }

    foreach x of local allgroupvars {
        quietly tabulate `x' `by', chi2
        collect get nobs=(r(N)) p=(r(p)), tag(var[1.`x'])
        collect style autolevels `x' 1, clear
    }

    forval i = 1/`k_grouped' {
        local vars : copy local group`i'vars
        local name : copy local group`i'name
        local label : copy local group`i'label
        local j 0
        foreach x of local vars {
            local ++j
            local lab : variable label `x'
            if `"`lab'"' == "" {
                local lab `x'
            }
            quietly collect remap `x'[1] = `name'[`j']
            collect label levels `name' `j' `"`lab'"', modify
        }
        collect label dim `name' `"`label'"', modify
    }

    // Define some composites for the columns. This allows us to
    // let -table- handle result formats.

    collect composite define col1 = mean fvfrequency
    collect composite define col2 = sd fvpercent

    // p-value styles

    collect style cell result[p], nformat(%6.2f)
    collect label levels result p "p-value", modify

    // sample size styles

    collect style cell result[nobs], nformat(%18.0fc)
    collect label levels result nobs "N", modify

    // header styles

    collect style header `by', title(hide)
    collect style header result[col1 col2], level(hide)
    collect style row stack, nobinder spacer

    // border style

    collect style cell border_block, border(right, pattern(nil))

    collect layout ///
        (`contspec' `categorical' `allgroupnames' `binary') ///
        (`by'#result[col1 col2] result[p nobs])
end

program ParseGroupOption, sclass
    syntax varlist , name(name) [label(string)]
    sreturn local varlist `"`varlist'"'
    sreturn local name `"`name'"'
    sreturn local label `"`label'"'
end

program CheckDupVars
    args vars opt1 opt2

    local k : list sizeof vars
    if `k' == 0 {
        exit
    }
    if `k' > 1 {
        local s s
    }
    if "`opt1'" == "`opt2'" {
        di as err ///
        "variable`s' duplicated in separate {bf:`opt1'} options"
    }
    else {
        di as err ///
        "variable`s' duplicated in options {bf:`opt1'} and {bf:`opt2'}"
    }
    di as err "{p}offending variable`s': {bf:`vars'}{p_end}"
    exit 198
end

program CheckDupNames
    gettoken first rest : 0
    if `:list posof "`first'" in rest' == 0 {
        exit
    }
    di as err "name {bf:`first'} used in more than one {bf:group()} option"
    exit 198
end

program CheckNameVarConflict
    args found opt

    local k : list sizeof found
    if `k' == 0 {
        exit
    }
    gettoken first : found
    di as err "{p}"
    di as err "{bf:grouped()} suboption {bf:name(`first')} is not allowed;{break}"
    di as err "variable {bf:`first'} was specified in option {bf:`opt'}"
    di as err "{p_end}"
    exit 198
end

exit

Here is a working example (also based on the above example, but with added variables), using the above program, that illustrates the variable and name checks and each of the variable types in action.

Code:

* play.do

clear all

set seed 82
set obs 1000
generate female = runiformint(0,1)
label define female 0 "Male" 1 "Female"
label values female female
// cont vars
generate cont1 = female + runiformint(1,7) + runiform() if runiform() > .05
generate cont2 = female + rnormal()*3
generate cont3 = rnormal()
// cat vars
generate cat1 = 1 + (cont1>2) + (cont1>5)
generate cat2 = runiformint(1,4)
generate cat3 = runiformint(11,15)
// bin vars
generate bin1 = cont1 > 4
generate bin2 = runiformint(0,1)
// grouped bin vars
generate nib1 = inrange(cont1, 2, 4)
generate nib2 = inrange(cont1, 1, 3)
generate nib3 = inrange(cont1, 3, 5)
// grouped bin vars
generate bib1 = !inrange(cont1, 2, 4)
generate bib2 = !inrange(cont1, 1, 3)
generate bib3 = !inrange(cont1, 3, 5)

// check error messages

rcof "noisily mw_table" == 198
rcof "noisily mw_table dude" == 101
rcof "noisily mw_table if 1" == 101
rcof "noisily mw_table in 1/20" == 101
rcof "noisily mw_table, by(dude)" == 111
rcof "noisily mw_table, by(female) bin(bin?) cat(bin?)" == 198
rcof "noisily mw_table, by(female) bin(bin?) cont(bin?)" == 198
rcof "noisily mw_table, by(female) cat(bin?) cont(bin?)" == 198
rcof "noisily mw_table, by(female) bin(bin?) group(bin?, name(bins))" == 198
rcof "noisily mw_table, by(female) cat(bin?) group(bin?, name(bins))" == 198
rcof "noisily mw_table, by(female) cont(bin?) group(bin?, name(bins))" == 198
rcof "noisily mw_table, by(female) group(bin1 cat1, name(mix)) group(bin?, name(bins))" == 198
rcof "noisily mw_table, by(female) group(bin?, name(mix)) group(bin?, name(bins))" == 198
rcof "noisily mw_table, by(female) group(nib?, name(cat1)) cat(cat?)" == 198

// working example

mw_table, ///
    by(female) ///
    binary(bin?) ///
    categorical(cat?) ///
    continuous(cont?) ///
    grouped(nib?, name(nibs) label(Group1 indicators)) ///
    grouped(bib?, name(bibs) label(Group2 indicators)) ///
    nformat(%6.1f mean sd) ///
    nformat(%6.1f fvpercent percent) ///
    sformat("%s%%" fvpercent percent) ///
    sformat("(%s)" sd) ///
    name(mytable)

* end: play.do

Here is the resulting table.

Code:

------------------------------------------------------------------------------
                       Male          Female         Total      p-value       N
------------------------------------------------------------------------------
cont1               4.5   (2.1)    5.5   (2.0)   5.0   (2.1)      0.00     956
cont2              -0.1   (3.0)    0.9   (3.1)   0.4   (3.1)      0.00   1,000
cont3               0.1   (1.0)   -0.0   (1.1)   0.0   (1.0)      0.20   1,000
                                                                              
cat1                                                              0.00   1,000
  1                  65   13.0%      0    0.0%    65    6.5%                  
  2                 208   41.5%    206   41.3%   414   41.4%                  
  3                 228   45.5%    293   58.7%   521   52.1%                  
                                                                              
cat2                                                              0.05   1,000
  1                 143   28.5%    129   25.9%   272   27.2%                  
  2                  99   19.8%    129   25.9%   228   22.8%                  
  3                 113   22.6%    122   24.4%   235   23.5%                  
  4                 146   29.1%    119   23.8%   265   26.5%                  
                                                                              
cat3                                                              0.24   1,000
  11                104   20.8%    109   21.8%   213   21.3%                  
  12                 88   17.6%    106   21.2%   194   19.4%                  
  13                 98   19.6%     96   19.2%   194   19.4%                  
  14                 97   19.4%    101   20.2%   198   19.8%                  
  15                114   22.8%     87   17.4%   201   20.1%                  
                                                                              
Group1 indicators                                                            
  nib1              152   30.3%    138   27.7%   290   29.0%      0.35   1,000
  nib2              143   28.5%     72   14.4%   215   21.5%      0.00   1,000
  nib3              130   25.9%    134   26.9%   264   26.4%      0.75   1,000
                                                                              
Group2 indicators                                                            
  bib1              349   69.7%    361   72.3%   710   71.0%      0.35   1,000
  bib2              358   71.5%    427   85.6%   785   78.5%      0.00   1,000
  bib3              371   74.1%    365   73.1%   736   73.6%      0.75   1,000
                                                                              
bin1                284   56.7%    361   72.3%   645   64.5%      0.00   1,000
                                                                              
bin2                257   51.3%    239   47.9%   496   49.6%      0.28   1,000
------------------------------------------------------------------------------

Last edited by Jeff Pitblado (StataCorp); 15 Jul 2023, 01:03.

Comment

Matthew Wallace

Join Date: Jul 2023

Posts: 16
#12

17 Jul 2023, 18:28

This works wonderfully. I was worried my request would greatly expand the code needed to create the table, but this is all easy to understand. One thing I'd like to know is if the total column can be moved and/or labeled differently? I'd like to move it before the male/female split, but I can't find where you specify it in the program. Thanks again for all your help with this.
Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 658

#13

17 Jul 2023, 23:04

In the following, I added support for options first and label() in option by().
I also added a clue when there is a syntax error in a specified grouped() option.

Code:

 *! 1.0.0  14jul2023
program mw_table
    version 17

    syntax ,            ///
        by(string asis)        ///
    [                ///
        BINary(varlist)        ///
        CATegorical(varlist)    ///
        CONTinuous(varlist)    ///
        GROUPed(string asis)    ///
        *            ///
    ]

    capture noisily ParseByOption `by'
    if c(rc) {
        di as err "in option {bf:by()}"
        exit c(rc)
    }
    local by = s(by)
    local byfirst "`s(first)'"
    local bylabel "`s(label)'"

    // Do not allow variables to be specified in more than one
    // option.

    local duplist binary categorical continuous
    local k_duplist : list sizeof duplist

    forval i = 1/`k_duplist' {
        local opt1 : word `i' of `duplist'
        forval j = `=`i'+1'/`k_duplist' {
            local opt2 : word `j' of `duplist'
            local both : list `opt1' & `opt2'
            CheckDupVars "`both'" `opt1'() `opt2'()
        }
    }

    // Parse -grouped()- options.
    // Check that grouped variables are not specified in the other
    // options.
    // Check that grouped names are not specified in
    // the other options.
    // Remaining options go to -table-.

    local gid 0
    while `:length local grouped' {
        local ++gid
        capture noisily ParseGroupOption `grouped'
        if c(rc) {
            di as err "in option {bf:grouped()}"
            exit c(rc)
        }
        local group`gid'vars = s(varlist)
        local group`gid'name = s(name)
        local group`gid'label = s(label)
        forval i = 1/`k_duplist' {
            local opt1 : word `i' of `duplist'
            local both : list `opt1' & group`gid'vars
            CheckDupVars "`both'" `opt1'() group()
        }
        local both : list allgroupvars & group`gid'vars
        CheckDupVars "`both'" group() group()
        CheckDupNames `group`gid'name' `allgroupnames'
        local allgroupvars `allgroupvars' `group`gid'vars'
        local allgroupnames `allgroupnames' `group`gid'name'
        local 0 `", `options'"'
        syntax [, GROUPed(string asis) * ]
    }
    local k_grouped = `gid'

    // Check that grouped names are not also being used as
    // variables.

    local both : list allgroupnames & binary
    CheckNameVarConflict "`both'" binary()
    local both : list allgroupnames & categorical
    CheckNameVarConflict "`both'" categorical()
    local both : list allgroupnames & continuous
    CheckNameVarConflict "`both'" continuous()
    local both : list allgroupnames & allgroupvars
    CheckNameVarConflict "`both'" group()

    // Build the call to -table-.

    if `:list sizeof continuous' {
        local CVopts    statistic(mean `continuous') ///
                statistic(sd `continuous')
    }
    local fvlist `binary' `categorical' `allgroupvars'
    if `:list sizeof fvlist' {
        local FVopts    statistic(fvfrequency `fvlist') ///
                statistic(fvpercent `fvlist')
    }

    quietly table () (`by' result), `CVopts' `FVopts' `options'

    if `:list sizeof continuous' {
        // -anova- needs a numercial by variable.
        local bytype : type `by'
        if substr("`bytype'",1,3) == "str" {
            tempvar numby
            encode `by', generate(`numby')
        }
        else {
            local numby `by'
        }

        // continuous variables layout specification
        local contspec var
        collect style autolevels var `continuous', clear
    }

    foreach x of local continuous {
        quietly ///
        collect nobs=(e(N)) p=Ftail(e(df_m),e(df_r),e(F)) ///
            , tag(var[`x']) ///
            : anova `x' `numby'
    }

    foreach x of local categorical {
        quietly tabulate `x' `by', chi2
        collect get nobs=(r(N)) p=(r(p)), tag(`x'[_hide])
        collect style header `x', title(label)
        quietly collect levelsof `x'
        collect style autolevels `x' _hide `s(levels)', clear
    }

    foreach x of local binary {
        quietly tabulate `x' `by', chi2
        collect get nobs=(r(N)) p=(r(p)), tag(var[1.`x'])
        collect style header `x'[1], title(label) level(hide)
        collect style autolevels `x' 1, clear
    }

    foreach x of local allgroupvars {
        quietly tabulate `x' `by', chi2
        collect get nobs=(r(N)) p=(r(p)), tag(var[1.`x'])
        collect style autolevels `x' 1, clear
    }

    forval i = 1/`k_grouped' {
        local vars : copy local group`i'vars
        local name : copy local group`i'name
        local label : copy local group`i'label
        local j 0
        foreach x of local vars {
            local ++j
            local lab : variable label `x'
            if `"`lab'"' == "" {
                local lab `x'
            }
            quietly collect remap `x'[1] = `name'[`j']
            collect label levels `name' `j' `"`lab'"', modify
        }
        collect label dim `name' `"`label'"', modify
    }

    // Define some composites for the columns. This allows us to
    // let -table- handle result formats.

    collect composite define col1 = mean fvfrequency
    collect composite define col2 = sd fvpercent

    // p-value styles

    collect style cell result[p], nformat(%6.2f)
    collect label levels result p "p-value", modify

    // sample size styles

    collect style cell result[nobs], nformat(%18.0fc)
    collect label levels result nobs "N", modify

    // header styles

    collect style header `by', title(hide)
    collect style header result[col1 col2], level(hide)
    collect style row stack, nobinder spacer

    // border style

    collect style cell border_block, border(right, pattern(nil))

    // handle by Total styles

    if "`byfirst'" != "" {
        quietly collect levels `by'
        collect style autolevels `by' .m `s(levels)', clear
    }
    if `"`bylabel'"' != "" {
        collect label levels `by' .m `"`bylabel'"', modify
    }

    collect layout ///
        (`contspec' `categorical' `allgroupnames' `binary') ///
        (`by'#result[col1 col2] result[p nobs])
end

program ParseByOption, sclass
    syntax varname [, first label(string)]
    sreturn local by "`varlist'"
    sreturn local first `"`first'"'
    sreturn local label `"`label'"'
end

program ParseGroupOption, sclass
    syntax varlist , name(name) [label(string)]
    sreturn local varlist `"`varlist'"'
    sreturn local name `"`name'"'
    sreturn local label `"`label'"'
end

program CheckDupVars
    args vars opt1 opt2

    local k : list sizeof vars
    if `k' == 0 {
        exit
    }
    if `k' > 1 {
        local s s
    }
    if "`opt1'" == "`opt2'" {
        di as err ///
        "variable`s' duplicated in separate {bf:`opt1'} options"
    }
    else {
        di as err ///
        "variable`s' duplicated in options {bf:`opt1'} and {bf:`opt2'}"
    }
    di as err "{p}offending variable`s': {bf:`vars'}{p_end}"
    exit 198
end

program CheckDupNames
    gettoken first rest : 0
    if `:list posof "`first'" in rest' == 0 {
        exit
    }
    di as err "name {bf:`first'} used in more than one {bf:group()} option"
    exit 198
end

program CheckNameVarConflict
    args found opt

    local k : list sizeof found
    if `k' == 0 {
        exit
    }
    gettoken first : found
    di as err "{p}"
    di as err "{bf:grouped()} suboption {bf:name(`first')} is not allowed;{break}"
    di as err "variable {bf:`first'} was specified in option {bf:`opt'}"
    di as err "{p_end}"
    exit 198
end

exit

Here are these new sub-options in action using the above example.

Code:

mw_table, ///
        by(female, first label(Overall)) ///
        binary(bin?) ///
        categorical(cat?) ///
        continuous(cont?) ///
        grouped(nib?, name(nibs) label(Group1 indicators)) ///
        grouped(bib?, name(bibs) label(Group2 indicators)) ///
        nformat(%6.1f mean sd) ///
        nformat(%6.1f fvpercent percent) ///
        sformat("%s%%" fvpercent percent) ///
        sformat("(%s)" sd) ///
        name(mytable)

Here is the resulting table.

Code:

------------------------------------------------------------------------------
                     Overall         Male          Female      p-value       N
------------------------------------------------------------------------------
cont1              5.0   (2.1)    4.5   (2.1)    5.5   (2.0)      0.00     956
cont2              0.4   (3.1)   -0.1   (3.0)    0.9   (3.1)      0.00   1,000
cont3              0.0   (1.0)    0.1   (1.0)   -0.0   (1.1)      0.20   1,000
                                                                              
cat1                                                              0.00   1,000
  1                 65    6.5%     65   13.0%      0    0.0%                  
  2                414   41.4%    208   41.5%    206   41.3%                  
  3                521   52.1%    228   45.5%    293   58.7%                  
                                                                              
cat2                                                              0.05   1,000
  1                272   27.2%    143   28.5%    129   25.9%                  
  2                228   22.8%     99   19.8%    129   25.9%                  
  3                235   23.5%    113   22.6%    122   24.4%                  
  4                265   26.5%    146   29.1%    119   23.8%                  
                                                                              
cat3                                                              0.24   1,000
  11               213   21.3%    104   20.8%    109   21.8%                  
  12               194   19.4%     88   17.6%    106   21.2%                  
  13               194   19.4%     98   19.6%     96   19.2%                  
  14               198   19.8%     97   19.4%    101   20.2%                  
  15               201   20.1%    114   22.8%     87   17.4%                  
                                                                              
Group1 indicators                                                             
  nib1             290   29.0%    152   30.3%    138   27.7%      0.35   1,000
  nib2             215   21.5%    143   28.5%     72   14.4%      0.00   1,000
  nib3             264   26.4%    130   25.9%    134   26.9%      0.75   1,000
                                                                              
Group2 indicators                                                             
  bib1             710   71.0%    349   69.7%    361   72.3%      0.35   1,000
  bib2             785   78.5%    358   71.5%    427   85.6%      0.00   1,000
  bib3             736   73.6%    371   74.1%    365   73.1%      0.75   1,000
                                                                              
bin1               645   64.5%    284   56.7%    361   72.3%      0.00   1,000
                                                                              
bin2               496   49.6%    257   51.3%    239   47.9%      0.28   1,000
------------------------------------------------------------------------------

Comment

Matthew Wallace

Join Date: Jul 2023

Posts: 16
#14

18 Jul 2023, 17:11

Perfect. Thanks so much, Jeff. I've been trying to create a table 1 just like this for a long time in Stata and am very glad to finally have one.
Comment
Matthew Wallace

Join Date: Jul 2023

Posts: 16
#15

21 Jul 2023, 20:52

Back again with two more requests.

1) How would I go about displaying levels of categorical variables with 0 observations? I'd want it to look something like pictured below, where I've added a level to cat2 labelled "3.5" with 0 observations in that level:

2) Possibly a bit trickier, but is there a way I can add an additional row for the continuous variables for statistics like the range and IQR? I know I can add those statistics in this loop as such (and use collect composite to combine the min and max into a range):

Code:

if `:list sizeof continuous' { local CVopts statistic(mean `continuous') /// statistic(sd `continuous') /// statistic(min `continuous') /// statistic(max `continuous') /// statistic(iqr `continuous') }

but I'm unsure if I can get them on their own row as pictured below:

Last edited by Matthew Wallace; 21 Jul 2023, 20:56.
Comment

Announcement

Formatting binary and categorical variables and p-values using table

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment