DTable and Collect to Extract Which Test Was Applied For Which Variable

Kevin Blaine

#1

07 Apr 2025, 17:02

Hey there Stata friends (specifically Jeff Pitblado (StataCorp)!) -- I'm wondering if anyone has been successful in storing which test was applied to each variable as a dimension/level in the collection created by table/dtable/etable. For example, I'd like to create a new table showing which test was applied to each variable, or perhaps say in the table notes which variables received a particular test.

Using the auto.dta dataset, I imagine I could do something like:

Code:

clear all
sysuse auto
dtable i.rep78 mpg price, ///
    by(foreign) ///
    cont(mpg, stat(mean sd) test(regress)) ///
    cont(price, stat(median iqr) test(kwallis))
* INSERT BUNCH OF COLLECT SUBCOMMANDS TO FORMAT THINGS
collect preview

I know dtable displays all the tests that were performed as a "note" at the top of its initial table preview, but I haven't yet found a way to store that information elsewhere. Any ideas?

Thanks!
Kevin Blaine

#2

10 Apr 2025, 09:08

Hey there Jeff Pitblado (StataCorp) -- just following up on this! Of course, anyone else please chime in too.

Jeff Pitblado (StataCorp)


10 Apr 2025, 10:44

Thanks for the example.

Here is what I came up with.

Code:

clear all

sysuse auto

* add -tests- option in -by()-, so that the tests are performed
dtable i.rep78 mpg price, ///
    by(foreign, tests) ///
    cont(mpg, stat(mean sd) test(regress)) ///
    cont(price, stat(median iqr) test(kwallis))

* get the layout; note -var- is used in the row specification, the only
* other thing we care about is -result- (currently in the column
* specification); -foreign- is the -by()- variable that we can now
* ignore since the test results exist at a unique level of -foreign- for
* each -var- level
collect layout

* list all the -result- levels; note levels -regress- and -kwallis- are
* the names of the test results
collect label list result, all

* change the header styles so we see the variable names instead of their
* labels
collect style header var, level(value)

* change the header styles so we see the result names
collect style header result, level(value)

* change the row header style to show the headers side-by-side instead
* of stacked
collect style row split

* change the layout to show variable names, test results of interest,
* and their values
collect layout (var#result[kwallis regress])

Here is the resulting table.

Code:

--------------------
mpg   regress <0.001
price kwallis  0.298
--------------------


Jeff Pitblado (StataCorp)


10 Apr 2025, 10:55

I forgot about the Pearson test in the example.

Code:

clear all

sysuse auto

* add -tests- option in -by()-, so that the tests are performed
dtable i.rep78 mpg price, ///
    by(foreign, tests) ///
    cont(mpg, stat(mean sd) test(regress)) ///
    cont(price, stat(median iqr) test(kwallis))

* get the layout; note -var- is used in the row specification, the only
* other thing we care about is -result- (currently in the column
* specification); -foreign- is the -by()- variable that we can now
* ignore since the test results exist at a unique level of -foreign- for
* each -var- level
collect layout

* list all the -result- levels; note levels -regress- and -kwallis- are
* the names of the test results
collect label list result, all

* change the header styles so we see the variable names instead of their
* labels
collect style header var, level(value)
collect style header rep78, title(name) level(hide)

* change the header styles so we see the result names
collect style header result, level(value)

* change the row header style to show the headers side-by-side instead
* of stacked; -binder()- here prevents an extra column in factor row headers
collect style row split, binder(=)

* change the layout to show variable names, test results of interest,
* and their values
collect layout (var#result[pearson kwallis regress])

Resulting table.

Code:

--------------------
rep78 pearson <0.001
mpg   regress <0.001
price kwallis  0.298
--------------------


Kevin Blaine

#5

10 Apr 2025, 11:52

Jeff -- you are a rock star. I literally don't know what I would do without DTable and all the amazing work put into it.
Kevin Blaine

#6

10 Apr 2025, 11:55

One more thing, and this may complicate things a bit. Is there a way to add a symbol to the end of the variable label/name (or if not, an additional column that contains a symbol) that could represent which test was used? I could then use the notes section to explain the symbols. I think even better would be specifying WHICH test gets a symbol -- for example, if I only wanted to mark the variables undergoing non-parametric tests.

Is there any way of storing those tests and associated variables in a local?

Jeff Pitblado (StataCorp)


10 Apr 2025, 14:12

Before you call dtable, you are in control of which test is applied to each variable you specify. However, I see how it would be nice to know which test was applied to each variable in your table, especially if you were given a collection by a colleague instead of calling dtable yourself.
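Because the variable-to-test mapping is decided before dtable runs, one way to keep it in locals (the question about storing tests and variables in a local) is to build the cont() options from those locals. This is a minimal sketch under stated assumptions, not a StataCorp-documented recipe: the macro names (cont_vars, stat_mpg, test_mpg, stat_price, test_price, contopts) are hypothetical, and it assumes Stata 18 or later, where dtable is available.

```stata
clear all
sysuse auto

* hypothetical locals: for each continuous variable, the statistics
* and the test we want dtable to use
local cont_vars   mpg price
local stat_mpg    mean sd
local test_mpg    regress
local stat_price  median iqr
local test_price  kwallis

* build one cont() option per variable from the locals above;
* nested macros such as `test_`v'' expand inside-out
local contopts
foreach v of local cont_vars {
    local contopts `contopts' cont(`v', stat(`stat_`v'') test(`test_`v''))
}

dtable i.rep78 `cont_vars', by(foreign, tests) `contopts'

* the variable-to-test mapping is still available after dtable runs
foreach v of local cont_vars {
    display as text "`v'" _col(10) as result "`test_`v''"
}
```

The same locals could then drive which cells get a symbol and what the notes say, so the mapping lives in one place. This only helps when you call dtable yourself; for a collection you were handed, the layout approach shown earlier in the thread is the way to recover the test names.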

While you cannot automatically provide test-specific augmentations to the variable names in row headers, you can apply string formats to items in the table with the sformat() option of collect style cell.

Here is an example of how this can be done.

Code:

clear all
sysuse auto

dtable i.rep78 mpg price, ///
    by(foreign, tests) ///
    cont(mpg, stat(mean sd) test(regress)) ///
    cont(price, stat(median iqr) test(kwallis))

* ₁ is unicode character u2081
collect style cell result[kwallis], sformat("%s₁")
collect note "Test₁ p-values from Kruskal-Wallis test"

* ₂ is unicode character u2082
collect style cell result[regress], sformat("%s₂")
collect note "Test₂ p-values from Wald test"

* ₃ is unicode character u2083
collect style cell result[pearson], sformat("%s₃")
collect note "Test₃ p-values from Pearson test"

collect preview

Here is the resulting table.

Code:

--------------------------------------------------------------------------------------
                                                Car origin                            
                         Domestic            Foreign              Total          Test
--------------------------------------------------------------------------------------
N                           52 (70.3%)          22 (29.7%)         74 (100.0%)        
Repair record 1978                                                                    
  1                           2 (4.2%)            0 (0.0%)            2 (2.9%) <0.001₃
  2                          8 (16.7%)            0 (0.0%)           8 (11.6%)        
  3                         27 (56.2%)           3 (14.3%)          30 (43.5%)        
  4                          9 (18.8%)           9 (42.9%)          18 (26.1%)        
  5                           2 (4.2%)           9 (42.9%)          11 (15.9%)        
Mileage (mpg)           19.827 (4.743)      24.773 (6.611)      21.297 (5.786) <0.001₂
Price              4,782.500 2,050.000 5,759.000 2,641.000 5,006.500 2,147.000  0.298₁
--------------------------------------------------------------------------------------
Test₁ p-values from Kruskal-Wallis test
Test₂ p-values from Wald test
Test₃ p-values from Pearson test
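To mark only some tests (for example, only the non-parametric one, as asked above), the same sformat() approach can be applied to a single result level and the others left alone. A minimal sketch, assuming you want to flag only the Kruskal-Wallis results; the asterisk and the note wording are arbitrary choices, and Stata 18 or later is assumed for dtable.

```stata
clear all
sysuse auto

dtable i.rep78 mpg price, ///
    by(foreign, tests) ///
    cont(mpg, stat(mean sd) test(regress)) ///
    cont(price, stat(median iqr) test(kwallis))

* flag only the Kruskal-Wallis p-values; the regress and pearson
* results keep their default formatting
collect style cell result[kwallis], sformat("%s*")
collect note "* p-value from Kruskal-Wallis test"

collect preview
```

Because the style is attached to the result level rather than to a variable, every variable tested with kwallis gets the marker automatically.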

Last edited by Jeff Pitblado (StataCorp); 10 Apr 2025, 14:15.


Kevin Blaine

#8

10 Apr 2025, 14:16

This is EXACTLY what I'm looking for. Thanks a ton!!