How to combine several dtable tables into a single one?

Yuduan Pu

Join Date: Dec 2022

Posts: 8
#1

How to combine several dtable tables into a single one?

27 Jun 2023, 18:47

Hello everyone, I am new to using Stata 18, so please bear with me if my question seems naive. Currently, I am attempting to utilize the new "dtable" function to generate descriptive statistics. However, I am encountering difficulties when it comes to combine multiple descriptive statistics tables into a single table. I have categorized my observations into three groups: Science, STEM, and Social Science & Arts. Each group has a corresponding dummy variable. The issue is that these categories overlap; an observation labeled as 'Science' may also fall under 'STEM' or 'Social Science.' Consequently, I cannot simply combine them into a single variable and employ the "by (field) " command.

My code is very simple

Code:

dtable yrcurrinst yrphdaward acarank i.academicrank i.gender, by(science, nototals) dtable yrcurrinst yrphdaward acarank i.academicrank i.gender, by(art_social_science, nototals) dtable yrcurrinst yrphdaward acarank i.academicrank i.gender, by(stem, nototals) dtable yrcurrinst yrphdaward acarank i.academicrank i.gender

I want my first column to be total, the second one to be science, the third one to be STEM and the fourth one to be Social Science and Arts.

I am aware that manually copying and pasting the data into a spreadsheet is an option, but I am hoping to find a more efficient approach using Stata code. If anyone has any insights on how to accomplish this, I would greatly appreciate your assistance.
Tags: None

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 680

27 Jun 2023, 21:01

You do not share any data for us to provide you an example, so I'll use nhanes2.dta.

I'm guessing you do not want both categories of the variables science ...

Code:

webuse nhanes2

* build each table separately, and store them in named collections that
* we will combine later -- each collections needs to have their own
* level of a dimension (group) that we will use to layout the collumns
* in our table

* total sample
dtable weight bmi bpsystol i.race i.agegrp, name(total)
collect addtag group[Total]

* heart attack sample
dtable weight bmi bpsystol i.race i.agegrp if heartatk==1, name(heartatk)
collect addtag group[HeartATK]

* high blood pressure sample
dtable weight bmi bpsystol i.race i.agegrp if highbp==1, name(highbp)
collect addtag group[HighBP]

* high lead level sample
dtable weight bmi bpsystol i.race i.agegrp if highlead==1, name(highlead)
collect addtag group[HighLead]

* combine above collections
collect combine comb = total heartatk highbp highlead
* specify the order of the group levels (columns)
collect style autolevels group Total HeartATK HighBP HighLead
* hide the result labels
collect style header result[_dtable_stats], title(hide) level(hide)

* table layout
collect layout (var) (group#result)

* nicer column labels
collect label levels group ///
    HeartATK "Prior heart attack" ///
    HighBP "High blood pressure" ///
    HighLead "High lead level"
collect preview

Here are the first table

Code:

-------------------------------------------------------------------------------------------
                              Total          HeartATK          HighBP          HighLead    
-------------------------------------------------------------------------------------------
N                                 10,351              476            4,376              293
Weight (kg)              71.898 (15.356)  74.794 (15.242)  76.856 (16.241)  74.571 (15.201)
Body mass index (BMI)     25.538 (4.915)   26.399 (4.870)   27.361 (5.332)   25.172 (4.611)
Systolic blood pressure 130.882 (23.333) 140.813 (23.093) 150.539 (20.695) 133.601 (22.173)
Race                                                                                       
  White                    9,065 (87.6%)      424 (89.1%)    3,748 (85.6%)      227 (77.5%)
  Black                    1,086 (10.5%)        47 (9.9%)      541 (12.4%)       60 (20.5%)
  Other                       200 (1.9%)         5 (1.1%)        87 (2.0%)         6 (2.0%)
Age group                                                                                  
  20–29                    2,320 (22.4%)         1 (0.2%)       392 (9.0%)       54 (18.4%)
  30–39                    1,622 (15.7%)         4 (0.8%)      455 (10.4%)       42 (14.3%)
  40–49                    1,272 (12.3%)        20 (4.2%)      502 (11.5%)       50 (17.1%)
  50–59                    1,291 (12.5%)       69 (14.5%)      701 (16.0%)       52 (17.7%)
  60–69                    2,860 (27.6%)      265 (55.7%)    1,667 (38.1%)       77 (26.3%)
  70+                         986 (9.5%)      117 (24.6%)      659 (15.1%)        18 (6.1%)
-------------------------------------------------------------------------------------------

Here is the table with nice column labels

Code:

------------------------------------------------------------------------------------------------
                              Total      Prior heart attack High blood pressure  High lead level
------------------------------------------------------------------------------------------------
N                                 10,351                476               4,376              293
Weight (kg)              71.898 (15.356)    74.794 (15.242)     76.856 (16.241)  74.571 (15.201)
Body mass index (BMI)     25.538 (4.915)     26.399 (4.870)      27.361 (5.332)   25.172 (4.611)
Systolic blood pressure 130.882 (23.333)   140.813 (23.093)    150.539 (20.695) 133.601 (22.173)
Race                                                                                            
  White                    9,065 (87.6%)        424 (89.1%)       3,748 (85.6%)      227 (77.5%)
  Black                    1,086 (10.5%)          47 (9.9%)         541 (12.4%)       60 (20.5%)
  Other                       200 (1.9%)           5 (1.1%)           87 (2.0%)         6 (2.0%)
Age group                                                                                       
  20–29                    2,320 (22.4%)           1 (0.2%)          392 (9.0%)       54 (18.4%)
  30–39                    1,622 (15.7%)           4 (0.8%)         455 (10.4%)       42 (14.3%)
  40–49                    1,272 (12.3%)          20 (4.2%)         502 (11.5%)       50 (17.1%)
  50–59                    1,291 (12.5%)         69 (14.5%)         701 (16.0%)       52 (17.7%)
  60–69                    2,860 (27.6%)        265 (55.7%)       1,667 (38.1%)       77 (26.3%)
  70+                         986 (9.5%)        117 (24.6%)         659 (15.1%)        18 (6.1%)
------------------------------------------------------------------------------------------------

Comment

Yuduan Pu

Join Date: Dec 2022
Posts: 8

27 Jun 2023, 22:39

Originally posted by Jeff Pitblado (StataCorp) View Post

You do not share any data for us to provide you an example, so I'll use nhanes2.dta.

I'm guessing you do not want both categories of the variables science ...

Code:

webuse nhanes2

* build each table separately, and store them in named collections that
* we will combine later -- each collections needs to have their own
* level of a dimension (group) that we will use to layout the collumns
* in our table

* total sample
dtable weight bmi bpsystol i.race i.agegrp, name(total)
collect addtag group[Total]

* heart attack sample
dtable weight bmi bpsystol i.race i.agegrp if heartatk==1, name(heartatk)
collect addtag group[HeartATK]

* high blood pressure sample
dtable weight bmi bpsystol i.race i.agegrp if highbp==1, name(highbp)
collect addtag group[HighBP]

* high lead level sample
dtable weight bmi bpsystol i.race i.agegrp if highlead==1, name(highlead)
collect addtag group[HighLead]

* combine above collections
collect combine comb = total heartatk highbp highlead
* specify the order of the group levels (columns)
collect style autolevels group Total HeartATK HighBP HighLead
* hide the result labels
collect style header result[_dtable_stats], title(hide) level(hide)

* table layout
collect layout (var) (group#result)

* nicer column labels
collect label levels group ///
HeartATK "Prior heart attack" ///
HighBP "High blood pressure" ///
HighLead "High lead level"
collect preview

Here are the first table

Code:

-------------------------------------------------------------------------------------------
Total HeartATK HighBP HighLead
-------------------------------------------------------------------------------------------
N 10,351 476 4,376 293
Weight (kg) 71.898 (15.356) 74.794 (15.242) 76.856 (16.241) 74.571 (15.201)
Body mass index (BMI) 25.538 (4.915) 26.399 (4.870) 27.361 (5.332) 25.172 (4.611)
Systolic blood pressure 130.882 (23.333) 140.813 (23.093) 150.539 (20.695) 133.601 (22.173)
Race
White 9,065 (87.6%) 424 (89.1%) 3,748 (85.6%) 227 (77.5%)
Black 1,086 (10.5%) 47 (9.9%) 541 (12.4%) 60 (20.5%)
Other 200 (1.9%) 5 (1.1%) 87 (2.0%) 6 (2.0%)
Age group
20–29 2,320 (22.4%) 1 (0.2%) 392 (9.0%) 54 (18.4%)
30–39 1,622 (15.7%) 4 (0.8%) 455 (10.4%) 42 (14.3%)
40–49 1,272 (12.3%) 20 (4.2%) 502 (11.5%) 50 (17.1%)
50–59 1,291 (12.5%) 69 (14.5%) 701 (16.0%) 52 (17.7%)
60–69 2,860 (27.6%) 265 (55.7%) 1,667 (38.1%) 77 (26.3%)
70+ 986 (9.5%) 117 (24.6%) 659 (15.1%) 18 (6.1%)
-------------------------------------------------------------------------------------------

Here is the table with nice column labels

Code:

------------------------------------------------------------------------------------------------
Total Prior heart attack High blood pressure High lead level
------------------------------------------------------------------------------------------------
N 10,351 476 4,376 293
Weight (kg) 71.898 (15.356) 74.794 (15.242) 76.856 (16.241) 74.571 (15.201)
Body mass index (BMI) 25.538 (4.915) 26.399 (4.870) 27.361 (5.332) 25.172 (4.611)
Systolic blood pressure 130.882 (23.333) 140.813 (23.093) 150.539 (20.695) 133.601 (22.173)
Race
White 9,065 (87.6%) 424 (89.1%) 3,748 (85.6%) 227 (77.5%)
Black 1,086 (10.5%) 47 (9.9%) 541 (12.4%) 60 (20.5%)
Other 200 (1.9%) 5 (1.1%) 87 (2.0%) 6 (2.0%)
Age group
20–29 2,320 (22.4%) 1 (0.2%) 392 (9.0%) 54 (18.4%)
30–39 1,622 (15.7%) 4 (0.8%) 455 (10.4%) 42 (14.3%)
40–49 1,272 (12.3%) 20 (4.2%) 502 (11.5%) 50 (17.1%)
50–59 1,291 (12.5%) 69 (14.5%) 701 (16.0%) 52 (17.7%)
60–69 2,860 (27.6%) 265 (55.7%) 1,667 (38.1%) 77 (26.3%)
70+ 986 (9.5%) 117 (24.6%) 659 (15.1%) 18 (6.1%)
------------------------------------------------------------------------------------------------

Thank you so much, this is very helpful, I totally understand the logic behind those codes. However, somehow this does not apply to my data set. Here is what my code looks like

clear
*generate science and stem variable
gen science=1 if inlist(chair, 1, 2, 3)
gen stem=1 if inlist(chair, 1, 2, 3, 5, 6, 7)

preserve
**in my original dataset, I was collecting publications of many professors, but here I am doing biographic information of those professors. Therefore I have to use collapse
collapse gender science stem, by(chair scholar_id)
collect clear
dtable i.gender, name(total)
collect addtag group[Total]
dtable i.gender if science==1, name(science)
collect addtag group[Science]
dtable i.gender if stem==1, name(stem)
collect addtag group[Stem]
collect combine comb = total science stem
collect style autolevels group Total science stem
collect style header result[_dtable_stats], title(hide) level(hide)
collect layout(var)(group#result)

restore

it's telling me "collect layout(var)(group#result) not recognized", I do not understand why.
When i use:

collect preview

,
it's telling me:

Your layout specification does not uniquely match any items. One or more of the following dimensions might help uniquely match items: collection, colname, gender, group, statcmd.

I don't understand why is this happening. Can you help me with that? I have no clue what this is happening.

Here I attached the dataset file I generated after the collapse command so it's easier to understand the dataset. Hope this is helpful.

Attached Files

dtable trail.dta (13.0 KB, 1 view)

Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 680
#4

28 Jun 2023, 08:45

Your line

Code:

collect style autolevels group Total science stem

should capitalize science and stem

Code:

collect style autolevels group Total Science Stem

Also the line

Code:

collect layout(var)(group#result)

is missing some spaces

Code:

collect layout (var) (group#result)
Comment
Yuduan Pu

Join Date: Dec 2022

Posts: 8
#5

28 Jun 2023, 09:03

Originally posted by Jeff Pitblado (StataCorp) View Post

Your line

Code:

collect style autolevels group Total science stem

should capitalize science and stem

Code:

collect style autolevels group Total Science Stem

Also the line

Code:

collect layout(var)(group#result)

is missing some spaces

Code:

collect layout (var) (group#result)

Many thanks, it's really helpful and it solved my problem!
Comment
Maciej Koniewski

Join Date: Jul 2018

Posts: 11
#6

31 Mar 2025, 02:49

Dear Jeff Pitblado (StataCorp) thank you for this neat example. However, what should I change in your code if in different collection different variable names are used, e.g. in panel study where I need to stack side be side results from different variables in the same rows? Perhaps, this code line need to be changed: collect layout (var) (group#result)

Below I post example of table from one of my current research. I need to put responses into the same rows.

Attached Files

Zrzut ekranu 2025-03-31 o 10.50.58.png (0, 0 views)

Last edited by Maciej Koniewski; 31 Mar 2025, 02:57.
Comment
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1309
#7

31 Mar 2025, 03:16

Maciej, your file is not visible. Please upload it again.
Comment
Maciej Koniewski

Join Date: Jul 2018

Posts: 11
#8

31 Mar 2025, 03:57

Hemanshu Kumar I am sorry, this should be visible both in the post and as an attachment. I am attaching again. Hope it works now.
Comment

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1309

31 Mar 2025, 04:42

Consider this:

Code:

clear all
webuse nhanes2
keep agegrp black
* construct an artificial situation where there are different variables for the different races
clonevar age_range = agegrp if black
replace agegrp = . if black

dtable i.age_range if black, name(black)
collect addtag group[Black]

dtable i.agegrp if !black, name(nonblack)
collect addtag group[NonBlack]
collect remap agegrp = age_range

collect combine comb = black nonblack
collect label levels group Black "Black" NonBlack "Non-Black"

collect style header result[_dtable_stats], level(hide)
collect style header age_range, title(label)
collect layout (var[_N] age_range) (group#result)

which produces:

Code:

. collect preview

-----------------------------------
             Black      Non-Black  
-----------------------------------
N               1,086         9,265
Age group                          
  20–29   286 (26.3%) 2,034 (22.0%)
  30–39   179 (16.5%) 1,443 (15.6%)
  40–49   124 (11.4%) 1,148 (12.4%)
  50–59   140 (12.9%) 1,151 (12.4%)
  60–69   260 (23.9%) 2,600 (28.1%)
  70+       97 (8.9%)    889 (9.6%)
-----------------------------------

Last edited by Hemanshu Kumar; 31 Mar 2025, 04:47.

Comment

Maciej Koniewski

Join Date: Jul 2018
Posts: 11

#10

31 Mar 2025, 12:01

Thank you Hemanshu Kumar. Unfortunately, I can not adapt your code. Attached please find my sample.dta data, Below two tables that I need to join. The problem occured with collect remap command.

Code:

dtable i.victNoPre firstGrVictPre JudeVictCntrPre i.assctAschwtzPre ///
i.nalkowskaPre i.whyRembAusPre, ///
    name(wave1) replace
    collect addtag group[wave1]

dtable i.victNoPst i.firstGrVictPst i.JudeVictCntrPst i.assctAschwtzPst ///
i.nalkowskaPst i.whyRembAusPst if zrodlo=="t1, t2" | zrodlo=="t1, t2, t3", ///
    name(wave2) replace
    collect addtag group[wave2]

Comment

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1309

#11

01 Apr 2025, 03:51

Here is the code that works for me:

Code:

dtable i.victNoPre i.firstGrVictPre i.JudeVictCntrPre i.assctAschwtzPre ///
i.nalkowskaPre i.whyRembAusPre, ///
    name(wave1) replace
    collect addtag group[wave1]

dtable i.victNoPst i.firstGrVictPst i.JudeVictCntrPst i.assctAschwtzPst ///
i.nalkowskaPst i.whyRembAusPst if zrodlo=="t1, t2" | zrodlo=="t1, t2, t3", ///
    name(wave2) replace
    collect addtag group[wave2]

collect remap victNoPst = victNoPre
collect remap firstGrVictPst = firstGrVictPre
collect remap JudeVictCntrPst = JudeVictCntrPre
collect remap assctAschwtzPst = assctAschwtzPre
collect remap nalkowskaPst = nalkowskaPre
collect remap whyRembAusPst = whyRembAusPre

collect combine comb = wave1 wave2
collect label levels group wave1 "Wave 1" wave2 "Wave 2"

collect style header result[_dtable_stats], level(hide)
collect style header victNoPre firstGrVictPre JudeVictCntrPre assctAschwtzPre nalkowskaPre whyRembAusPre, title(label)
collect layout (var[_N] victNoPre firstGrVictPre JudeVictCntrPre assctAschwtzPre nalkowskaPre whyRembAusPre) (group#result)

which produces:

Code:

. collect preview

-----------------------------------------------------------------------------------------------------
                                                                                 Wave 1      Wave 2  
-----------------------------------------------------------------------------------------------------
N                                                                                     633         505
How many people do you think perished in the Auschwitz camp complex?                                 
  1. < niż 1 mln                                                               83 (13.1%)    5 (1.0%)
  2. 1 – 1,5 mln                                                              137 (21.6%) 420 (83.2%)
  3. > niż 1,5 mln – 3 mln                                                      44 (7.0%)   20 (4.0%)
  4. > niż 3 mln                                                              151 (23.9%)   31 (6.1%)
  8. Don't know                                                               218 (34.4%)   29 (5.7%)
Who, what national or ethnic group constituted the largest number of victims?                        
  1 Jews                                                                      538 (85.0%) 480 (95.0%)
  2 Poles                                                                      76 (12.0%)   22 (4.4%)
  3 Others                                                                       2 (0.3%)    3 (0.6%)
  9 Don't know/No answer                                                        17 (2.7%)            
Jews from what countries?                                                                            
  1 all Europe, various, all, incl. Poland                                    146 (27.1%) 265 (55.2%)
  2 Poland and others                                                         185 (34.4%) 155 (32.3%)
  3 only Poland                                                               112 (20.8%)   24 (5.0%)
  4 Israel, USA, etc.                                                           10 (1.9%)    1 (0.2%)
  9 Don't know/No answer                                                       85 (15.8%)   35 (7.3%)
What above all do you associate the word 'Auschwitz' with? Is it for you...?                         
  above all the site of martyrdom of the Polish nation                         69 (10.9%)   26 (5.1%)
  above all the site of destruction of the Jews                               230 (36.3%) 276 (54.7%)
  other associations (universalist, Polish-and-Jewish)                        329 (52.0%) 182 (36.0%)
  difficult to say (option not read out)                                         5 (0.8%)   21 (4.2%)
In your opinion, which statement BEST expresses what happened in Auschwitz?                          
  PEOPLE dealt this fate to PEOPLE                                            534 (84.4%) 408 (80.8%)
  GERMANS dealt this fate to JEWS                                                7 (1.1%)    8 (1.6%)
  GERMANS dealt this fate to POLES                                               3 (0.5%)    1 (0.2%)
  GERMANS dealt this fate to PEOPLE                                            84 (13.3%)  87 (17.2%)
  difficult to say (option not read out)                                         5 (0.8%)    1 (0.2%)
Why ABOVE ALL should Auschwitz be remembered?                                                        
  To know WHO did it, WHAT they did and to WHOM                                91 (14.4%)   37 (7.3%)
  That such crimes would happen NEVER AGAIN                                   540 (85.3%) 405 (80.2%)
  difficult to say (option not read out)                                         2 (0.3%)  63 (12.5%)
-----------------------------------------------------------------------------------------------------

Comment

Maciej Koniewski

Join Date: Jul 2018

Posts: 11
#12

01 Apr 2025, 08:57

Thank you Hemanshu Kumar! The code works perfectly fine. One think I don't get however is why remap works in one direction, whilst not in the other one?

Code:

. collect remap victNoPre = victNoPst (dimension victNoPre not found) (0 items remapped in collection wave2) . collect remap victNoPst = victNoPre //works fine (10 items remapped in collection wave2)
Comment
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1309
#13

01 Apr 2025, 13:39

In the code in #11, at the point that the collect remap commands are executed, the active collection is wave2, so you can remap from the *Pst dimensions that exist in wave2, to *Pre. If you want to go the other way, you would first need to activate the other collection, as follows:

Code:

collect set wave1
1 like
Comment
Maciej Koniewski

Join Date: Jul 2018

Posts: 11
#14

01 Apr 2025, 23:50

Thank you Hemanshu Kumar! Your help is much appreciated.

Best regards,
mk
Comment

Announcement