Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to combine several dtable tables into a single one?

    Hello everyone, I am new to using Stata 18, so please bear with me if my question seems naive. Currently, I am attempting to utilize the new "dtable" function to generate descriptive statistics. However, I am encountering difficulties when it comes to combine multiple descriptive statistics tables into a single table. I have categorized my observations into three groups: Science, STEM, and Social Science & Arts. Each group has a corresponding dummy variable. The issue is that these categories overlap; an observation labeled as 'Science' may also fall under 'STEM' or 'Social Science.' Consequently, I cannot simply combine them into a single variable and employ the "by (field) " command.

    My code is very simple

    Code:
    dtable yrcurrinst yrphdaward acarank i.academicrank i.gender, by(science, nototals) 
    dtable yrcurrinst yrphdaward acarank i.academicrank i.gender, by(art_social_science, nototals) 
    dtable yrcurrinst yrphdaward acarank i.academicrank i.gender, by(stem, nototals) 
    dtable yrcurrinst yrphdaward acarank i.academicrank i.gender
    I want my first column to be total, the second one to be science, the third one to be STEM and the fourth one to be Social Science and Arts.

    I am aware that manually copying and pasting the data into a spreadsheet is an option, but I am hoping to find a more efficient approach using Stata code. If anyone has any insights on how to accomplish this, I would greatly appreciate your assistance.

  • #2
    You do not share any data for us to provide you an example, so I'll use nhanes2.dta.

    I'm guessing you do not want both categories of the variables science ...

    Code:
    webuse nhanes2
    
    * build each table separately, and store them in named collections that
    * we will combine later -- each collections needs to have their own
    * level of a dimension (group) that we will use to layout the collumns
    * in our table
    
    * total sample
    dtable weight bmi bpsystol i.race i.agegrp, name(total)
    collect addtag group[Total]
    
    * heart attack sample
    dtable weight bmi bpsystol i.race i.agegrp if heartatk==1, name(heartatk)
    collect addtag group[HeartATK]
    
    * high blood pressure sample
    dtable weight bmi bpsystol i.race i.agegrp if highbp==1, name(highbp)
    collect addtag group[HighBP]
    
    * high lead level sample
    dtable weight bmi bpsystol i.race i.agegrp if highlead==1, name(highlead)
    collect addtag group[HighLead]
    
    * combine above collections
    collect combine comb = total heartatk highbp highlead
    * specify the order of the group levels (columns)
    collect style autolevels group Total HeartATK HighBP HighLead
    * hide the result labels
    collect style header result[_dtable_stats], title(hide) level(hide)
    
    * table layout
    collect layout (var) (group#result)
    
    * nicer column labels
    collect label levels group ///
        HeartATK "Prior heart attack" ///
        HighBP "High blood pressure" ///
        HighLead "High lead level"
    collect preview
    Here are the first table
    Code:
    -------------------------------------------------------------------------------------------
                                  Total          HeartATK          HighBP          HighLead    
    -------------------------------------------------------------------------------------------
    N                                 10,351              476            4,376              293
    Weight (kg)              71.898 (15.356)  74.794 (15.242)  76.856 (16.241)  74.571 (15.201)
    Body mass index (BMI)     25.538 (4.915)   26.399 (4.870)   27.361 (5.332)   25.172 (4.611)
    Systolic blood pressure 130.882 (23.333) 140.813 (23.093) 150.539 (20.695) 133.601 (22.173)
    Race                                                                                       
      White                    9,065 (87.6%)      424 (89.1%)    3,748 (85.6%)      227 (77.5%)
      Black                    1,086 (10.5%)        47 (9.9%)      541 (12.4%)       60 (20.5%)
      Other                       200 (1.9%)         5 (1.1%)        87 (2.0%)         6 (2.0%)
    Age group                                                                                  
      20–29                    2,320 (22.4%)         1 (0.2%)       392 (9.0%)       54 (18.4%)
      30–39                    1,622 (15.7%)         4 (0.8%)      455 (10.4%)       42 (14.3%)
      40–49                    1,272 (12.3%)        20 (4.2%)      502 (11.5%)       50 (17.1%)
      50–59                    1,291 (12.5%)       69 (14.5%)      701 (16.0%)       52 (17.7%)
      60–69                    2,860 (27.6%)      265 (55.7%)    1,667 (38.1%)       77 (26.3%)
      70+                         986 (9.5%)      117 (24.6%)      659 (15.1%)        18 (6.1%)
    -------------------------------------------------------------------------------------------
    Here is the table with nice column labels
    Code:
    ------------------------------------------------------------------------------------------------
                                  Total      Prior heart attack High blood pressure  High lead level
    ------------------------------------------------------------------------------------------------
    N                                 10,351                476               4,376              293
    Weight (kg)              71.898 (15.356)    74.794 (15.242)     76.856 (16.241)  74.571 (15.201)
    Body mass index (BMI)     25.538 (4.915)     26.399 (4.870)      27.361 (5.332)   25.172 (4.611)
    Systolic blood pressure 130.882 (23.333)   140.813 (23.093)    150.539 (20.695) 133.601 (22.173)
    Race                                                                                            
      White                    9,065 (87.6%)        424 (89.1%)       3,748 (85.6%)      227 (77.5%)
      Black                    1,086 (10.5%)          47 (9.9%)         541 (12.4%)       60 (20.5%)
      Other                       200 (1.9%)           5 (1.1%)           87 (2.0%)         6 (2.0%)
    Age group                                                                                       
      20–29                    2,320 (22.4%)           1 (0.2%)          392 (9.0%)       54 (18.4%)
      30–39                    1,622 (15.7%)           4 (0.8%)         455 (10.4%)       42 (14.3%)
      40–49                    1,272 (12.3%)          20 (4.2%)         502 (11.5%)       50 (17.1%)
      50–59                    1,291 (12.5%)         69 (14.5%)         701 (16.0%)       52 (17.7%)
      60–69                    2,860 (27.6%)        265 (55.7%)       1,667 (38.1%)       77 (26.3%)
      70+                         986 (9.5%)        117 (24.6%)         659 (15.1%)        18 (6.1%)
    ------------------------------------------------------------------------------------------------

    Comment


    • #3
      Originally posted by Jeff Pitblado (StataCorp) View Post
      You do not share any data for us to provide you an example, so I'll use nhanes2.dta.

      I'm guessing you do not want both categories of the variables science ...

      Code:
      webuse nhanes2
      
      * build each table separately, and store them in named collections that
      * we will combine later -- each collections needs to have their own
      * level of a dimension (group) that we will use to layout the collumns
      * in our table
      
      * total sample
      dtable weight bmi bpsystol i.race i.agegrp, name(total)
      collect addtag group[Total]
      
      * heart attack sample
      dtable weight bmi bpsystol i.race i.agegrp if heartatk==1, name(heartatk)
      collect addtag group[HeartATK]
      
      * high blood pressure sample
      dtable weight bmi bpsystol i.race i.agegrp if highbp==1, name(highbp)
      collect addtag group[HighBP]
      
      * high lead level sample
      dtable weight bmi bpsystol i.race i.agegrp if highlead==1, name(highlead)
      collect addtag group[HighLead]
      
      * combine above collections
      collect combine comb = total heartatk highbp highlead
      * specify the order of the group levels (columns)
      collect style autolevels group Total HeartATK HighBP HighLead
      * hide the result labels
      collect style header result[_dtable_stats], title(hide) level(hide)
      
      * table layout
      collect layout (var) (group#result)
      
      * nicer column labels
      collect label levels group ///
      HeartATK "Prior heart attack" ///
      HighBP "High blood pressure" ///
      HighLead "High lead level"
      collect preview
      Here are the first table
      Code:
      -------------------------------------------------------------------------------------------
      Total HeartATK HighBP HighLead
      -------------------------------------------------------------------------------------------
      N 10,351 476 4,376 293
      Weight (kg) 71.898 (15.356) 74.794 (15.242) 76.856 (16.241) 74.571 (15.201)
      Body mass index (BMI) 25.538 (4.915) 26.399 (4.870) 27.361 (5.332) 25.172 (4.611)
      Systolic blood pressure 130.882 (23.333) 140.813 (23.093) 150.539 (20.695) 133.601 (22.173)
      Race
      White 9,065 (87.6%) 424 (89.1%) 3,748 (85.6%) 227 (77.5%)
      Black 1,086 (10.5%) 47 (9.9%) 541 (12.4%) 60 (20.5%)
      Other 200 (1.9%) 5 (1.1%) 87 (2.0%) 6 (2.0%)
      Age group
      20–29 2,320 (22.4%) 1 (0.2%) 392 (9.0%) 54 (18.4%)
      30–39 1,622 (15.7%) 4 (0.8%) 455 (10.4%) 42 (14.3%)
      40–49 1,272 (12.3%) 20 (4.2%) 502 (11.5%) 50 (17.1%)
      50–59 1,291 (12.5%) 69 (14.5%) 701 (16.0%) 52 (17.7%)
      60–69 2,860 (27.6%) 265 (55.7%) 1,667 (38.1%) 77 (26.3%)
      70+ 986 (9.5%) 117 (24.6%) 659 (15.1%) 18 (6.1%)
      -------------------------------------------------------------------------------------------
      Here is the table with nice column labels
      Code:
      ------------------------------------------------------------------------------------------------
      Total Prior heart attack High blood pressure High lead level
      ------------------------------------------------------------------------------------------------
      N 10,351 476 4,376 293
      Weight (kg) 71.898 (15.356) 74.794 (15.242) 76.856 (16.241) 74.571 (15.201)
      Body mass index (BMI) 25.538 (4.915) 26.399 (4.870) 27.361 (5.332) 25.172 (4.611)
      Systolic blood pressure 130.882 (23.333) 140.813 (23.093) 150.539 (20.695) 133.601 (22.173)
      Race
      White 9,065 (87.6%) 424 (89.1%) 3,748 (85.6%) 227 (77.5%)
      Black 1,086 (10.5%) 47 (9.9%) 541 (12.4%) 60 (20.5%)
      Other 200 (1.9%) 5 (1.1%) 87 (2.0%) 6 (2.0%)
      Age group
      20–29 2,320 (22.4%) 1 (0.2%) 392 (9.0%) 54 (18.4%)
      30–39 1,622 (15.7%) 4 (0.8%) 455 (10.4%) 42 (14.3%)
      40–49 1,272 (12.3%) 20 (4.2%) 502 (11.5%) 50 (17.1%)
      50–59 1,291 (12.5%) 69 (14.5%) 701 (16.0%) 52 (17.7%)
      60–69 2,860 (27.6%) 265 (55.7%) 1,667 (38.1%) 77 (26.3%)
      70+ 986 (9.5%) 117 (24.6%) 659 (15.1%) 18 (6.1%)
      ------------------------------------------------------------------------------------------------
      Thank you so much, this is very helpful, I totally understand the logic behind those codes. However, somehow this does not apply to my data set. Here is what my code looks like
      clear
      *generate science and stem variable
      gen science=1 if inlist(chair, 1, 2, 3)
      gen stem=1 if inlist(chair, 1, 2, 3, 5, 6, 7)

      preserve
      **in my original dataset, I was collecting publications of many professors, but here I am doing biographic information of those professors. Therefore I have to use collapse
      collapse gender science stem, by(chair scholar_id)
      collect clear
      dtable i.gender, name(total)
      collect addtag group[Total]
      dtable i.gender if science==1, name(science)
      collect addtag group[Science]
      dtable i.gender if stem==1, name(stem)
      collect addtag group[Stem]
      collect combine comb = total science stem
      collect style autolevels group Total science stem
      collect style header result[_dtable_stats], title(hide) level(hide)
      collect layout(var)(group#result)

      restore
      it's telling me "collect layout(var)(group#result) not recognized", I do not understand why.
      When i use:
      collect preview
      ,
      it's telling me:
      Your layout specification does not uniquely match any items. One or more of the following dimensions might help uniquely match items: collection, colname, gender, group, statcmd.
      I don't understand why is this happening. Can you help me with that? I have no clue what this is happening.

      Here I attached the dataset file I generated after the collapse command so it's easier to understand the dataset. Hope this is helpful.


      Attached Files

      Comment


      • #4
        Your line
        Code:
        collect style autolevels group Total science stem
        should capitalize science and stem
        Code:
        collect style autolevels group Total Science Stem
        Also the line
        Code:
        collect layout(var)(group#result)
        is missing some spaces
        Code:
        collect layout (var) (group#result)

        Comment


        • #5
          Originally posted by Jeff Pitblado (StataCorp) View Post
          Your line
          Code:
          collect style autolevels group Total science stem
          should capitalize science and stem
          Code:
          collect style autolevels group Total Science Stem
          Also the line
          Code:
          collect layout(var)(group#result)
          is missing some spaces
          Code:
          collect layout (var) (group#result)
          Many thanks, it's really helpful and it solved my problem!

          Comment


          • #6
            Dear Jeff Pitblado (StataCorp) thank you for this neat example. However, what should I change in your code if in different collection different variable names are used, e.g. in panel study where I need to stack side be side results from different variables in the same rows? Perhaps, this code line need to be changed: collect layout (var) (group#result)

            Below I post example of table from one of my current research. I need to put responses into the same rows.

            Attached Files
            Last edited by Maciej Koniewski; 31 Mar 2025, 02:57.

            Comment


            • #7
              Maciej, your file is not visible. Please upload it again.

              Comment


              • #8
                Hemanshu Kumar I am sorry, this should be visible both in the post and as an attachment. I am attaching again. Hope it works now.

                Click image for larger version

Name:	Zrzut ekranu 2025-03-31 o 10.50.58.png
Views:	1
Size:	365.2 KB
ID:	1775096

                Comment


                • #9
                  Consider this:

                  Code:
                  clear all
                  webuse nhanes2
                  keep agegrp black
                  * construct an artificial situation where there are different variables for the different races
                  clonevar age_range = agegrp if black
                  replace agegrp = . if black
                  
                  dtable i.age_range if black, name(black)
                  collect addtag group[Black]
                  
                  dtable i.agegrp if !black, name(nonblack)
                  collect addtag group[NonBlack]
                  collect remap agegrp = age_range
                  
                  collect combine comb = black nonblack
                  collect label levels group Black "Black" NonBlack "Non-Black"
                  
                  collect style header result[_dtable_stats], level(hide)
                  collect style header age_range, title(label)
                  collect layout (var[_N] age_range) (group#result)
                  which produces:

                  Code:
                  . collect preview
                  
                  -----------------------------------
                               Black      Non-Black  
                  -----------------------------------
                  N               1,086         9,265
                  Age group                          
                    20–29   286 (26.3%) 2,034 (22.0%)
                    30–39   179 (16.5%) 1,443 (15.6%)
                    40–49   124 (11.4%) 1,148 (12.4%)
                    50–59   140 (12.9%) 1,151 (12.4%)
                    60–69   260 (23.9%) 2,600 (28.1%)
                    70+       97 (8.9%)    889 (9.6%)
                  -----------------------------------
                  Last edited by Hemanshu Kumar; 31 Mar 2025, 04:47.

                  Comment


                  • #10
                    Thank you Hemanshu Kumar. Unfortunately, I can not adapt your code. Attached please find my sample.dta data, Below two tables that I need to join. The problem occured with collect remap command.

                    Code:
                    dtable i.victNoPre firstGrVictPre JudeVictCntrPre i.assctAschwtzPre ///
                    i.nalkowskaPre i.whyRembAusPre, ///
                        name(wave1) replace
                        collect addtag group[wave1]
                    
                    dtable i.victNoPst i.firstGrVictPst i.JudeVictCntrPst i.assctAschwtzPst ///
                    i.nalkowskaPst i.whyRembAusPst if zrodlo=="t1, t2" | zrodlo=="t1, t2, t3", ///
                        name(wave2) replace
                        collect addtag group[wave2]

                    Comment


                    • #11
                      Here is the code that works for me:

                      Code:
                      dtable i.victNoPre i.firstGrVictPre i.JudeVictCntrPre i.assctAschwtzPre ///
                      i.nalkowskaPre i.whyRembAusPre, ///
                          name(wave1) replace
                          collect addtag group[wave1]
                      
                      dtable i.victNoPst i.firstGrVictPst i.JudeVictCntrPst i.assctAschwtzPst ///
                      i.nalkowskaPst i.whyRembAusPst if zrodlo=="t1, t2" | zrodlo=="t1, t2, t3", ///
                          name(wave2) replace
                          collect addtag group[wave2]
                      
                      collect remap victNoPst = victNoPre
                      collect remap firstGrVictPst = firstGrVictPre
                      collect remap JudeVictCntrPst = JudeVictCntrPre
                      collect remap assctAschwtzPst = assctAschwtzPre
                      collect remap nalkowskaPst = nalkowskaPre
                      collect remap whyRembAusPst = whyRembAusPre
                      
                      collect combine comb = wave1 wave2
                      collect label levels group wave1 "Wave 1" wave2 "Wave 2"
                      
                      collect style header result[_dtable_stats], level(hide)
                      collect style header victNoPre firstGrVictPre JudeVictCntrPre assctAschwtzPre nalkowskaPre whyRembAusPre, title(label)
                      collect layout (var[_N] victNoPre firstGrVictPre JudeVictCntrPre assctAschwtzPre nalkowskaPre whyRembAusPre) (group#result)
                      which produces:

                      Code:
                      . collect preview
                      
                      -----------------------------------------------------------------------------------------------------
                                                                                                       Wave 1      Wave 2  
                      -----------------------------------------------------------------------------------------------------
                      N                                                                                     633         505
                      How many people do you think perished in the Auschwitz camp complex?                                 
                        1. < niż 1 mln                                                               83 (13.1%)    5 (1.0%)
                        2. 1 – 1,5 mln                                                              137 (21.6%) 420 (83.2%)
                        3. > niż 1,5 mln – 3 mln                                                      44 (7.0%)   20 (4.0%)
                        4. > niż 3 mln                                                              151 (23.9%)   31 (6.1%)
                        8. Don't know                                                               218 (34.4%)   29 (5.7%)
                      Who, what national or ethnic group constituted the largest number of victims?                        
                        1 Jews                                                                      538 (85.0%) 480 (95.0%)
                        2 Poles                                                                      76 (12.0%)   22 (4.4%)
                        3 Others                                                                       2 (0.3%)    3 (0.6%)
                        9 Don't know/No answer                                                        17 (2.7%)            
                      Jews from what countries?                                                                            
                        1 all Europe, various, all, incl. Poland                                    146 (27.1%) 265 (55.2%)
                        2 Poland and others                                                         185 (34.4%) 155 (32.3%)
                        3 only Poland                                                               112 (20.8%)   24 (5.0%)
                        4 Israel, USA, etc.                                                           10 (1.9%)    1 (0.2%)
                        9 Don't know/No answer                                                       85 (15.8%)   35 (7.3%)
                      What above all do you associate the word 'Auschwitz' with? Is it for you...?                         
                        above all the site of martyrdom of the Polish nation                         69 (10.9%)   26 (5.1%)
                        above all the site of destruction of the Jews                               230 (36.3%) 276 (54.7%)
                        other associations (universalist, Polish-and-Jewish)                        329 (52.0%) 182 (36.0%)
                        difficult to say (option not read out)                                         5 (0.8%)   21 (4.2%)
                      In your opinion, which statement BEST expresses what happened in Auschwitz?                          
                        PEOPLE dealt this fate to PEOPLE                                            534 (84.4%) 408 (80.8%)
                        GERMANS dealt this fate to JEWS                                                7 (1.1%)    8 (1.6%)
                        GERMANS dealt this fate to POLES                                               3 (0.5%)    1 (0.2%)
                        GERMANS dealt this fate to PEOPLE                                            84 (13.3%)  87 (17.2%)
                        difficult to say (option not read out)                                         5 (0.8%)    1 (0.2%)
                      Why ABOVE ALL should Auschwitz be remembered?                                                        
                        To know WHO did it, WHAT they did and to WHOM                                91 (14.4%)   37 (7.3%)
                        That such crimes would happen NEVER AGAIN                                   540 (85.3%) 405 (80.2%)
                        difficult to say (option not read out)                                         2 (0.3%)  63 (12.5%)
                      -----------------------------------------------------------------------------------------------------

                      Comment


                      • #12
                        Thank you Hemanshu Kumar! The code works perfectly fine. One think I don't get however is why remap works in one direction, whilst not in the other one?

                        Code:
                        . collect remap victNoPre = victNoPst 
                        (dimension victNoPre not found)
                        (0 items remapped in collection wave2)
                        
                        . collect remap victNoPst = victNoPre //works fine
                        (10 items remapped in collection wave2)

                        Comment


                        • #13
                          In the code in #11, at the point that the collect remap commands are executed, the active collection is wave2, so you can remap from the *Pst dimensions that exist in wave2, to *Pre. If you want to go the other way, you would first need to activate the other collection, as follows:

                          Code:
                          collect set wave1

                          Comment


                          • #14
                            Thank you Hemanshu Kumar! Your help is much appreciated.

                            Best regards,
                            mk

                            Comment

                            Working...
                            X