Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using -table- and -collect- to collate tables horizontally in Stata 18

    I want to merge some tables horizontally. I'm using Stata 18 and it looks like -table- and -collect- are the preferred way of doing this, but I can't understand the documentation on them deeply enough to know how to do so.

    Here's an example that's analogous to what I want to achieve. I can make the three tables that would go into this, and following hints on this forum I've worked out how to spread the numbers and percentages wider:

    Code:
    webuse nhanes2, clear
    
    generate diab_or_htat = max(diabetes, heartatk)
    * Table of frequencies and percentages of people with diabetes, by health status
    table hlthstat diabetes, statistic(frequency) statistic(percent, across(diabetes))
    * Wide version
    collect layout (var hlthstat) (diabetes#result)
    
    * Table of frequencies and percentages of people without diabetes who have had heart attacks, by health status
    table hlthstat heartatk if diabetes == 0, statistic(frequency) statistic(percent, across(hlthstat))
    * Wide version
    collect layout (var hlthstat) (heartatk#result)
    
    * Table for frequencies and percentages of having diabetes or heart attach, by health status
    table hlthstat diab_or_htat , statistic(frequency) statistic(percent, across(diab_or_htat))
    * Wide version
    collect layout (var hlthstat) (diab_or_htat#result)
    I would like to create a version that merges columns from these horizontally. Ideally, I would just get the subset of columns that I'm interested in, but I would be fairly happy with just merging them all and then deleting the unnecessary columns after I've exported it.

    I would ideally like the columns for:

    * Health status
    * Overall N (5th column of numbers from the first wide table)
    * Diabetics n (3rd column from first wide table)
    * Diabetics % (4th column from first wide table)
    * Heart attack frequency among non-diabetics (3rd column, second wide table)
    * Heart attack % among non-diabetics (4th column, second wide table)
    * Either diabetes or heart attack frequency (3rd column, third wide table)
    * Either diabetes or heart attack % (4th column, third wide table)

    Thanks in advance.

  • #2
    Thanks for the working example.

    First you need to add a unique name() option to each table call to prevent it from replacing the previous collection. Then use collect combine to combine the collections. Then you need to update the layout in the combined collection.

    BTW, you do not need the var dimension in your custom layouts.

    Here is my modified copy of the above example.
    Code:
    webuse nhanes2, clear
    
    generate diab_or_htat = max(diabetes, heartatk)
    * Table of frequencies and percentages of people with diabetes, by health status
    table hlthstat diabetes, ///
        name(t1) ///
        statistic(frequency) ///
        statistic(percent, across(diabetes))
    * Wide version
    collect layout (hlthstat) (diabetes#result)
    
    * Table of frequencies and percentages of people without diabetes who have had heart attacks, by health status
    table hlthstat heartatk if diabetes == 0, ///
        name(t2) ///
        statistic(frequency) ///
        statistic(percent, across(hlthstat))
    * Wide version
    collect layout (hlthstat) (heartatk#result)
    
    * Table for frequencies and percentages of having diabetes or heart attach, by health status
    table hlthstat diab_or_htat , ///
        name(t3) ///
        statistic(frequency) ///
        statistic(percent, across(diab_or_htat))
    * Wide version
    collect layout (hlthstat) (diab_or_htat#result)
    
    * define new collection 'all' created by combining the above
    collect combine all = t1 t2 t3
    
    * add something to make the percent values stand out
    collect style cell result[percent], sformat("%s%%")
    * hide the statistic names
    collect style header result, level(hide)
    
    * remove the extra space between columns, the table is getting too wide
    collect style column, extraspace(0)
    
    * arrange the combined collections into a table
    collect layout (hlthstat) ((diabetes heartatk diab_or_htat)#result)
    Here is the resulting table.
    Code:
    ------------------------------------------------------------------------------------------------------------------------------------------------------
                           |             Diabetes status                           Prior heart attack                            diab_or_htat             
                           |  Not diabetic  Diabetic       Total     No heart attack Had heart attack     Total           0           1          Total    
    -----------------------+------------------------------------------------------------------------------------------------------------------------------
    Health status          |                                                                                                                              
      Excellent            | 2,383  99.00%  24  1.00%  2,407 100.00%  2,366   25.07%     17     4.13% 2,383  24.19% 2,366 98.30%  41  1.70%  2,407 100.00%
      Very good            | 2,546  98.26%  45  1.74%  2,591 100.00%  2,513   26.63%     33     8.01% 2,546  25.85% 2,513 96.99%  78  3.01%  2,591 100.00%
      Good                 | 2,805  95.47% 133  4.53%  2,938 100.00%  2,706   28.67%     99    24.03% 2,805  28.48% 2,706 92.10% 232  7.90%  2,938 100.00%
      Fair                 | 1,508  90.30% 162  9.70%  1,670 100.00%  1,367   14.48%    141    34.22% 1,508  15.31% 1,367 81.86% 303 18.14%  1,670 100.00%
      Poor                 |   594  81.48% 135 18.52%    729 100.00%    475    5.03%    119    28.88%   594   6.03%   475 65.16% 254 34.84%    729 100.00%
      Blank but applicable |    14 100.00%                14 100.00%     11    0.12%      3     0.73%    14   0.14%    11 78.57%   3 21.43%     14 100.00%
      Total                | 9,850  95.18% 499  4.82% 10,349 100.00%  9,438  100.00%    412   100.00% 9,850 100.00% 9,438 91.20% 911  8.80% 10,349 100.00%
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    If you only want the totals across health status for each of the other table variables, you need to specify them in the totals() option of each table call.
    Code:
    webuse nhanes2, clear
    
    generate diab_or_htat = max(diabetes, heartatk)
    * Table of frequencies and percentages of people with diabetes, by health status
    table hlthstat diabetes, ///
        totals(diabetes) ///
        name(t1) ///
        statistic(frequency) ///
        statistic(percent, across(diabetes))
    * Wide version
    collect layout (hlthstat) (diabetes#result)
    
    * Table of frequencies and percentages of people without diabetes who have had heart attacks, by health status
    table hlthstat heartatk if diabetes == 0, ///
        totals(heartatk) ///
        name(t2) ///
        statistic(frequency) ///
        statistic(percent, across(hlthstat))
    * Wide version
    collect layout (hlthstat) (heartatk#result)
    
    * Table for frequencies and percentages of having diabetes or heart attach, by health status
    table hlthstat diab_or_htat , ///
        totals(diab_or_htat) ///
        name(t3) ///
        statistic(frequency) ///
        statistic(percent, across(diab_or_htat))
    * Wide version
    collect layout (hlthstat) (diab_or_htat#result)
    
    * define new collection 'all' created by combining
    collect combine all = t1 t2 t3
    
    * add something to make the percent values stand out
    collect style cell result[percent], sformat("%s%%")
    * hide the statistic names
    collect style header result, level(hide)
    
    * remove the extra space between columns, the table is getting too wide
    collect style column, extraspace(0)
    
    * arrange the combined collections into a table
    collect layout (hlthstat) ((diabetes heartatk diab_or_htat)#result)
    Here is the resulting table.
    Code:
    ----------------------------------------------------------------------------------------------------------
                           |      Diabetes status            Prior heart attack              diab_or_htat     
                           |  Not diabetic  Diabetic  No heart attack Had heart attack       0           1    
    -----------------------+----------------------------------------------------------------------------------
    Health status          |                                                                                  
      Excellent            | 2,383  99.00%  24  1.00%  2,366   25.07%     17     4.13% 2,366 98.30%  41  1.70%
      Very good            | 2,546  98.26%  45  1.74%  2,513   26.63%     33     8.01% 2,513 96.99%  78  3.01%
      Good                 | 2,805  95.47% 133  4.53%  2,706   28.67%     99    24.03% 2,706 92.10% 232  7.90%
      Fair                 | 1,508  90.30% 162  9.70%  1,367   14.48%    141    34.22% 1,367 81.86% 303 18.14%
      Poor                 |   594  81.48% 135 18.52%    475    5.03%    119    28.88%   475 65.16% 254 34.84%
      Blank but applicable |    14 100.00%                11    0.12%      3     0.73%    11 78.57%   3 21.43%
      Total                | 9,850  95.18% 499  4.82%  9,438  100.00%    412   100.00% 9,438 91.20% 911  8.80%
    ----------------------------------------------------------------------------------------------------------

    Comment


    • #3
      Many thanks for that, that got me started a treat.

      For future people finding this thread, I was able to work from this to also select the columns that I wanted. The trick was to open the tables builder after running the code. The tables builder had the most recent table as its starting point, and from there I started by using the point-and-click interface to deselect and shuffle around some columns, and from that I could see the code the tables builder was creating, and hand-tweaked that until I was happy.

      Here's the version I ended up with:

      Code:
      clear all
      webuse nhanes2, clear
      
      generate diab_or_htat = max(diabetes, heartatk)
      
      table hlthstat, name(t0)
      
      * Table of frequencies and percentages of people with diabetes, by health status
      table hlthstat diabetes, ///
          totals(hlthstat diabetes) ///
          name(t1) ///
          statistic(frequency) ///
          statistic(percent, across(diabetes))
      
      * Table of frequencies and percentages of people without diabetes who have had heart attacks, by health status
      table hlthstat heartatk if diabetes == 0, ///
          totals(heartatk) ///
          name(t2) ///
          statistic(frequency) ///
          statistic(percent, across(heartatk))
      
      * Table for frequencies and percentages of having diabetes or heart attack, by health status
      table hlthstat diab_or_htat , ///
          totals(diab_or_htat) ///
          name(t3) ///
          statistic(frequency) ///
          statistic(percent, across(diab_or_htat))
      
      * define new collection 'all' created by combining
      collect combine all = t0 t1 t2 t3
      
      * add something to make the percent values stand out
      collect style cell result[percent], sformat("%s%%")
      
      collect label dim heartatk "Prior heart attack given non-diabetic", modify
      collect label dim diab_or_htat "Diabetes or heart attack", modify
      
      collect layout (hlthstat) ///
                     (result[frequency] ///
                       diabetes[.m]#result[frequency] ///
                      diabetes[1]#result ///
                      heartatk[1]#result ///
                      diab_or_htat[1]#result), name(all)
      And the output:

      Code:
      
      --------------------------------------------------------------------------------------------------------------------------------------------
                             |  Frequency           Diabetes status            Prior heart attack given non-diabetic     Diabetes or heart attack 
                             |                  Total         Diabetic                    Had heart attack                           1            
                             |              Frequency   Frequency   Percent             Frequency             Percent       Frequency      Percent
      -----------------------+--------------------------------------------------------------------------------------------------------------------
      Health status          |                                                                                                                    
        Excellent            |      2,407       2,407          24     1.00%                    17               0.71%              41        1.70%
        Very good            |      2,591       2,591          45     1.74%                    33               1.30%              78        3.01%
        Good                 |      2,938       2,938         133     4.53%                    99               3.53%             232        7.90%
        Fair                 |      1,670       1,670         162     9.70%                   141               9.35%             303       18.14%
        Poor                 |        729         729         135    18.52%                   119              20.03%             254       34.84%
        Blank but applicable |         14          14                                           3              21.43%               3       21.43%
        Total                |     10,349                     499     4.82%                   412               4.18%             911        8.80%
      --------------------------------------------------------------------------------------------------------------------------------------------
      I had to make an extra table of just the frequencies of hlthstat to get that first column of numbers working, and I'm not really clear on how -collect- knows to pick up the [frequency] column from t0 rather than any of the other columns of frequencies, but it works.

      Thanks again.

      Comment

      Working...
      X