Creating descriptive statistics table

Abhinaya Kanakasabai

Join Date: Dec 2022

Posts: 50
#1

Creating descriptive statistics table

07 Aug 2023, 18:32

Hi!
I want to create a table for 3 groups (2 dummy variables - highly fractionalised societies and less fractionalised societies and one reference category - neither of the 2) showing the descriptive statistics for about 30 variables, say A1,A2,...A30
I want the table to look something like:

I thought of running:

sum A1 A2 A3 ... A30 if high_frac == 1
return list
est sto tab1_col1

sum A1 A2 A3 ... A30 if low_frac == 1
return list
est sto tab1_col2

sum A1 A2 A3 ... A30 if high_frac == 0 & low_frac == 0
return list
est sto tab1_col3

esttab tab1_col* using Table1.rtf, replace label N(0) b(3) se(3)

But the problem is that return list returns the mean and standard deviation values only for the last 5 variables and not the first 25. In addition:
1. I want to rename rows A1,A2. etc. to some other name
2. I want to include the titles - Highly fractionalised, Low fractionalised and Reference societies in place of columns 1,2 & 3 respectively and include 3 sub-columns - Obs, Mean and Standard deviation under each of the columns.

Could you please help?

Thanks in advance for your time!
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

07 Aug 2023, 18:47

If you are running version 18, read -help dtable-. The -dtable- command does precisely what you want. If you are running any earlier version, you are supposed to state in your post what version you are running.
Comment
Abhinaya Kanakasabai

Join Date: Dec 2022

Posts: 50
#3

07 Aug 2023, 19:03

Thank you for your reply, Clyde! Unfortunately, dtaversion says format 114 from Stata 10 or 11. How do I proceed?

Last edited by Abhinaya Kanakasabai; 07 Aug 2023, 19:29.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#4

07 Aug 2023, 21:06

No, not dtaversion. That's the version of the data set you are working with. What version of Stata are you using? Just run

Code:

version

to get that.
Comment
Abhinaya Kanakasabai

Join Date: Dec 2022

Posts: 50
#5

08 Aug 2023, 06:28

oh, it is 17.0
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#6

08 Aug 2023, 09:09

So you can use the -table- command for this. To simplify the code, I will assume that variables A1 through A30 are located consecutively in your data set. If that isn't true, use the -order- command to make it so first. I will assume that your three-category variable that heads the columns is called fractionalization

Code:

table (var) (fractionalization), statistic(count A1-A30) /// statistic(mean A1-A30) statistic(sd A1-A30) /// sformat("(%s)" sd) style(Table-1) nototals

You may want to impose display formats on the means and standard deviations to regularize the number of decimal places shown. -table-'s -nformat()- options allow you to do that. For example, if you want the means to be given to 2 decimal places and the standard deviations to 1 decimal place, you would add -nformat(%3.2f mean) nformat(%2.1f sd)- to the command.
Comment
Abhinaya Kanakasabai

Join Date: Dec 2022

Posts: 50
#7

08 Aug 2023, 11:43

Thank you so much, Clyde! Your codes worked. I got the following output:

I, however, want to:
1. Clearly mention under each of the columns that the first column is the "Reference category", the second column is "Less Fractionalised" and the third is "Highly Fractionalised" and within each of the rows, the first entry is "observations", the second is the "mean" and the third is the "standard deviation".
2. Include a third column which says Less Fractionalised - Highly Fractionalised giving the difference in mean between the two and the corresponding pvalue, just to ensure that the differences are statistically significant
3. I want to export it to word, so that I can include it in my writing

Could you please clarify how to perform the above?

Thanks in advance for your time!

Last edited by Abhinaya Kanakasabai; 08 Aug 2023, 11:57.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#8

08 Aug 2023, 12:02

1. Clearly mention under each of the columns that the first column is the "Reference category", the second column is "Less Fractionalised" and the third is "Highly Fractionalised" and within each of them, the first entry is "observations", the second is the "mean" and the third is the "standard deviation".

You have to apply a value label to your fractionalization variable. Once you do that, the -table- command will use it, rather than the numeric value, in the column header. In order to have the various rows within the cells labeled observations, mean, and standard deviation, remove the -style(Table-1)- option from the command.

2. Include a third column which says Less Fractionalised - Highly Fractionalised giving the difference in mean between the two and the corresponding pvalue, just to ensure that the differences are statistically significant

There is a way to do this with the -collect- command, but I don't know how. The -table-/-collelct- series of commands is very large, powerful, and complicated. I have learned what I consider the basics of it, but for my workflow, learning the bells and whistles is low priority (especially those that are done more easily in the new -dtable- and -etable- commands in version 18) and I haven't mastered it yet. If you were running Stata version 18, the -dtable- command has a very simple way of doing everything you are asking for. But in Stata 17, there is no -dtable-. I'm sorry I can't help with this one.

Jeff Pitblado (StataCorp) often follows threads about the use of -table- and its related commands, and perhaps he will jump in.

3. I want to export it to word, so that I can include it in my writing

-help collect export-
Comment
Abhinaya Kanakasabai

Join Date: Dec 2022

Posts: 50
#9

08 Aug 2023, 12:21

I did label them, still I'm getting those 0s and 1s which I want to get rid of. Removing style(Table-1) worked, thank you!
Sure, I will wait for Jeff's reply. Thank you so much for your help, Clyde!
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 29956

#10

08 Aug 2023, 12:41

I did label them, still I'm getting those 0s and 1s which I want to get rid of.

I don't understand why that would be. It isn't enough to define a label; you also need to apply it to the variable. Perhaps you skipped that step?

I fired up my version 17 Stata, which is still on my computer although I no longer generally use it, and tried it with the auto.dta:

Code:

. clear*

.
. sysuse auto
(1978 automobile data)

.
. des foreign

Variable      Storage   Display    Value
    name         type    format    label      Variable label
----------------------------------------------------------------------------------------------------------------------------------------------
foreign         byte    %8.0g      origin     Car origin

. label list origin
origin:
           0 Domestic
           1 Foreign

. tab foreign

 Car origin |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

.
. table (var) (foreign), statistic(count mpg headroom trunk weight) ///
>     statistic(mean mpg headroom trunk weight) ///
>     statistic(sd mpg headroom trunk weight) ///
>     sformat("(%s)" sd) nformat(%3.2f mean) nformat(%2.1f sd) nototals

---------------------------------------------------
                              |      Car origin    
                              |  Domestic   Foreign
------------------------------+--------------------
Mileage (mpg)                 |                    
  Number of nonmissing values |        52        22
  Mean                        |     19.83     24.77
  Standard deviation          |     (4.7)     (6.6)
Headroom (in.)                |                    
  Number of nonmissing values |        52        22
  Mean                        |      3.15      2.61
  Standard deviation          |     (0.9)     (0.5)
Trunk space (cu. ft.)         |                    
  Number of nonmissing values |        52        22
  Mean                        |     14.75     11.41
  Standard deviation          |     (4.3)     (3.2)
Weight (lbs.)                 |                    
  Number of nonmissing values |        52        22
  Mean                        |   3317.12   2315.91
  Standard deviation          |   (695.4)   (433.0)
---------------------------------------------------

.
end of do-file

As you can see, the column headers are based on the value labels, not the 0's and 1's. If you post an example from your data set, using the -dataex- command, I can try to troubleshoot this for you.

In version 17, -dataex- is already part of your official Stata installation. Run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Comment

Abhinaya Kanakasabai

Join Date: Dec 2022

Posts: 50
#11

08 Aug 2023, 13:02

I got the following after running dataex:

clear
input double(govspend_gdp invest_gdp trade_gdp) float(high_frac_dem low_frac_dem)
25.55409622192383 24.12430000305176 108.4134902954102 1 0
11.80164241790771 15.62400913238525 58.46826171875 1 0
9.831344604492188 28.37709808349609 79.83898162841797 0 1
13.80305480957031 20.04794311523438 76.69332122802734 0 1
15.01333332061768 18.64767837524414 184.9795074462891 1 0
19.86452102661133 19.0944881439209 66.17646789550781 1 0
15.8863353729248 22.78514099121094 122.4426879882812 1 0
9.600648880004883 20.30059623718262 65.95964813232422 1 0
14.21887111663818 23.60651206970215 52.28116226196289 1 0
21.49805068969727 24.6467170715332 76.62771606445313 0 1
14.91329860687256 19.38080787658691 132.7288360595703 0 1
18.17393684387207 19.67852401733398 106.3207855224609 0 1
. . 56.25030517578125 1 0
15.92795944213867 18.68181991577148 36.51618194580078 1 0
22.10501861572266 25.54928779602051 45.7457389831543 1 0
16.57808685302734 27.29662704467773 56.71168899536133 0 1
15.83838653564453 31.0993709564209 67.17784118652344 1 0
17.50044059753418 15.71498489379883 117.8158645629883 1 0
. . . 0 1
37.24919509887695 28.58604049682617 99.57510375976563 1 0
16.72316551208496 26.88591003417969 29.07459259033203 0 1
10.15808296203613 20.49099922180176 43.93564987182617 0 1
15.71298503875732 31.28337860107422 72.59205627441406 0 1
11.91844081878662 28.49498748779297 26.57992553710938 0 1
14.81019496917725 22.4546947479248 76.04920196533203 0 1
19.03897285461426 23.56102561950684 42.38838195800781 1 0
16.58379554748535 . 60.83727645874023 0 1
17.67097091674805 19.05544471740723 113.0633087158203 1 0
18.11073303222656 22.03178405761719 46.8921012878418 0 1
18.07930183410645 17.92931747436523 71.06932067871094 1 0
18.21673202514648 24.28147315979004 36.47696685791016 0 1
22.50293159484863 21.22961235046387 52.41551208496094 1 0
21.67507171630859 25.10131072998047 86.5118637084961 0 1
19.88316345214844 29.23242378234863 86.45854187011719 0 1
19.15755271911621 18.46658515930176 51.0780029296875 1 0
7.792625904083252 24.58500862121582 96.38402557373047 1 0
16.92841529846191 22.12620544433594 33.70084762573242 1 0
21.08173370361328 19.16864776611328 48.20054244995117 0 1
16.95782279968262 25.84360504150391 35.5113410949707 1 0
15.16762351989746 28.36582565307617 128.0421600341797 1 0
30.34116744995117 33.01920700073242 100.298225402832 1 0
22.9110164642334 29.83217430114746 75.89996337890625 1 0
21.60325050354004 23.44094848632812 45.11142730712891 1 0
18.10329246520996 19.16665840148926 43.61095428466797 1 0
24.7999324798584 21.19116020202637 68.99420166015625 1 0
14.82330894470215 30.87438774108887 48.47172546386719 0 1
12.87077713012695 15.21744251251221 50.57700729370117 1 0
9.013267517089844 16.40209770202637 64.12522888183594 1 0
18.69083404541016 28.2880744934082 79.48403930664063 0 1
8.369962692260742 20.82414054870605 66.81818389892578 1 0
18.88178825378418 17.99149513244629 48.89649963378906 1 0
18.32614707946777 24.1769905090332 46.0340690612793 0 1
10.0569543838501 19.65917015075684 69.54454803466797 1 0
5.677461624145508 17.39551544189453 44.42374038696289 1 0
4.997301578521729 13.68454170227051 41.55195236206055 1 0
12.84117412567139 26.65011596679688 86.44727325439453 1 0
13.17241954803467 22.74421119689941 114.9697418212891 0 1
13.34951972961426 17.93781280517578 19.7715015411377 0 1
14.33627986907959 . 73.80078125 0 1
5.50995922088623 15.05510902404785 44.68943023681641 1 0
13.1512336730957 23.04644966125488 64.40406799316406 0 1
7.529603004455566 21.96926879882812 77.92395782470703 1 0
13.45147037506104 41.13710403442383 151.2681579589844 1 0
16.2135124206543 31.85720634460449 211.3375244140625 1 0
23.25064277648926 26.99568176269531 92.58838653564453 1 0
14.55986881256104 21.89383888244629 186.2164154052734 1 0
. . . 0 1
10.9734354019165 51.51001739501953 95.01014709472656 1 0
11.68328857421875 35.88518905639648 112.9299850463867 0 1
17.09695053100586 29.14738464355469 42.11824035644531 0 1
20.05804061889648 24.38398551940918 120.5694122314453 1 0
21.47411918640137 25.38127708435059 52.11469650268555 1 0
16.08534622192383 . 35.57546234130859 1 0
12.41925144195557 22.71203994750977 30.47601127624512 0 1
. . 84.2879638671875 1 0
20.89997863769531 25.66619110107422 43.23791885375977 1 0
9.838107109069824 10.70983791351318 13.5740966796875 0 1
18.6653003692627 24.62007522583008 100.8188323974609 0 1
20.12804412841797 21.3500862121582 75.89762878417969 0 1
10.31651878356934 26.75041389465332 41.33795166015625 0 1
. . 118.8402328491211 1 0
10.70486450195312 13.31805038452148 68.76876068115234 0 1
13.32726669311523 16.23722076416016 49.85629272460938 1 0
. . . 1 0
10.01796722412109 29.62257766723633 87.22877502441406 0 1
15.32959365844727 12.51051902770996 148.7938232421875 1 0
19.68399810791016 25.99969863891602 62.16105270385742 0 1
21.27861213684082 22.59206581115723 138.6947479248047 1 0
20.93888092041016 41.21368789672852 125.5511856079102 1 0
14.33506870269775 19.5234260559082 47.96892166137695 0 1
14.2203483581543 17.41119766235352 48.78988647460938 0 1
28.78374481201172 22.03324508666992 58.37956619262695 1 0
20.02074432373047 27.39153480529785 71.16400146484375 0 1
25.07728004455566 20.77703857421875 92.28309631347656 1 0
11.533203125 22.93692779541016 97.28714752197266 1 0
17.48648834228516 19.81326293945312 45.82498550415039 0 1
8.703133583068848 19.06077003479004 29.8716926574707 1 0
8.847712516784668 40.9283447265625 43.06879425048828 1 0
11.43062591552734 32.78217315673828 50.75485229492188 0 1
8.538812637329102 26.58150482177734 62.66184234619141 0 1
end
[/CODE]

Sorry for being ignorant, were you expecting something like this?
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 29956

#12

08 Aug 2023, 13:40

Well, the data are not quite as I expected. The problem is that you have two separate 0/1 variables, high_frac_dem and low_frac_dem. And there is nothing in the example data that defines a reference group. Leaving aside for now the question of the reference group, for nearly all purposes in Stata having two indicators like high_frac_dem and low_frac_dem to designate mutually exclusive categories of a single construct is, at best, inefficient. And I can't think of any situation where it is necessary. Better is to have a single variable that codes one of the groups as 0 and the other as 1. And put a value label on it.

When you do that:

Code:

. label define fractionality      0       "Less Fractionalised"   ///
>         1       "Highly Fractionalised"

. assert high_frac_dem == !low_frac_dem

. gen byte fractionality:fractionality = high_frac_dem

. label var fractionality "Degree of Fractionalization"

. tab fractionality

            Degree of |
    Fractionalization |      Freq.     Percent        Cum.
----------------------+-----------------------------------
  Less Fractionalised |         42       42.00       42.00
Highly Fractionalised |         58       58.00      100.00
----------------------+-----------------------------------
                Total |        100      100.00

.
. table (var) (fractionality), statistic(count *_gdp) statistic(mean *_gdp) ///
>         statistic(sd *_gdp) sformat("(%s)" sd) nformat(%3.2f mean sd) nototals

----------------------------------------------------------------------------
                              |          Degree of Fractionalization        
                              |  Less Fractionalised   Highly Fractionalised
------------------------------+---------------------------------------------
govspend_gdp                  |                                             
  Number of nonmissing values |                   40                      54
  Mean                        |                15.30                   16.63
  Standard deviation          |               (3.72)                  (6.60)
invest_gdp                    |                                             
  Number of nonmissing values |                   38                      53
  Mean                        |                24.20                   23.46
  Standard deviation          |               (5.27)                  (7.54)
trade_gdp                     |                                             
  Number of nonmissing values |                   40                      57
  Mean                        |                63.80                   80.86
  Standard deviation          |              (26.95)                 (41.92)
----------------------------------------------------------------------------

As for the reference group, the best way to handle that (the only way that will work the way you want it to with -table- as far as I know) would be to set fractionality = 2 for those observations. (And don't forget to add 2 "Reference Group" to the value label.)

Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 686

#13

08 Aug 2023, 14:19

Setting the reference group aside, the only thing I can add is some code
for putting the mean comparisons in the table, and changing the layout
to align with the original hand-written illustration.

Here is some code I added to Clyde's example.

Code:

* loop over the variable names in the 'var' dimension, and use -ttest- to
* compute compare the means -- collect the results and tag them so they
* match (align) with the results already in the Table collection.
collect levelsof var
local vlist = s(levels)
foreach var of local vlist {
    ttest `var' , by(fractionality)
    collect get diff=(r(mu_2)-r(mu_1)) p=(r(p)), ///
        tags(fractionality[compare] var[`var'])
}

* add and modify some result labels
collect label levels result ///
    diff "Highly-Less" ///
    p "p-value" ///
    count "Obs" ///
    sd "(Std)" ///
    , modify
* label the new level
collect label levels fractionality compare "Compare"

* fix the order of levels
collect style autolevels fractionality 1 0 compare, clear

* other cosmetic formatting
collect style cell result[diff], nformat(%3.2f)
collect style cell result[p], nformat(%5.4f)

* put the results in the columns
collect layout (var) (fractionality#result)

Here is the resulting table.

Code:

. collect layout (var) (fractionality#result)

Collection: Table
      Rows: var
   Columns: fractionality#result
   Table 1: 3 x 8

-------------------------------------------------------------------------------------------
             |                          Degree of Fractionalization
             |    Highly Fractionalised       Less Fractionalised            Compare
             |    Obs     Mean      (Std)    Obs     Mean     (Std)   Highly-Less   p-value
-------------+-----------------------------------------------------------------------------
govspend_gdp |     54    16.63     (6.60)     40    15.30    (3.72)          1.33    0.2552
invest_gdp   |     53    23.46     (7.54)     38    24.20    (5.27)         -0.73    0.6068
trade_gdp    |     57    80.86    (41.92)     40    63.80   (26.95)         17.06    0.0258
-------------------------------------------------------------------------------------------

Comment

Abhinaya Kanakasabai

Join Date: Dec 2022

Posts: 50
#14

08 Aug 2023, 14:29

These 2 dummy variables have been created based on another dummy variable and one continuous variable. For example, high_frac = 1 if dem = 1 & HIEF>threshold and low_frac = 1 if dem = 1 & HIEF <threshold. So, the reference category here would be all those with dem = 0.

I ran the following:

label define fractionality 0 "Less Fractionalised Democracy" 1 "Highly Fractionalised Democracy" 2 "Autocracy"
assert high_frac_dem == !low_frac_dem
gen byte fractionality:fractionality = high_frac_dem
label var fractionality "Degree of Fractionalization"
tab fractionality
table (var) (fractionality), statistic(count *_gdp) statistic(mean *_gdp) statistic(sd *_gdp) sformat("(%s)" sd) nformat(%3.2f mean sd) nototals

The assertion was false. I'm not sure where to add autocracy (the reference category) to the above lines of code to get it as the last column
Comment
Abhinaya Kanakasabai

Join Date: Dec 2022

Posts: 50
#15

08 Aug 2023, 14:56

hi, Jeff. thank you for the codes. but I don't get what is stored in local vlist when I run:
local vlist = s(levels) Could you please clarify? Thank you!
Comment

Announcement