Merging tables horizontally(multiple columns) as .txt files in stata, using tabout. In order to copy-paste to excel.

Mariia Chavez

Join Date: Oct 2023

Posts: 28
#1

Merging tables horizontally(multiple columns) as .txt files in stata, using tabout. In order to copy-paste to excel.

20 Oct 2023, 05:36

Hello

I have been trying to figure out how to merge multiple tables into one, by adding columns to an existing table in a .txt file using tabout in stata.
I am currently using the .txt files to make tables using tabs, so that I can copy and paste them into excel. I prefer doing it this way rather than export the tables directly into excel. That said, seeing as it's ultimately going into an excel file, it is not a problem to include more columns than would be ideal for a document.

These are the syntaxes I am currently using to make the tables. I have one for my dummy variables and one for my categorical variables.

Dummy:
foreach var of varlist city cityage citygnd citywealth {
tabout one `var' using dummytable_`var'.txt, replace c(mean var_d) sum svy f(2) h1(var_d) h3(nil) npos(row) ptotal(none)
}

Categorical:
foreach var of varlist city cityage citygnd citywealth {
tabout var_c `var' using categoricaltable_`var'.txt, ///
replace c(col) svy f(2) h1(var_c) h3(nil) npos(row) ptotal(none)
}

"var_d" represent a dummy variable and "var_c" represent a categorical variable.

The four variables city, cityage, citygnd and citywealth contain a numerical measure of sample size categorized according to city only, or city and something else.

The variable city holds the sample size of the survey dataset grouped only according to the three cities (lets call them London, Paris and Boston for the sake of simplicity). Thus, the variable city has only three value labels: London, Paris and Boston.
The remaining three variables are categorized according to cities and another relevant grouping.
For instance, the variable cityage holds the sample size grouped according to the three cities, as well as two age groups. Thus, the variable cityage has the value labels: London Age1, London Age2, Paris Age1, Paris Age2, Boston Age1 and Boston Age2.

Only counting the columns containing data, the table I want has 21 columns in total (all the sub groupings/value labels for city cityage citygnd citywealth, summed). As the rows it has the var_d in the dummytable_`var' and var_c in the categoricaltable_`var'.
(If possible, it would be ideal to merge the categorical and dummy tables as well, so that I can have multiple rows with both categorical and dummy variables in the same table, and multiple columns representing the variables city cityage citygnd and citywealth. But this is less critical as having the rows placed further down will be easier to fix later in excel, as long as the columns are placed correctly.)

In the "Tips and Tricks" section of the tabout manual called "Publication quality tables in Stata: a tutorial for the tabout program" (not the manual for Tabout version 3). They describe how to merge several tables into one in a way that seems similar to what I want to achieve. However, these are based on an intent to make a three way table, which is not what I am trying to achieve. Also, the tables in this description are merged by adding rows, instead of columns.

I did find an explanation from a previous post in statalist (Tabout 3 summary statistics transposing columns to rows - Statalist) for how to turn rows into multiple columns, but from what I understand this description is meant for one variable with multiple value labels, and not multiple variables with each their own set of value labels.
(I was however able to use this description to make the dummytable_`var' horizontal instead of vertical, so that was helpful )

Any help would be highly appreciated!

Best,
Maria Chavez
Tags: None
Mariia Chavez

Join Date: Oct 2023

Posts: 28
#2

25 Oct 2023, 03:29

Hello again

I continued working with making these tables while temporarily accepting that I have to make separate tables for each of the city variables, until I hear back from you.
In the meantime, I have come upon another problem that I was hoping you could help me with.
For the categorical table, I am trying to make one table for each of the city variables, that have all the categorical variables as rows. These all have in common that they have "_c" in the name of the variable.
This could be easily done by just writing out all the categorical variables in the tabout formula, and finish with the city variable in the tabout-syntax (before "using....).
That said, as there are many categorical variables so I'd prefer to add them using a varlist or *_c* in the syntax, instead of having to write them all down. However, I haven't been able to figure out how to use a varlist in the variable-section of the syntax, and when I write *_c* tabout understands that I want to make one table for each of the categorical variables, rather than include them as rows.

foreach var of varlist city cityage citygnd citywealth {
tabout *_c* `var' using categoricaltable_`var'.txt, ///
replace c(col) svy f(2) h1(categorical) h3(nil) npos(row) ptotal(none)
}

Do you know how to do this?
I would be so grateful for any help you may give me!

Best,
Maria Chavez
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10089
#3

25 Oct 2023, 06:24

There are not many users of tabout (the Stata Journal) that I see on the forum. Instead of fixating on a single approach to solving your problem, my advice is to create a small data example and display your intended output.. This will increase your chances of receiving helpful replies, but please don't feel obligated to do so.
Comment
Mariia Chavez

Join Date: Oct 2023

Posts: 28
#4

02 Nov 2023, 07:35

Thank you so much for your response, Andrew

Attached to this comment is a picture of an attempt to display what I am trying to achieve using the nlsw88 example dataset:

I have used the variables married, union and industry as substitute for my four city variables. As you can see in the picture, I have these variables presented as multiple columns, with one column designated to each of the subcategories of the variable. So, for instance, the variable married has two columns, one for married and one single. Meanwhile, industry has twelve columns, one for each of the value labels stored for the variable industry. In this table there are in total 19 columns with data, one for each of the subcategories pertaining to the variables married, union and industry. This is what I want for my table except with my four city variables, that in total have 21 subcategories pertaining to these four variables.

This is what I am struggling the most to achieve. To have a set list of variables (my four city variables) appear as columns, and that I can add whatever other categorical or dummy variables as rows.
In this table I added occupation, collgrad, south and c_city as rows. The categorical variable is presented with as many rows as it has subcategories. So for occupation, this is thirteen. Meanwhile, the dummies are presented only with one row. Due to the nature of a dummy variable, and that there are only two categories, I do not need the second row, as the proportion of the second subcategory will be easily deducible based on the proportions of the proportions of the first.

There are a few ways in which the nlsw88 differs from my dataset. My dataset is based on survey data. This is why my syntaxes presented above have the svy function in them. Adding the svy functions also turns the content in the cells from percentages to proportions. This is what I want (to have the cell content be proportions and not percentages), but I didn't know how to make it so when it was not survey data which is why the table below shows percentages.

The reason I have been intent on using tabout to export these tables to .txt files, rather than exporting it straight into excel with tabout, is that I have a Windows computer, and with a Windows computer I can preview the .txt-file and thus modify the syntax while keeping track of the output, instead of having to close the excel file in order to run a new command that changes it. In order to make it easy to later copy and paste it over to an excel, it is also important that the table is made with indents. If there is another way to do this, then please let me know!

This is the syntax I used to make the table below happen:
foreach var1 of varlist married union industry race {
local firstvar2 1
foreach var2 of varlist occupation collgrad south c_city {
if `firstvar2' == 1 {
local filemethod = "replace"
local firstvar2 0
}
else {
local filemethod = "append"
}

// Create a temporary dataset excluding missing data
preserve
keep if !missing(`var2')
// Calculate survey-weighted means for the current variable
tabout `var2' `var1' using statalist-`var1'.txt, `filemethod' c(col) f(2) h3(nil) npos(row) ptotal(none)
restore
}
}

This created four files with four tables in each files. Then, I copy pasted the content of the file over to an excel sheet, and assembled the separate tables into one, in order for it to display what I am hoping to learn how to make with minimal edits in the excel file. I also deleted the total-column for each of the variables, and the additional subcategory row for the horizontally displayed dummy variables.

Let me know if there are any remaining uncertainties and I would be happy to respond.

Best,
Maria
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10089

02 Nov 2023, 10:02

Using estout from SSC, here is how I'd create the table that you display.

Code:

ssc install estout, clear

Code:

webuse nlsw88, clear
estimates drop _all
local i 0
local vars "occupation collgrad south c_city"
foreach var1 of local vars{
    foreach var2 of varlist married union industry race{
        local ++i
        eststo m`i': estpost tab `var1' `var2'
    }
    local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
    esttab m* using myfile.csv, cells(colpct(fmt(2))) unstack drop(Total: Total) `opt'
    estimates drop _all
}

You could save in .txt extension in case you want the output in a text file.

Res.:

Click image for larger version

Name: Capture.PNG
Views: 1
Size: 87.6 KB
ID: 1732449

Last edited by Andrew Musau; 02 Nov 2023, 10:05.

Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 686

02 Nov 2023, 13:17

If you have Stata 17 or newer, you can use the new collect features and updated table command.
Here is my attempt to reconstruct the nlsw88 data example.

Code:

clear all

webuse nlsw88

local rowvars occupation collgrad south c_city
local colvars married union industry race

local cmdset 0
foreach r of local rowvars {
    sum `r', meanonly
    if r(min) == 0 & r(max) == 1 {
        local rspec 1.`r'
        collect style header `r', title(hide)
    }
    else {
        local rspec `r'
    }
    foreach c of local colvars {
        local ++cmdset
        table (var) (`c'), ///
            statistic(fvpercent `rspec') ///
            statistic(count `r') ///
            nototals ///
            append
        collect addtags `r'[__margCode__], ///
            fortags(result[count]#cmdset[`cmdset'])
        collect label levels `r' __margCode__ "N"
    } // c
} // r
collect style cell, nformat(%6.2f)
foreach r of local rowvars {
    collect style cell `r'[__margCode__], nformat(%19.0f)
}

collect style header result[fvpercent], level(hide)
collect style header result[count], level(hide)
collect style row stack, nobinder spacer
collect style column, dups(first)
collect style cell cell_type[column-header], halign(left)
collect layout (`rowvars') (`colvars')

collect export table.xlsx, replace

Here is a LibreOffice screenshot of the results posted to table.xlsx.

Click image for larger version

Name: Screenshot 2023-11-02 at 2.12.43 PM.png
Views: 1
Size: 246.9 KB
ID: 1732473

Last edited by Jeff Pitblado (StataCorp); 02 Nov 2023, 13:19.

Comment

Mariia Chavez

Join Date: Oct 2023

Posts: 28
#7

20 Nov 2023, 02:41

Thank you so much, both of you!
I am looking at your commands now, and trying to adjust it to my original dataset and intent.
One of the things that I am trying to achieve is to use a *c* to define the row variables.
Therefore, in order to try this with the nlsw88 dataset and your codes, I ran these commands to make it so that all my row variables were the only one's who had "c" in the variable name:
rename race rae
rename south soucth
rename idcode idode

Then I tried this for the suggestion by Andrew Musau:
webuse nlsw88, clear

local i 0
local vars *c*
foreach var1 of local vars{
foreach var2 of varlist married union industry rae {
local ++i
eststo m`i': estpost tab `var1' `var2'
}
local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
esttab m* using myfile.csv, cells(colpct(fmt(2))) unstack drop(Total: Total) `opt'
estimates drop _all
}

But then Stata replied that it was too many variables specified.

I also tried this for the suggestion by Jeff Pitblado:

local rowvars *c*
local colvars married union industry rae

local cmdset 0
foreach r of local rowvars {
sum `r', meanonly
if r(min) == 0 & r(max) == 1 {
local rspec 1.`r'
collect style header `r', title(hide)
}
else {
local rspec `r'
}
foreach c of local colvars {
local ++cmdset
table (var) (`c'), ///
statistic(fvpercent `rspec') ///
statistic(count `r') ///
nototals ///
append
collect addtags `r'[__margCode__], ///
fortags(result[count]#cmdset[`cmdset'])
collect label levels `r' __margCode__ "N"
} // c
} // r
collect style cell, nformat(%6.2f)
foreach r of local rowvars {
collect style cell `r'[__margCode__], nformat(%19.0f)
}

collect style header result[fvpercent], level(hide)
collect style header result[count], level(hide)
collect style row stack, nobinder spacer
collect style column, dups(first)
collect style cell cell_type[column-header], halign(left)
collect layout (`rowvars') (`colvars')

collect export statalist2.xlsx, replace

But then stata replied that there was a *c* found where a name expected.

Do any of you know how I could determine the row variables as any variable with a c in it?

Also, I was able to make .txt-files with both of your commands, which was great! But neither of these files used tabulator to separate the columns like tabout does, thus I was not able to copy-paste it over to an excel after. Do you know if there is a way to achieve this?

Potentially relevant additional information:
In our dataset, we have labeled all our categorical variables to end with "_c" and then whatever number of categories the variable contain i.e., "_c4".
We have also labelled all our dummy variables to end with "_d".
This is why I am trying to figure out how to do the *c* for the row variables so that I can just run one command that captures all the categorical or dummy variables at once.

Best,
Maria

Last edited by Mariia Chavez; 20 Nov 2023, 02:45.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10089
#8

20 Nov 2023, 05:07

Originally posted by Mariia Chavez View Post

Do any of you know how I could determine the row variables as any variable with a c in it?

Code:

webuse nlsw88, clear ds *c* local vars `r(varlist)' di "`vars'"

Res.:

Code:

. di "`vars'" idcode race collgrad c_city occupation

Also, I was able to make .txt-files with both of your commands, which was great! But neither of these files used tabulator to separate the columns like tabout does, thus I was not able to copy-paste it over to an excel after. Do you know if there is a way to achieve this?

If your goal is to obtain an Excel file at the end, then #5 and #6 get you there directly without having to convert to a text file then copy and paste to Excel.

Potentially relevant additional information:
In our dataset, we have labeled all our categorical variables to end with "_c" and then whatever number of categories the variable contain i.e., "_c4".
We have also labelled all our dummy variables to end with "_d".
This is why I am trying to figure out how to do the *c* for the row variables so that I can just run one command that captures all the categorical or dummy variables at once.

EDITED: The wildcard "?" is not allowed with ds. So the best you can do is

Code:

*ENDING WITH _c# ds *_c* local vars `r(varlist)'

But this will also capture variable names such as "city_code" as it has "_c" as a substring. So beware!

Last edited by Andrew Musau; 20 Nov 2023, 05:16.
Comment
Mariia Chavez

Join Date: Oct 2023

Posts: 28
#9

29 Nov 2023, 02:55

Thank you so much, Andrew!

Can you please show me where and how to add your suggested codes for *c* to your previous suggestion?
That is:

estimates drop _all
ssc install estout

webuse nlsw88, clear
ds *c*
local vars `r(varlist)'
di "`vars'"

local i 0
local [what do I write here to refer to the new varlist?]
foreach var1 of local vars{
foreach var2 of varlist married union industry rae {
local ++i
eststo m`i': estpost tab `var1' `var2'
}
local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
esttab m* using myfile2.csv, cells(colpct(fmt(2))) unstack drop(Total: Total) `opt'
estimates drop _all
}

As for the excel-issue. The idea is that we want to collect several analyses made from several commands into the same excel sheet, thus it would be simpler to manipulate it in Stata, while I can review it in the preview function that windows has. And then, when I have the command that makes what I want to display, ready, I just copy paste it into the existing excel file, with all my other analyses or tables. Thus, having it in a text file with tabulators separating each column, would be more ideal, than to have it make multiple excel files.

And also thank you for the heads up

Have a great day!
Maria
Comment
Mariia Chavez

Join Date: Oct 2023

Posts: 28
#10

30 Nov 2023, 02:26

Hello again!

I figured out how to add the new variable list.
With this command:
ds *c*
local vars `r(varlist)'
di "`vars'"

local i 0
local vars "`vars'"
foreach var1 of local vars{
foreach var2 of varlist married union industry rae {
local ++i
eststo m`i': estpost tab `var1' `var2'
}
local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
esttab m* using myfile.csv, cells(colpct(fmt(2))) unstack drop(Total: Total) `opt'
estimates drop _all
}

The table showed up with the correct variables So thank you!
I am having some difficulty implementing the code in my own dataset, though. I have swapped the relevant variable names and such, and when I run the code it seems like it worked perfectly, and there are no error messages. That said, no tables are made, even though I haven't touched the part of the code that makes the tables. Do you know if there are any differences in datasets that may cause this to occur??

The dummy variables are also still showing up with two rows for each value label (see picture). Is there any way to remove the negatives from all dummies, so that it only shows one line with the share of people that are college graduates, that do live in the south, etc?

As previously mentioned, in my original dataset, all dummies end with "_d". Would it help to separate them to get the dummies to only have one line, without affecting the categorical variables like occupation. So that in the command we have separate instructions for *c*-variables and *_d variables, but that they are all supposed to be joint together in the same table?

So I tried achieving something similar in the example set with this command:
. rename collgrad ollgrad_d
. rename soucth south_d
. rename c_city _ity_d

So, would there be a way to make a table with all variables with a *c* listed as they are with the command above, and all the variables that end in *_d with only one line showing the mean (so seeing as these are dummies coded as 0 and 1, the mean would then be equal to the proportion of the data points registered with a value of 1)?

All the best,
Maria
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10089

#11

30 Nov 2023, 09:30

I can answer the second question. Look at the names of the coefficients in the table and drop them.

Code:

webuse nlsw88, clear
ds c*
local vars `r(varlist)'
di "`vars'"
estimates drop _all
local i 0
foreach var1 of local vars{
    foreach var2 of varlist married union industry south{
        local ++i
        eststo m`i': estpost tab `var1' `var2'
    }
    local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
    esttab m* using myfile.txt, cells(colpct(fmt(2))) unstack drop(Total: Total Not*) `opt' 
    estimates drop _all
}

Res.:

Code:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                                                                                                                                                                                                      
                   Single      Married     Nonunion        Union Ag/Forestr~s       Mining Construction Manufactur~g Transport/~y Wholesale/~e Finance/In~e Business/R~c Personal s~s Entertainm~c Profession~s Public adm~n    Not south        South
                   colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
College grad        23.38        23.86        22.30        32.10        17.65         0.00        10.34         8.99        12.22        11.41        14.58        19.77         3.09        29.41        41.87        24.43        24.08        23.14
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
N                    2246                      1878                      2232                                                                                                                                                        2246             
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                                                                                                                                                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Central city        39.18        23.58        27.03        37.53        11.76        50.00        27.59        21.53        38.89        24.02        33.33        36.05        34.02        17.65        30.22        37.50        26.69        32.59
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
N                    2246                      1878                      2232                                                                                                                                                        2246             
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I wouldn't know what is wrong with your code because I cannot see your data. If you are able to share the dataset, I'd be happy to have a look.

Comment

Lisa NiDhu

Join Date: Sep 2020
Posts: 9

#12

15 Dec 2023, 03:23

Hi Andrew, this is exactly what I am looking for - I have a couple of questions
1) is is possible to have n and % in side by side columns? I have tried the code below but get % under n...

Code:

local i 0
local vars "a b c"
foreach var1 of local vars{
    foreach var2 of varlist gender2 ethnicity_3 {
        local ++i
        eststo m`i': estpost tab `var1' `var2'
    }
    local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
    esttab m* using myfile.csv, cells(b rowpct(fmt(2))) unstack drop(Total: Total) `opt'
    estimates drop _all
}

2) Is there a way to add the variable name like below?

	Female	Male	Asian	White	Other
VARIABLE NAME HERE	rowpct	rowpct	rowpct	rowpct	rowpct
Level 1	48.25	51.75	49.52	20.00	30.48
Level 2	53.07	46.93	44.13	26.54	29.33
Level 3	59.70	40.30	38.81	25.37	35.82

3) is it also possible to create a 'total' column - in my example the columns would be total, gender [ male, female ], ethnicity [white, asian, all other ethnicities]

Thank you so much in advance,
Lisa

Last edited by Lisa NiDhu; 15 Dec 2023, 03:36.

Comment

Andrew Musau

Join Date: Oct 2014

Posts: 10089
#13

15 Dec 2023, 11:31

Lisa NiDhu, please start a new thread and provide a data example (see FAQ Advice #12 for details on how to do that).
Comment

Announcement