Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging tables horizontally(multiple columns) as .txt files in stata, using tabout. In order to copy-paste to excel.

    Hello

    I have been trying to figure out how to merge multiple tables into one, by adding columns to an existing table in a .txt file using tabout in stata.
    I am currently using the .txt files to make tables using tabs, so that I can copy and paste them into excel. I prefer doing it this way rather than export the tables directly into excel. That said, seeing as it's ultimately going into an excel file, it is not a problem to include more columns than would be ideal for a document.


    These are the syntaxes I am currently using to make the tables. I have one for my dummy variables and one for my categorical variables.

    Dummy:
    foreach var of varlist city cityage citygnd citywealth {
    tabout one `var' using dummytable_`var'.txt, replace c(mean var_d) sum svy f(2) h1(var_d) h3(nil) npos(row) ptotal(none)
    }


    Categorical:
    foreach var of varlist city cityage citygnd citywealth {
    tabout var_c `var' using categoricaltable_`var'.txt, ///
    replace c(col) svy f(2) h1(var_c) h3(nil) npos(row) ptotal(none)
    }

    "var_d" represent a dummy variable and "var_c" represent a categorical variable.

    The four variables city, cityage, citygnd and citywealth contain a numerical measure of sample size categorized according to city only, or city and something else.

    The variable city holds the sample size of the survey dataset grouped only according to the three cities (lets call them London, Paris and Boston for the sake of simplicity). Thus, the variable city has only three value labels: London, Paris and Boston.
    The remaining three variables are categorized according to cities and another relevant grouping.
    For instance, the variable cityage holds the sample size grouped according to the three cities, as well as two age groups. Thus, the variable cityage has the value labels: London Age1, London Age2, Paris Age1, Paris Age2, Boston Age1 and Boston Age2.

    Only counting the columns containing data, the table I want has 21 columns in total (all the sub groupings/value labels for city cityage citygnd citywealth, summed). As the rows it has the var_d in the dummytable_`var' and var_c in the categoricaltable_`var'.
    (If possible, it would be ideal to merge the categorical and dummy tables as well, so that I can have multiple rows with both categorical and dummy variables in the same table, and multiple columns representing the variables city cityage citygnd and citywealth. But this is less critical as having the rows placed further down will be easier to fix later in excel, as long as the columns are placed correctly.)

    In the "Tips and Tricks" section of the tabout manual called "Publication quality tables in Stata: a tutorial for the tabout program" (not the manual for Tabout version 3). They describe how to merge several tables into one in a way that seems similar to what I want to achieve. However, these are based on an intent to make a three way table, which is not what I am trying to achieve. Also, the tables in this description are merged by adding rows, instead of columns.

    I did find an explanation from a previous post in statalist (Tabout 3 summary statistics transposing columns to rows - Statalist) for how to turn rows into multiple columns, but from what I understand this description is meant for one variable with multiple value labels, and not multiple variables with each their own set of value labels.
    (I was however able to use this description to make the dummytable_`var' horizontal instead of vertical, so that was helpful )


    Any help would be highly appreciated!

    Best,
    Maria Chavez







  • #2
    Hello again

    I continued working with making these tables while temporarily accepting that I have to make separate tables for each of the city variables, until I hear back from you.
    In the meantime, I have come upon another problem that I was hoping you could help me with.
    For the categorical table, I am trying to make one table for each of the city variables, that have all the categorical variables as rows. These all have in common that they have "_c" in the name of the variable.
    This could be easily done by just writing out all the categorical variables in the tabout formula, and finish with the city variable in the tabout-syntax (before "using....).
    That said, as there are many categorical variables so I'd prefer to add them using a varlist or *_c* in the syntax, instead of having to write them all down. However, I haven't been able to figure out how to use a varlist in the variable-section of the syntax, and when I write *_c* tabout understands that I want to make one table for each of the categorical variables, rather than include them as rows.

    foreach var of varlist city cityage citygnd citywealth {
    tabout *_c* `var' using categoricaltable_`var'.txt, ///
    replace c(col) svy f(2) h1(categorical) h3(nil) npos(row) ptotal(none)
    }

    Do you know how to do this?
    I would be so grateful for any help you may give me!

    Best,
    Maria Chavez

    Comment


    • #3
      There are not many users of tabout (the Stata Journal) that I see on the forum. Instead of fixating on a single approach to solving your problem, my advice is to create a small data example and display your intended output.. This will increase your chances of receiving helpful replies, but please don't feel obligated to do so.

      Comment


      • #4
        Thank you so much for your response, Andrew


        Attached to this comment is a picture of an attempt to display what I am trying to achieve using the nlsw88 example dataset:

        I have used the variables married, union and industry as substitute for my four city variables. As you can see in the picture, I have these variables presented as multiple columns, with one column designated to each of the subcategories of the variable. So, for instance, the variable married has two columns, one for married and one single. Meanwhile, industry has twelve columns, one for each of the value labels stored for the variable industry. In this table there are in total 19 columns with data, one for each of the subcategories pertaining to the variables married, union and industry. This is what I want for my table except with my four city variables, that in total have 21 subcategories pertaining to these four variables.

        This is what I am struggling the most to achieve. To have a set list of variables (my four city variables) appear as columns, and that I can add whatever other categorical or dummy variables as rows.
        In this table I added occupation, collgrad, south and c_city as rows. The categorical variable is presented with as many rows as it has subcategories. So for occupation, this is thirteen. Meanwhile, the dummies are presented only with one row. Due to the nature of a dummy variable, and that there are only two categories, I do not need the second row, as the proportion of the second subcategory will be easily deducible based on the proportions of the proportions of the first.

        There are a few ways in which the nlsw88 differs from my dataset. My dataset is based on survey data. This is why my syntaxes presented above have the svy function in them. Adding the svy functions also turns the content in the cells from percentages to proportions. This is what I want (to have the cell content be proportions and not percentages), but I didn't know how to make it so when it was not survey data which is why the table below shows percentages.

        The reason I have been intent on using tabout to export these tables to .txt files, rather than exporting it straight into excel with tabout, is that I have a Windows computer, and with a Windows computer I can preview the .txt-file and thus modify the syntax while keeping track of the output, instead of having to close the excel file in order to run a new command that changes it. In order to make it easy to later copy and paste it over to an excel, it is also important that the table is made with indents. If there is another way to do this, then please let me know!


        This is the syntax I used to make the table below happen:
        foreach var1 of varlist married union industry race {
        local firstvar2 1
        foreach var2 of varlist occupation collgrad south c_city {
        if `firstvar2' == 1 {
        local filemethod = "replace"
        local firstvar2 0
        }
        else {
        local filemethod = "append"
        }

        // Create a temporary dataset excluding missing data
        preserve
        keep if !missing(`var2')
        // Calculate survey-weighted means for the current variable
        tabout `var2' `var1' using statalist-`var1'.txt, `filemethod' c(col) f(2) h3(nil) npos(row) ptotal(none)
        restore
        }
        }


        This created four files with four tables in each files. Then, I copy pasted the content of the file over to an excel sheet, and assembled the separate tables into one, in order for it to display what I am hoping to learn how to make with minimal edits in the excel file. I also deleted the total-column for each of the variables, and the additional subcategory row for the horizontally displayed dummy variables.

        Let me know if there are any remaining uncertainties and I would be happy to respond.

        Best,
        Maria



        Click image for larger version

Name:	Statalist.jpg
Views:	1
Size:	194.0 KB
ID:	1732435

        Comment


        • #5
          Using estout from SSC, here is how I'd create the table that you display.

          Code:
          ssc install estout, clear
          Code:
          webuse nlsw88, clear
          estimates drop _all
          local i 0
          local vars "occupation collgrad south c_city"
          foreach var1 of local vars{
              foreach var2 of varlist married union industry race{
                  local ++i
                  eststo m`i': estpost tab `var1' `var2'
              }
              local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
              esttab m* using myfile.csv, cells(colpct(fmt(2))) unstack drop(Total: Total) `opt'
              estimates drop _all
          }
          You could save in .txt extension in case you want the output in a text file.

          Res.:
          Click image for larger version

Name:	Capture.PNG
Views:	1
Size:	87.6 KB
ID:	1732449

          Last edited by Andrew Musau; 02 Nov 2023, 10:05.

          Comment


          • #6
            If you have Stata 17 or newer, you can use the new collect features and updated table command.
            Here is my attempt to reconstruct the nlsw88 data example.
            Code:
            clear all
            
            webuse nlsw88
            
            local rowvars occupation collgrad south c_city
            local colvars married union industry race
            
            local cmdset 0
            foreach r of local rowvars {
                sum `r', meanonly
                if r(min) == 0 & r(max) == 1 {
                    local rspec 1.`r'
                    collect style header `r', title(hide)
                }
                else {
                    local rspec `r'
                }
                foreach c of local colvars {
                    local ++cmdset
                    table (var) (`c'), ///
                        statistic(fvpercent `rspec') ///
                        statistic(count `r') ///
                        nototals ///
                        append
                    collect addtags `r'[__margCode__], ///
                        fortags(result[count]#cmdset[`cmdset'])
                    collect label levels `r' __margCode__ "N"
                } // c
            } // r
            collect style cell, nformat(%6.2f)
            foreach r of local rowvars {
                collect style cell `r'[__margCode__], nformat(%19.0f)
            }
            
            collect style header result[fvpercent], level(hide)
            collect style header result[count], level(hide)
            collect style row stack, nobinder spacer
            collect style column, dups(first)
            collect style cell cell_type[column-header], halign(left)
            collect layout (`rowvars') (`colvars')
            
            collect export table.xlsx, replace
            Here is a LibreOffice screenshot of the results posted to table.xlsx.
            Click image for larger version

Name:	Screenshot 2023-11-02 at 2.12.43 PM.png
Views:	1
Size:	246.9 KB
ID:	1732473

            Last edited by Jeff Pitblado (StataCorp); 02 Nov 2023, 13:19.

            Comment


            • #7
              Thank you so much, both of you!
              I am looking at your commands now, and trying to adjust it to my original dataset and intent.
              One of the things that I am trying to achieve is to use a *c* to define the row variables.
              Therefore, in order to try this with the nlsw88 dataset and your codes, I ran these commands to make it so that all my row variables were the only one's who had "c" in the variable name:
              rename race rae
              rename south soucth
              rename idcode idode

              Then I tried this for the suggestion by Andrew Musau:
              webuse nlsw88, clear

              local i 0
              local vars *c*
              foreach var1 of local vars{
              foreach var2 of varlist married union industry rae {
              local ++i
              eststo m`i': estpost tab `var1' `var2'
              }
              local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
              esttab m* using myfile.csv, cells(colpct(fmt(2))) unstack drop(Total: Total) `opt'
              estimates drop _all
              }


              But then Stata replied that it was too many variables specified.


              I also tried this for the suggestion by Jeff Pitblado:

              local rowvars *c*
              local colvars married union industry rae

              local cmdset 0
              foreach r of local rowvars {
              sum `r', meanonly
              if r(min) == 0 & r(max) == 1 {
              local rspec 1.`r'
              collect style header `r', title(hide)
              }
              else {
              local rspec `r'
              }
              foreach c of local colvars {
              local ++cmdset
              table (var) (`c'), ///
              statistic(fvpercent `rspec') ///
              statistic(count `r') ///
              nototals ///
              append
              collect addtags `r'[__margCode__], ///
              fortags(result[count]#cmdset[`cmdset'])
              collect label levels `r' __margCode__ "N"
              } // c
              } // r
              collect style cell, nformat(%6.2f)
              foreach r of local rowvars {
              collect style cell `r'[__margCode__], nformat(%19.0f)
              }

              collect style header result[fvpercent], level(hide)
              collect style header result[count], level(hide)
              collect style row stack, nobinder spacer
              collect style column, dups(first)
              collect style cell cell_type[column-header], halign(left)
              collect layout (`rowvars') (`colvars')

              collect export statalist2.xlsx, replace


              But then stata replied that there was a *c* found where a name expected.

              Do any of you know how I could determine the row variables as any variable with a c in it?


              Also, I was able to make .txt-files with both of your commands, which was great! But neither of these files used tabulator to separate the columns like tabout does, thus I was not able to copy-paste it over to an excel after. Do you know if there is a way to achieve this?


              Potentially relevant additional information:
              In our dataset, we have labeled all our categorical variables to end with "_c" and then whatever number of categories the variable contain i.e., "_c4".
              We have also labelled all our dummy variables to end with "_d".
              This is why I am trying to figure out how to do the *c* for the row variables so that I can just run one command that captures all the categorical or dummy variables at once.

              Best,
              Maria
              Last edited by Mariia Chavez; 20 Nov 2023, 02:45.

              Comment


              • #8
                Originally posted by Mariia Chavez View Post
                Do any of you know how I could determine the row variables as any variable with a c in it?
                Code:
                webuse nlsw88, clear
                ds *c*
                local vars `r(varlist)'
                di "`vars'"
                Res.:

                Code:
                . di "`vars'"
                idcode race collgrad c_city occupation
                Also, I was able to make .txt-files with both of your commands, which was great! But neither of these files used tabulator to separate the columns like tabout does, thus I was not able to copy-paste it over to an excel after. Do you know if there is a way to achieve this?
                If your goal is to obtain an Excel file at the end, then #5 and #6 get you there directly without having to convert to a text file then copy and paste to Excel.

                Potentially relevant additional information:
                In our dataset, we have labeled all our categorical variables to end with "_c" and then whatever number of categories the variable contain i.e., "_c4".
                We have also labelled all our dummy variables to end with "_d".
                This is why I am trying to figure out how to do the *c* for the row variables so that I can just run one command that captures all the categorical or dummy variables at once.
                EDITED: The wildcard "?" is not allowed with ds. So the best you can do is

                Code:
                *ENDING WITH _c#
                ds *_c*
                local vars `r(varlist)'
                But this will also capture variable names such as "city_code" as it has "_c" as a substring. So beware!
                Last edited by Andrew Musau; 20 Nov 2023, 05:16.

                Comment


                • #9
                  Thank you so much, Andrew!

                  Can you please show me where and how to add your suggested codes for *c* to your previous suggestion?
                  That is:

                  estimates drop _all
                  ssc install estout

                  webuse nlsw88, clear
                  ds *c*
                  local vars `r(varlist)'
                  di "`vars'"


                  local i 0
                  local [what do I write here to refer to the new varlist?]
                  foreach var1 of local vars{
                  foreach var2 of varlist married union industry rae {
                  local ++i
                  eststo m`i': estpost tab `var1' `var2'
                  }
                  local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
                  esttab m* using myfile2.csv, cells(colpct(fmt(2))) unstack drop(Total: Total) `opt'
                  estimates drop _all
                  }


                  As for the excel-issue. The idea is that we want to collect several analyses made from several commands into the same excel sheet, thus it would be simpler to manipulate it in Stata, while I can review it in the preview function that windows has. And then, when I have the command that makes what I want to display, ready, I just copy paste it into the existing excel file, with all my other analyses or tables. Thus, having it in a text file with tabulators separating each column, would be more ideal, than to have it make multiple excel files.


                  And also thank you for the heads up

                  Have a great day!
                  Maria

                  Comment


                  • #10
                    Hello again!

                    I figured out how to add the new variable list.
                    With this command:
                    ds *c*
                    local vars `r(varlist)'
                    di "`vars'"

                    local i 0
                    local vars "`vars'"
                    foreach var1 of local vars{
                    foreach var2 of varlist married union industry rae {
                    local ++i
                    eststo m`i': estpost tab `var1' `var2'
                    }
                    local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
                    esttab m* using myfile.csv, cells(colpct(fmt(2))) unstack drop(Total: Total) `opt'
                    estimates drop _all
                    }

                    The table showed up with the correct variables So thank you!
                    I am having some difficulty implementing the code in my own dataset, though. I have swapped the relevant variable names and such, and when I run the code it seems like it worked perfectly, and there are no error messages. That said, no tables are made, even though I haven't touched the part of the code that makes the tables. Do you know if there are any differences in datasets that may cause this to occur??

                    The dummy variables are also still showing up with two rows for each value label (see picture). Is there any way to remove the negatives from all dummies, so that it only shows one line with the share of people that are college graduates, that do live in the south, etc?

                    As previously mentioned, in my original dataset, all dummies end with "_d". Would it help to separate them to get the dummies to only have one line, without affecting the categorical variables like occupation. So that in the command we have separate instructions for *c*-variables and *_d variables, but that they are all supposed to be joint together in the same table?

                    So I tried achieving something similar in the example set with this command:
                    . rename collgrad ollgrad_d
                    . rename soucth south_d
                    . rename c_city _ity_d

                    So, would there be a way to make a table with all variables with a *c* listed as they are with the command above, and all the variables that end in *_d with only one line showing the mean (so seeing as these are dummies coded as 0 and 1, the mean would then be equal to the proportion of the data points registered with a value of 1)?
                    Click image for larger version

Name:	Statalist2.JPG
Views:	1
Size:	267.3 KB
ID:	1735626

                    All the best,
                    Maria

                    Comment


                    • #11
                      I can answer the second question. Look at the names of the coefficients in the table and drop them.

                      Code:
                      webuse nlsw88, clear
                      ds c*
                      local vars `r(varlist)'
                      di "`vars'"
                      estimates drop _all
                      local i 0
                      foreach var1 of local vars{
                          foreach var2 of varlist married union industry south{
                              local ++i
                              eststo m`i': estpost tab `var1' `var2'
                          }
                          local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
                          esttab m* using myfile.txt, cells(colpct(fmt(2))) unstack drop(Total: Total Not*) `opt' 
                          estimates drop _all
                      }
                      Res.:

                      Code:
                      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                                                                                                                                                                                                                            
                                         Single      Married     Nonunion        Union Ag/Forestr~s       Mining Construction Manufactur~g Transport/~y Wholesale/~e Finance/In~e Business/R~c Personal s~s Entertainm~c Profession~s Public adm~n    Not south        South
                                         colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct       colpct
                      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                      College grad        23.38        23.86        22.30        32.10        17.65         0.00        10.34         8.99        12.22        11.41        14.58        19.77         3.09        29.41        41.87        24.43        24.08        23.14
                      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                      N                    2246                      1878                      2232                                                                                                                                                        2246             
                      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                                                                                                                                                                                                                            
                      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                      Central city        39.18        23.58        27.03        37.53        11.76        50.00        27.59        21.53        38.89        24.02        33.33        36.05        34.02        17.65        30.22        37.50        26.69        32.59
                      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                      N                    2246                      1878                      2232                                                                                                                                                        2246             
                      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                      I wouldn't know what is wrong with your code because I cannot see your data. If you are able to share the dataset, I'd be happy to have a look.

                      Comment


                      • #12
                        Hi Andrew, this is exactly what I am looking for - I have a couple of questions
                        1) is is possible to have n and % in side by side columns? I have tried the code below but get % under n...

                        Code:
                        local i 0
                        local vars "a b c"
                        foreach var1 of local vars{
                            foreach var2 of varlist gender2 ethnicity_3 {
                                local ++i
                                eststo m`i': estpost tab `var1' `var2'
                            }
                            local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
                            esttab m* using myfile.csv, cells(b rowpct(fmt(2))) unstack drop(Total: Total) `opt'
                            estimates drop _all
                        }
                        2) Is there a way to add the variable name like below?
                        Female Male Asian White Other
                        VARIABLE NAME HERE rowpct rowpct rowpct rowpct rowpct
                        Level 1 48.25 51.75 49.52 20.00 30.48
                        Level 2 53.07 46.93 44.13 26.54 29.33
                        Level 3 59.70 40.30 38.81 25.37 35.82

                        3) is it also possible to create a 'total' column - in my example the columns would be total, gender [ male, female ], ethnicity [white, asian, all other ethnicities]

                        Thank you so much in advance,
                        Lisa
                        Last edited by Lisa NiDhu; 15 Dec 2023, 03:36.

                        Comment


                        • #13
                          Lisa NiDhu, please start a new thread and provide a data example (see FAQ Advice #12 for details on how to do that).

                          Comment

                          Working...
                          X