Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Making tables showing the total with estpost summarize

    Hello

    I am using Stata to make tables for a project that has collected Survey data from three cities. For some of the variables I want them displayed categorically, whereas for others I want just the mean value displayed. Like, for instance, for the distribution of age (variable b02age_n) I don't want it shown with multiple rows of the different ages, and then the proportion of participants with each of these ages, but rather one row with the mean age for each of the city*demographic categories.
    There are some other important features in the table, for instance, I want the total N to show for each variable
    I figured out how to formulate the command for the categorical variables, but I am struggling a bit with formulating the command for mean variables.

    Here is the command I use for the categorical variable:

    local vars grwloc_c3

    foreach var1 of local vars {
    local i 0 // Reset counter for each variable
    replace `var1' = 888 if `var1' == .a
    replace `var1' = 999 if `var1' == .b

    foreach var2 of varlist citygnd cityage citywealth {
    local ++i
    eststo City1`i': estpost tab `var1' `var2' if city == 1 [aw = gweightcorr_n]
    eststo City2`i': estpost tab `var1' `var2' if city == 2 [aw = gweightcorr_n]
    eststo City3`i': estpost tab `var1' `var2' if city == 3 [aw = gweightcorr_n]
    }

    local opt = cond("`var1'" == word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
    esttab City1* City2* City3* using cattables.csv, cells("colpct(fmt(2)) b(label(N) fmt(0))") unstack drop(Total: ) `opt' mtitles(`var1') noobs order(Total) note("Extra line") addnotes("Observations")

    estimates drop _all // Clear estimates after output
    }

    I was able to achieve the correct display of the means by using the estpost summarize instead of the estpost tab function. The following command is the closest I have come to a command for the mean-variables

    local vars b02age_n

    foreach var1 of local vars {
    local i 0 // Reset counter for each variable
    replace `var1' = 888 if `var1' == .a
    replace `var1' = 999 if `var1' == .b

    foreach var2 of varlist citygnd cityage citywealth {
    local ++i

    // Use estpost summarize to calculate the mean
    eststo City1`i': estpost summarize `var1' if city == 1 [aw = gweightcorr_n]
    eststo City2`i': estpost summarize `var1' if city == 2 [aw = gweightcorr_n]
    eststo City3`i': estpost summarize `var1' if city == 3 [aw = gweightcorr_n]
    }

    local opt = cond("`var1'" == word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")


    esttab City1* City2* City3* using meantestdescriptives.csv, ///
    cells("mean(fmt(2)) b(label(N) fmt(0))") unstack `opt' ///
    mtitles(`var1') noobs note("Extra line") addnotes("Observations")

    estimates drop _all
    }


    but this drops the rows with the total, and I cant figure out to how to incorporate this with the estpost summarize

    Can you help me with this, please?

    Best,
    Maria

  • #2
    Please add a reproducible example. Note that estout is from SSC (FAQ Advice #12).

    Comment


    • #3
      The first command is based on one that I received from you at an earlier occation, where I used the nlsw88 example dataset:

      webuse nlsw88, clear
      estimates drop _all
      local i 0
      local vars "occupation collgrad south c_city"
      foreach var1 of local vars{
      foreach var2 of varlist married union race{
      local ++i
      eststo Mining`i': estpost tab `var1' `var2' if industry==2
      eststo Construction`i': estpost tab `var1' `var2' if industry==3
      eststo Manufacturing`i': estpost tab `var1' `var2' if industry==4
      }
      local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
      esttab Mining* Construction* Manufacturing* using myfile3.csv, cells(colpct(fmt(2))) unstack drop(Total: Total) `opt' mtitles(`var1') note("Extra line") addnotes("Observations")
      estimates drop _all
      }

      Here is a version employing the nlsw88 dataset, modified to present means:

      webuse nlsw88, clear
      estimates drop _all
      local i 0
      local vars "age grade"
      foreach var1 of local vars{
      foreach var2 of varlist married union race{
      local ++i
      eststo Mining`i': estpost summarize `var1' `var2' if industry==2
      eststo Construction`i': estpost summarize `var1' `var2' if industry==3
      eststo Manufacturing`i': estpost summarize `var1' `var2' if industry==4
      }
      local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
      esttab Mining* Construction* Manufacturing* using myfile4.csv, cells("mean(fmt(2)) b(label(N) fmt(0))") unstack `opt' mtitles(`var1') noobs note("Extra Line") addnotes("Observations")
      estimates drop _all
      }

      Comment


      • #4
        Originally posted by Mariia Chavez View Post
        but this drops the rows with the total, and I cant figure out to how to incorporate this with the estpost summarize
        Aren't the totals held in the matrix e(count)? This suggests the following modification (highlighted) to #3.

        Code:
        webuse nlsw88, clear
        estimates drop _all
        local i 0
        local vars "age grade"
        foreach var1 of local vars{
        foreach var2 of varlist married union race{
        local ++i
        eststo Mining`i': estpost summarize `var1' `var2' if industry==2
        eststo Construction`i': estpost summarize `var1' `var2' if industry==3
        eststo Manufacturing`i': estpost summarize `var1' `var2' if industry==4
        }
        local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
        esttab Mining* Construction* Manufacturing* using myfile4.csv, cells("mean(fmt(2)) count(label(N) fmt(0))") unstack `opt' mtitles(`var1') noobs note("Extra Line") addnotes("Observations")
        estimates drop _all
        }

        Comment


        • #5
          Thank you. Yes, the command you are proposing are showing the totals, but the "collgrad" "married" and "race" variables are presented wrong.


          In the following estpost command, these variables are presented as columns nested and repeated for each industry. This is what I want.

          webuse nlsw88, clear
          estimates drop _all
          local i 0


          local vars "occupation"
          foreach var1 of local vars{
          foreach var2 of varlist collgrad married race{
          local ++i
          eststo Agriculture`i': estpost tab `var1' `var2' if industry==1
          eststo Mining`i': estpost tab `var1' `var2' if industry==2
          eststo Construction`i': estpost tab `var1' `var2' if industry==3
          }
          local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
          esttab Agriculture* Mining* Construction* using myfile5.csv, cells("colpct(fmt(2)) b(label(N) fmt(0))") unstack drop(Total: ) `opt' mtitles(Agriculture1) noobs order(Total) note("Extra Line") addnotes("Observations")
          estimates drop _all
          }

          How can I achieve this for the estpost summarize command too?

          Best,
          Maria

          Comment


          • #6
            I think this should do it. I name the models "industry name_categorical level", but you can store the model names in a local and create a second local that gets rid of the underscores and categorical levels if you only want the industry names in the output. This should be fed through the -mlabel()- option of esttab. Let me know if you want this but cannot figure out how to do it.

            Code:
            webuse nlsw88, clear
            estimates drop _all
            local i 0
            local vars "age grade"
            foreach var1 of local vars{
                foreach var2 of varlist married union race{
                    local ++i
                    qui levelsof `var2', local(levs)
                    foreach j of local levs{
                        capture noisily{
                            eststo Mining`i'_`j': estpost summarize `var1' if `j'.`var2' & industry==2
                            mat mean= e(mean)
                            mat colname mean= "mean"
                            mat count= e(N)
                            mat colname count= "mean"
                            mat coleq mean= "`:lab (`var2') `j''"
                            estadd mat mean= mean, replace
                            estadd mat count= count, replace
                            eststo Construction`i'_`j': estpost summarize `var1' if `j'.`var2' & industry==3
                            mat mean= e(mean)
                            mat colname mean= "mean"
                            mat count= e(N)
                            mat colname count= "mean"
                            mat coleq mean= "`:lab (`var2') `j''"
                            estadd mat mean= mean, replace
                            estadd mat count= count, replace
                            eststo Manufacturing`i'_`j': estpost summarize `var1' if `j'.`var2' & industry==4
                            mat mean= e(mean)
                            mat colname mean= "mean"
                            mat count= e(N)
                            mat colname count= "mean"
                            mat coleq mean= "`:lab (`var2') `j''"
                            estadd mat mean= mean, replace
                            estadd mat count= count, replace
                            
                        }
                    }
                }
            esttab Mining* Construction* Manufacturing* using myfile4.csv, coeflab(mean " ") ///
            cells("mean(fmt(2)) count(label(N) fmt(0))") replace unstack noobs mlab(Mining1_0) ///
            note("Extra Line") addnotes("Observations")
            }

            Comment


            • #7
              Thank you! But now it seems like the age and grade variables, that were supposed to be listed as rows, are also listed as columns.
              I was expecting a table with two rows, one showing the mean and total N for age within each [married, union, race] x [industry] category, and the other showing the mean and total N for grade within each category. But now there is only one row for both, first age and then grades. Would it be possible to have age and grade appear, as separate rows?

              Best,
              Maria
              Last edited by Mariia Chavez; 30 Oct 2024, 05:00.

              Comment


              • #8
                Then you don't need to rename the matrix columns.

                Code:
                webuse nlsw88, clear
                estimates drop _all
                local i 0
                local vars "age grade"
                foreach var1 of local vars{
                    foreach var2 of varlist married union race{
                        local ++i
                        qui levelsof `var2', local(levs)
                        foreach j of local levs{
                            capture noisily{
                                eststo Mining`i'_`j': estpost summarize `var1' if `j'.`var2' & industry==2
                                mat mean= e(mean)
                                mat colname count= "mean"
                                mat coleq mean= "`:lab (`var2') `j''"
                                estadd mat mean= mean, replace
                                eststo Construction`i'_`j': estpost summarize `var1' if `j'.`var2' & industry==3
                                mat mean= e(mean)
                                mat colname count= "mean"
                                mat coleq mean= "`:lab (`var2') `j''"
                                estadd mat mean= mean, replace
                                eststo Manufacturing`i'_`j': estpost summarize `var1' if `j'.`var2' & industry==4
                                mat mean= e(mean)
                                mat coleq mean= "`:lab (`var2') `j''"
                                estadd mat mean= mean, replace    
                            }
                        }
                    }
                local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)") 
                esttab Mining* Construction* Manufacturing* using myfile4.csv, cells("mean(fmt(2)) count(label(N) fmt(0))") ///
                unstack `opt' mlab(none) noobs note("Extra Line") addnotes("Observations")
                estimates drop _all
                }

                Comment


                • #9
                  Thank you
                  When I put this code, I get the table, but I also get these error codes:

                  matrix count not found
                  no observations
                  no observations
                  estimation result Construction* not found
                  r(111);

                  Do you know why is that, and what they mean for my table?

                  Best,
                  Maria

                  Comment


                  • #10
                    capture noisily{
                    ...
                    }
                    This is used to suppress "no observations" error messages and allow the loop to continue executing. In some industries, certain levels are not observed (i.e., they are absent), which leads to these "no observations" errors. An estimation result may not exist if all the tabulations comprising it are unsuccessful. Try implementing these steps manually (outside the loop code) to see how it goes.

                    Comment


                    • #11
                      Thank you
                      I don't know if it is related to the "no observations" error code, but I am having difficulties implementing the code with my own dataset. When I use the code you gave me for the example dataset nlsw88 it produces the tables that I want. But when I try to apply the same command for my own dataset, for some reason it only wants to produce a table for one "industry" (or in my dataset: "city") at a time. That is, the following commands work:

                      *** Full code for São Vicente

                      estimates drop _all
                      local i 0
                      local vars "b02age_n b37sblno_n"
                      foreach var1 of local vars{
                      foreach var2 of varlist citygnd cityage citywealth_t3 {
                      local ++i
                      qui levelsof `var2', local(levs)
                      foreach j of local levs{
                      capture noisily{
                      eststo SãoVicente`i'_`j': estpost summarize `var1' if `j'.`var2' & city==3 [aw = gweightcorr_n]
                      mat mean= e(mean)
                      mat coleq mean= "`:lab (`var2') `j''"
                      estadd mat mean= mean, replace
                      }
                      }
                      }
                      local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
                      esttab SãoVicente* using testingSV.csv, cells("mean(fmt(2)) count(label(N) fmt(0))") ///
                      unstack `opt' mlab(none) noobs note("Extra Line") addnotes("Observations")
                      estimates drop _all
                      }

                      ***

                      *** Full code for Serekunda

                      estimates drop _all
                      local i 0
                      local vars "b02age_n b37sblno_n"
                      foreach var1 of local vars{
                      foreach var2 of varlist citygnd cityage citywealth_t3 {
                      local ++i
                      qui levelsof `var2', local(levs)
                      foreach j of local levs{
                      capture noisily{
                      eststo Serekunda`i'_`j': estpost summarize `var1' if `j'.`var2' & city==1
                      mat mean= e(mean)
                      mat colname count= "mean"
                      mat coleq mean= "`:lab (`var2') `j''"
                      estadd mat mean= mean, replace
                      }
                      }
                      }
                      local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
                      esttab Serekunda* using testingSr.csv, cells("mean(fmt(2)) count(label(N) fmt(0))") ///
                      unstack `opt' mlab(none) noobs note("Extra Line") addnotes("Observations")
                      estimates drop _all
                      }

                      ***

                      ** Full code for Tema

                      estimates drop _all
                      local i 0
                      local vars "b02age_n b37sblno_n"
                      foreach var1 of local vars{
                      foreach var2 of varlist citygnd cityage citywealth_t3 {
                      local ++i
                      qui levelsof `var2', local(levs)
                      foreach j of local levs{
                      capture noisily{
                      eststo Tema`i'_`j': estpost summarize `var1' if `j'.`var2' & city==2 [aw = gweightcorr_n]
                      mat mean= e(mean)
                      mat colname count= "mean"
                      mat coleq mean= "`:lab (`var2') `j''"
                      estadd mat mean= mean, replace
                      }
                      }
                      }
                      local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
                      esttab Tema* using testingT.csv, cells("mean(fmt(2)) count(label(N) fmt(0))") ///
                      unstack `opt' mlab(none) noobs note("Extra Line") addnotes("Observations")
                      estimates drop _all
                      }

                      ***

                      Whereas the following command including all three cities does not create any output at all:

                      *** Full code

                      estimates drop _all
                      local i 0
                      local vars "b02age_n b37sblno_n"
                      foreach var1 of local vars{
                      foreach var2 of varlist citygnd cityage citywealth_t3 {
                      local ++i
                      qui levelsof `var2', local(levs)
                      foreach j of local levs{
                      capture noisily{
                      eststo Serekunda`i'_`j': estpost summarize `var1' if `j'.`var2' & city==1 [aw = gweightcorr_n]
                      mat mean= e(mean)
                      mat coleq mean= "`:lab (`var2') `j''"
                      estadd mat mean= mean, replace
                      eststo Tema`i'_`j': estpost summarize `var1' if `j'.`var2' & city==2 [aw = gweightcorr_n]
                      mat mean= e(mean)
                      mat colname count= "mean"
                      mat coleq mean= "`:lab (`var2') `j''"
                      estadd mat mean= mean, replace
                      eststo SãoVicente`i'_`j': estpost summarize `var1' if `j'.`var2' & city==3 [aw = gweightcorr_n]
                      mat mean= e(mean)
                      mat coleq mean= "`:lab (`var2') `j''"
                      estadd mat mean= mean, replace
                      }
                      }
                      }
                      local opt= cond("`var1'"==word("`vars'", 1), "replace nonumb", "append eqlab(none) nonumb collab(none)")
                      esttab Serekunda* Tema* SãoVicente* using testingall.csv, cells("mean(fmt(2)) count(label(N) fmt(0))") ///
                      unstack `opt' mlab(none) noobs note("Extra Line") addnotes("Observations")
                      estimates drop _all
                      }

                      ***



                      When I run the previous command for all three cities I receive the following error message:

                      "matrix count not found
                      estimation result SãoVicente* not found
                      r(111);

                      end of do-file

                      r(111);"

                      This is similar to the error message I receive when I run your command with the nlsw88 dataset:

                      "matrix count not found
                      no observations
                      no observations
                      estimation result Construction* not found
                      r(111);

                      end of do-file

                      r(111);"


                      Yet, with the nlsw88 (unlike with my own dataset) the command still produces a table.
                      I know it must be difficult without access to my dataset for you to say anything about why it may be that it does not work for me to run the command with all cities, when it works with the nlsw88. Seeing as the error messages are the same between the two datasets where one produces a table and the other doesn't, I would guess that these error messages do not hold the answer to my problem. Hence, I am completely at a loss for how to go about to fix this, and I would be so grateful for any suggestions for where I can start looking for the problem. Do you for instance think it could it help to implement the steps to suppress "no observations" manually, like you suggested above? If so how does one do this?

                      I don't know how relevant this is, but I have also tried variations of the full code where I include only one var1 ("b02age_n") and/or one var2 (citygnd). I have also tried it without the weights ([aw = gweightcorr_n]). Neither of these variations of the command produce a table. Sometimes it changes which city it is that "can't be found" in the error message, between SãoVicente* and Tema*.

                      Best,
                      Maria
                      Last edited by Mariia Chavez; 05 Nov 2024, 03:22.

                      Comment


                      • #12
                        It's difficult to tell absent a data example.

                        Comment


                        • #13
                          But if you had access to the data. Where would you start to look for the problem? What would you do?

                          Comment


                          • #14
                            If an error stops the code execution, I would investigate it by using:

                            Code:
                            set trace on
                            Reviewing the log may then help me diagnose the problem. If the code runs successfully but no table is generated, I would simplify the code to tab only one variable and check if I get a result. Remember, a loop is simply an iteration of a set of commands, so running the commands outside the loop can indicate what fails and what works.

                            ​​​​​​​​​​​​​​
                            Code:
                            estpost summarize ...
                            esttab ...

                            Comment


                            • #15
                              Thank you
                              After some trial and error I found out that it works with two cities, just not three. And it doesn't matter which I leave out as it apparently works with any two cities. Hence, I will just copy paste the third city in for these variables.

                              Comment

                              Working...
                              X