  • Preventing -estout- from overwriting e(labels) in successive runs

    Dear Statalisters,

    I am hoping to get -estout- (written by Ben Jann, link below) to assign labels to factor variables. This has been discussed previously (link below) but apparently has not yet been resolved.

    My approach is to place the labels into a local and attach these to the estimation set as e(labels) using -estadd-. In the process of trying this I noticed that sometimes -estout- appears to overwrite labels stored in e(labels), which can lead to labels being dropped. I am hoping that someone might know how to prevent this.

    I paste below example code with commentary as well as the associated output.

    EDIT: I am using Stata MP 13.1.

    Thanks for your thoughts, and apologies for mistakenly cross-posting with the old listserv.
    Bert



    ** Discussion on labels for factor variables in -estout-, with
    solutions summarized by Bert Lloyd

    http://www.stata.com/statalist/archi.../msg00139.html

    ** Ben Jann's fabulous -estout-

    http://repec.org/bocode/e/estout/index.html



    *********** Example code **************

    Code:
    which estout
    
    sysuse auto, clear
    eststo clear
    
    eststo set1: estpost tabstat price in 1/15, by(make)
    
    * To get the labels, use: varlabels(`e(labels)').  This works nicely.
    estout set1, cells(mean)
    estout set1, cells(mean) varlabels(`e(labels)')
    
    * Note that e(labels) as assigned to this estimation set contains the
    * relevant labels.
    ereturn list
    di `"`e(labels)'"'
    
    
    * The problem appears when preparing another estimation set.
    * This affects the labeling by -estout- IF the additional set has
    * fewer levels than the original set.
    
    * Example:
    
    * set2 has as many levels/labels => no problem
    eststo set2: estpost tabstat price in 21/35, by(make)
    estout set1, cells(mean) varlabels(`e(labels)')
    
    * set3 has FEWER levels/labels => -estout- omits some labels
    eststo set3: estpost tabstat price in 21/23, by(make)
    estout set1, cells(mean) varlabels(`e(labels)')
    
    
    * It seems that -estout- overwrites the e(labels) that were attached
    * to set1.  It now only contains 3 entries.
    di `"`e(labels)'"'
    ********* End of example code ********


    *********** Output *************


    . which estout
    c:\ado\plus\e\estout.ado
    *! version 3.13 06aug2009 Ben Jann

    .
    . sysuse auto, clear
    (1978 Automobile Data)

    . eststo clear

    .
    . eststo set1: estpost tabstat price in 1/15, by(make)

    Summary statistics: mean
    for variables: price
    by categories of: make

            make |   e(mean)
    -------------+-----------
               1 |      4099
               2 |      4749
               3 |      3799
               4 |      4816
               5 |      7827
               6 |      5788
               7 |      4453
               8 |      5189
               9 |     10372
              10 |      4082
              11 |     11385
              12 |     14500
              13 |     15906
              14 |      3299
              15 |      5705
    -------------+-----------
           Total |    7064.6

    category labels saved in macro e(labels)

    .
    . * To get the labels, use: varlabels(`e(labels)'). This works nicely.
    . estout set1, cells(mean)

    -------------------------
                         set1
                         mean
    -------------------------
    1                    4099
    2                    4749
    3                    3799
    4                    4816
    5                    7827
    6                    5788
    7                    4453
    8                    5189
    9                   10372
    10                   4082
    11                  11385
    12                  14500
    13                  15906
    14                   3299
    15                   5705
    Total              7064.6
    -------------------------

    . estout set1, cells(mean) varlabels(`e(labels)')

    -------------------------
                         set1
                         mean
    -------------------------
    AMC Concord          4099
    AMC Pacer            4749
    AMC Spirit           3799
    Buick Cent~y         4816
    Buick Elec~a         7827
    Buick LeSa~e         5788
    Buick Opel           4453
    Buick Regal          5189
    Buick Rivi~a        10372
    Buick Skyl~k         4082
    Cad. Deville        11385
    Cad. Eldor~o        14500
    Cad. Seville        15906
    Chev. Chev~e         3299
    Chev. Impala         5705
    Total              7064.6
    -------------------------

    .
    . * Note that e(labels) as assigned to this estimation set contains
    the relevant labels.
    . ereturn list

    scalars:
    e(N) = 15

    macros:
    e(_estimates_name) : "set1"
    e(cmd) : "estpost"
    e(subcmd) : "tabstat"
    e(stats) : "mean"
    e(vars) : "price"
    e(byvar) : "make"
    e(labels) : "1 `"AMC Concord"' 2 `"AMC Pacer"' 3 `"AMC
    Spirit"' 4 `"Buick Century"' 5 `"Buick Electra"' 6 `"Buick LeSabre"' 7
    `"Buick Opel"' 8 `"Buick.."

    matrices:
    e(mean) : 1 x 16

    . di `"`e(labels)'"'
    1 `"AMC Concord"' 2 `"AMC Pacer"' 3 `"AMC Spirit"' 4 `"Buick Century"'
    5 `"Buick Electra"' 6 `"Buick LeSabre"' 7 `"Buick Opel"' 8 `"Buick
    Regal"' 9 `"Buick Riviera"' 1
    > 0 `"Buick Skylark"' 11 `"Cad. Deville"' 12 `"Cad. Eldorado"' 13 `"Cad. Seville"' 14 `"Chev. Chevette"' 15 `"Chev. Impala"'

    .
    .
    . * The problem appears when preparing another estimation set.
    . * This affects the labeling by -estout- IF the levels of the
    additional set < original set.
    .
    . * Example:
    .
    . * set2 has as many levels/labels => no problem
    . eststo set2: estpost tabstat price in 21/35, by(make)

    Summary statistics: mean
    for variables: price
    by categories of: make

            make |   e(mean)
    -------------+-----------
               1 |      4010
               2 |      5886
               3 |      6342
               4 |      4389
               5 |      4187
               6 |     11497
               7 |     13594
               8 |     13466
               9 |      3829
              10 |      5379
              11 |      6165
              12 |      4516
              13 |      6303
              14 |      3291
              15 |      8814
    -------------+-----------
           Total |  6777.867

    category labels saved in macro e(labels)

    . estout set1, cells(mean) varlabels(`e(labels)')

    -------------------------
                         set1
                         mean
    -------------------------
    Dodge Dipl~t         4099
    Dodge Magnum         4749
    Dodge.. Re~s         3799
    Ford Fiesta          4816
    Ford Mustang         7827
    Linc. Cont~l         5788
    Linc. Mark V         4453
    Linc. Vers~s         5189
    Merc. Bobcat        10372
    Merc. Cougar         4082
    Merc. Marq~s        11385
    Merc. Mona~h        14500
    Merc. XR-7          15906
    Merc. Zephyr         3299
    Olds 98              5705
    Total              7064.6
    -------------------------

    .
    . * set3 has as FEWER levels/labels => -estout- omits some labels
    . eststo set3: estpost tabstat price in 21/23, by(make)

    Summary statistics: mean
    for variables: price
    by categories of: make

            make |   e(mean)
    -------------+-----------
               1 |      4010
               2 |      5886
               3 |      6342
    -------------+-----------
           Total |  5412.667

    category labels saved in macro e(labels)

    . estout set1, cells(mean) varlabels(`e(labels)')

    -------------------------
                         set1
                         mean
    -------------------------
    Dodge Dipl~t         4099
    Dodge Magnum         4749
    Dodge.. Re~s         3799
    4                    4816
    5                    7827
    6                    5788
    7                    4453
    8                    5189
    9                   10372
    10                   4082
    11                  11385
    12                  14500
    13                  15906
    14                   3299
    15                   5705
    Total              7064.6
    -------------------------

    .
    .
    . * It seems that -estout- overwrites the e(labels) that were attached
    to est1. It now only contains 3 entries.
    . di `"`e(labels)'"'
    1 `"Dodge Diplomat"' 2 `"Dodge Magnum"' 3 `"Dodge St. Regis"'




    ******************************
    Last edited by Bert Jung; 16 Apr 2014, 13:38.

  • #2
    Also posted in old Statalist. Choose one or t'other please! (Preferably this forum.)



    • #3
      Hi,

      Stata is not re-writing anything. The problem is that you are using the values for e(labels) in the estimates that are in memory, and those are the ones for set3. Consider the following:

      Code:
      set more off
      sysuse auto, clear
      quiet eststo set1: estpost tabstat price in 1/15, by(make)
      estout set1, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')
      quiet eststo set2: estpost tabstat price in 21/35, by(make)
      estout set2, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')
      quiet eststo set3: estpost tabstat price in 21/23, by(make)
      estout set3, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')
      
      * This will use the first three labels of set3 because they are the ones
      * currently stored in e(labels)
      estout set1, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')
      
      * This will put set1 in memory and thus use the appropriate labels
      est restore set1
      estout set1, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')
      So the problem is that set1 is no longer the active estimation set in memory when `e(labels)' is expanded in the estout command for set1.
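
      A minimal sketch of one way around this, assuming the same auto data and set names as above: copy e(labels) into a local immediately after each -estpost-, while that set is still the active one. The local names labels1 and labels3 are made up for illustration.

      Code:
      sysuse auto, clear
      eststo clear
      
      * Capture the labels while set1 is still the active estimation set
      quietly eststo set1: estpost tabstat price in 1/15, by(make)
      local labels1 `"`e(labels)'"'
      
      * A later -estpost- replaces the active e() results, but not the local
      quietly eststo set3: estpost tabstat price in 21/23, by(make)
      local labels3 `"`e(labels)'"'
      
      * Each table now gets the labels that belong to its own set
      estout set1, cells(mean) varlabels(`labels1')
      estout set3, cells(mean) varlabels(`labels3')

      Because Stata expands `labels1' before -estout- runs, the table no longer depends on which estimates happen to be in memory. The locals only live for the duration of the do-file or interactive session in which they are defined.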
      Last edited by Alfonso Sánchez-Peñalver; 16 Apr 2014, 15:32. Reason: Formatting.
      Alfonso Sanchez-Penalver



      • #4
        Thanks Alfonso,

        That helps me refine my question: -estout- successfully stores and recalls other components of the estimation output, such as e(mean). In the example below I output all three sets at the same time, and the means are correct. Those items can be recalled even after additional/different -eststo- calls are made. What can I do to make e(labels) similarly persistent?

        I should add that the problem with e(labels) also occurs with regressions, not just -tabstat-. In that case your -est restore- lets me show e(labels) for one estimation set but not for any of the others that might be included as columns in the -estout- table. So ultimately I suspect we need to make e(labels) permanent just like the other components.

        Hope that makes sense, and thanks again,
        Bert


        PS: Incidentally, it seems that one needs to be careful when combining -estpost tabstat- results. The example below correctly shows the means for the three estimation sets, but the labels are clearly wrong. My guess is that this occurs because the column names in -tabstat- e(mean), and hence in -estout- r(coefs), are just a numbered sequence, which has a different meaning for each of the -tabstat- commands. Maybe that also messes up the labels, since the values 1, 2, ... have different labels in each estimation set.


        Code:
        set more off
        sysuse auto, clear
        eststo set1: estpost tabstat price in 1/15, by(make)
        quiet eststo set2: estpost tabstat price in 21/35, by(make)
        eststo set3: estpost tabstat price in 21/23, by(make)
        
        * The means are correct, but the labels are only based on set3
        estout set1 set2 set3, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')
        
        * Fyi - the tabstat matrices e(mean) use columnnames 1, 2... n.  Maybe that trips up estout when preparing the joint matrix r(coefs)?
        qui eststo set3: estpost tabstat price in 21/23, by(make)
        mat list e(mean)
        
        qui estout set1 set2 set3, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')
        mat list r(coefs)



        • #5
          Again, `e(labels)' expands to whatever e(labels) is in memory at that moment, so don't be surprised that only the labels for set3 are used: those are the estimates currently loaded. Having said that, it seems that estout doesn't handle value labels well, which is how we label categories in Stata. Consider the following:

          Code:
          sysuse lifeexp.dta, clear
          reg popgrowth i.region gnppc
          est store reg1
          reg popgrowth i.region gnppc lexp
          est store reg2
          
          estout reg1 reg2, cells(b(star fmt(%10.2fc)) se(par fmt(%10.2fc))) label legend
          Notice how in both regressions, since we're using factor variable notation (i.), the regression output uses the value labels (N.A. and S.A.) to describe the categories. To use the variables' labels in the estout table we have the label option. For the non-categorical variables it picks up the variable label very well, but for the categorical variable it simply uses the variable label with the category prefix, not the value labels. Of course you can get around that by defining the labels in the estout statement, like
          Code:
          estout reg1 reg2, cells(b(star fmt(%10.2fc)) se(par fmt(%10.2fc))) label legend ///
              varlabel(1b.region "Eur & C. Asia" 2.region "N.A" 3.region "S.A")
          but that kind of defeats the purpose. Having said that, given the way you're using estout, I'm not sure this would help either. The reason is that it is storing the observation number in the reduced set as the name, so when it presents the table you see consecutive numbers from 1 up to the last observation number in each estimation. The label option does not work then, because there is no variable associated with the estimate to pull the label from.
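
          To see which names the varlabels( ) entries have to match, one can list the column names of e(b) after a regression. A small sketch using the same lifeexp data (the local name cnames is only illustrative):

          Code:
          sysuse lifeexp, clear
          quietly regress popgrowth i.region gnppc
          
          * Column names of e(b): the coefficient names used in the estout table
          local cnames : colnames e(b)
          display "`cnames'"

          Factor-variable terms appear there in level.varname form, with a b suffix on the base level, which is why the entries in the estout call above are written as 1b.region, 2.region and so on.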

          Sorry I can't help any more.

          Alfonso.
          Alfonso Sanchez-Penalver



          • #6
            Thanks for your help Alfonso.

            Below is my current solution in case it is useful for anyone else. Since I was unable to firmly attach the e(labels) to the estimation output, I am using a workaround that stores the labels in a macro and uses those in -estout-'s varlabels( ). It's a little clumsy but it works, and it will save me from manually changing the labels for the factor variables.

            Cheers Bert


            Code:
            * The program puts the labels for the levels of factor variables into sreturn; these can be placed into a local and used in -estout-'s varlabels( )
            * This example works for basic levels, as well as baselevels and omitted levels
            
            cap program drop estoutfvlabels
            program estoutfvlabels, sclass
            
                foreach i in `: colnames e(b)'  {
            
                    * Focus on factor variables in e(b), which always have three parts: number.varname
                    
                    * Add label to levels that were actually used
                    if regexm("`i'", "([0-9]+)\.([a-zA-Z]+)")==1    {
                    
                        local level = substr("`i'", 1, strpos("`i'", ".")-1 )
                        local vname = substr("`i'", strpos("`i'", ".")+1, .                    )
                        
                        local vlab: label( `vname' ) `level'
                        local tmp  `tmp'  `i' `"`vlab'"'        
                    }
                    
                    
                    * Add label+suffix for baselevels (#b.varname) and omitted levels (#o.varname)
                    if regexm("`i'", "([0-9]+[bo])\.([a-zA-Z]+)")==1    {
                        
                        * Because the convention is "b." or "o.", we have to subtract 1 additional character to get to the level
                        local level = substr("`i'", 1, strpos("`i'", ".")-2 )
                        local vname = substr("`i'", strpos("`i'", ".")+1, .                    )
                        
                        local vlab: label( `vname' ) `level'
                        
                        * Baselevel #b.varname
                        if regexm("`i'", "([0-9]+[b])\.([a-zA-Z]+)")==1    {
                            local tmp  `tmp'  `i' `"`vlab' [baselevel]"'        
                        }        
                        
                        * Omitted #o.varname
                        if regexm("`i'", "([0-9]+[o])\.([a-zA-Z]+)")==1    {
                            local tmp  `tmp'  `i' `"`vlab' [omitted]"'        
                        }
                    }
                }
            
                sreturn local fvlabels = `"`tmp'"'
                
            end
            
            
            sysuse auto, clear
            eststo clear
            
            encode make, gen( m ) 
            
            * Run the regression
            * Immediately call the above program and put the results (the labels) into a local for later use
            qui eststo set1: reg price mpg i.m in 1/10
            
            estoutfvlabels
            local lab1 = `"`s(fvlabels)'"'
            
            * For a second regression, make sure to use a differently named local
            qui eststo set2: reg price mpg i.foreign in 1/10
            
            estoutfvlabels
            local lab2 = `"`s(fvlabels)'"'
            
            
            * Place the locals with the labels into varlabels( )
            estout set1 set2, varlabels( `lab1' `lab2' _cons "Constant" ) label

            Output

            Code:
            ----------------------------------------------
                                         set1         set2
                                            b            b
            ----------------------------------------------
            Mileage (mpg)            5.666667    -418.9326
            AMC Concord [basel~]            0             
            AMC Pacer                678.3333             
            AMC Spirit                   -300             
            Buick Century            728.3333             
            Buick Electra            3767.667             
            Buick LeSabre            1711.667             
            Buick Opel               331.3333             
            Buick Regal              1101.333             
            Buick Riviera                6307             
            Buick Skylark [omi~]            0             
            Domestic [omitted]                           0
            Constant                 3974.333     13686.59
            ----------------------------------------------



            • #7
              Bert, I was wondering if there is any way of accessing the value labels of a variable once you know the variable name, and working from there. It shouldn't be that difficult. I guess I can look at one estimation command that allows for factor variables and check how it does it. This can actually be fun.

              Like your code!
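
              For what it's worth, a minimal sketch of that lookup, assuming the lifeexp data from #5 and using Stata's extended macro functions (the local names lblname, levels and txt are only illustrative):

              Code:
              sysuse lifeexp, clear
              
              * Name of the value label attached to the variable (empty if none)
              local lblname : value label region
              display "value label attached to region: `lblname'"
              
              * Loop over the observed levels and look up the text for each one
              quietly levelsof region, local(levels)
              foreach l of local levels {
                  local txt : label (region) `l'
                  display `"`l' -> `txt'"'
              }

              The same `: label (varname) #' lookup is what the estoutfvlabels program in #6 relies on.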
              Alfonso Sanchez-Penalver

