Preventing -estout- from overwriting e(labels) in successive runs

Bert Jung

Join Date: Apr 2014

Posts: 16
#1


16 Apr 2014, 13:35

Dear Statalisters,

I am hoping to get -estout- (written by Ben Jann, link below) to assign labels to factor variables. This has been discussed previously (links below) but has apparently not yet been resolved.

My approach is to place the labels into a local and attach these to the estimation set as e(labels) using -estadd-. In the process of trying this I noticed that sometimes -estout- appears to overwrite labels stored in e(labels), which can lead to labels being dropped. I am hoping that someone might know how to prevent this.

I paste below example code with commentary as well as the associated output.

EDIT: I am using Stata MP 13.1.

Thanks for your thoughts, and apologies for mistakenly cross-posting with the old listserv.
Bert

** Discussion on labels for factor variables in -estout-, with
solutions summarized by Bert Lloyd

http://www.stata.com/statalist/archi.../msg00139.html

** Ben Jann's fabulous -estout-

http://repec.org/bocode/e/estout/index.html

*********** Example code **************

Code:

which estout

sysuse auto, clear
eststo clear

eststo set1: estpost tabstat price in 1/15, by(make)

* To get the labels, use: varlabels(`e(labels)'). This works nicely.
estout set1, cells(mean)
estout set1, cells(mean) varlabels(`e(labels)')

* Note that e(labels) as assigned to this estimation set contains the relevant labels.
ereturn list
di `"`e(labels)'"'

* The problem appears when preparing another estimation set.
* This affects the labeling by -estout- IF the levels of the additional set < original set.

* Example:

* set2 has as many levels/labels => no problem
eststo set2: estpost tabstat price in 21/35, by(make)
estout set1, cells(mean) varlabels(`e(labels)')

* set3 has as FEWER levels/labels => -estout- omits some labels
eststo set3: estpost tabstat price in 21/23, by(make)
estout set1, cells(mean) varlabels(`e(labels)')

* It seems that -estout- overwrites the e(labels) that were attached to est1. It now only contains 3 entries.
di `"`e(labels)'"'

********* End of example code ********

*********** Output *************

. which estout
c:\ado\plus\e\estout.ado
*! version 3.13 06aug2009 Ben Jann

.
. sysuse auto, clear
(1978 Automobile Data)

. eststo clear

.
. eststo set1: estpost tabstat price in 1/15, by(make)

Summary statistics: mean
for variables: price
by categories of: make

make | e(mean)
-------------+-----------
1 | 4099
2 | 4749
3 | 3799
4 | 4816
5 | 7827
6 | 5788
7 | 4453
8 | 5189
9 | 10372
10 | 4082
11 | 11385
12 | 14500
13 | 15906
14 | 3299
15 | 5705
-------------+-----------
Total | 7064.6

category labels saved in macro e(labels)

.
. * To get the labels, use: varlabels(`e(labels)'). This works nicely.
. estout set1, cells(mean)

-------------------------
set1
mean
-------------------------
1 4099
2 4749
3 3799
4 4816
5 7827
6 5788
7 4453
8 5189
9 10372
10 4082
11 11385
12 14500
13 15906
14 3299
15 5705
Total 7064.6
-------------------------

. estout set1, cells(mean) varlabels(`e(labels)')

-------------------------
set1
mean
-------------------------
AMC Concord 4099
AMC Pacer 4749
AMC Spirit 3799
Buick Cent~y 4816
Buick Elec~a 7827
Buick LeSa~e 5788
Buick Opel 4453
Buick Regal 5189
Buick Rivi~a 10372
Buick Skyl~k 4082
Cad. Deville 11385
Cad. Eldor~o 14500
Cad. Seville 15906
Chev. Chev~e 3299
Chev. Impala 5705
Total 7064.6
-------------------------

.
. * Note that e(labels) as assigned to this estimation set contains the relevant labels.
. ereturn list

scalars:
e(N) = 15

macros:
e(_estimates_name) : "set1"
e(cmd) : "estpost"
e(subcmd) : "tabstat"
e(stats) : "mean"
e(vars) : "price"
e(byvar) : "make"
e(labels) : "1 `"AMC Concord"' 2 `"AMC Pacer"' 3 `"AMC Spirit"' 4 `"Buick Century"' 5 `"Buick Electra"' 6 `"Buick LeSabre"' 7 `"Buick Opel"' 8 `"Buick.."

matrices:
e(mean) : 1 x 16

. di `"`e(labels)'"'
1 `"AMC Concord"' 2 `"AMC Pacer"' 3 `"AMC Spirit"' 4 `"Buick Century"'
5 `"Buick Electra"' 6 `"Buick LeSabre"' 7 `"Buick Opel"' 8 `"Buick
Regal"' 9 `"Buick Riviera"' 1
> 0 `"Buick Skylark"' 11 `"Cad. Deville"' 12 `"Cad. Eldorado"' 13 `"Cad. Seville"' 14 `"Chev. Chevette"' 15 `"Chev. Impala"'

.
.
. * The problem appears when preparing another estimation set.
. * This affects the labeling by -estout- IF the levels of the additional set < original set.
.
. * Example:
.
. * set2 has as many levels/labels => no problem
. eststo set2: estpost tabstat price in 21/35, by(make)

Summary statistics: mean
for variables: price
by categories of: make

make | e(mean)
-------------+-----------
1 | 4010
2 | 5886
3 | 6342
4 | 4389
5 | 4187
6 | 11497
7 | 13594
8 | 13466
9 | 3829
10 | 5379
11 | 6165
12 | 4516
13 | 6303
14 | 3291
15 | 8814
-------------+-----------
Total | 6777.867

category labels saved in macro e(labels)

. estout set1, cells(mean) varlabels(`e(labels)')

-------------------------
set1
mean
-------------------------
Dodge Dipl~t 4099
Dodge Magnum 4749
Dodge.. Re~s 3799
Ford Fiesta 4816
Ford Mustang 7827
Linc. Cont~l 5788
Linc. Mark V 4453
Linc. Vers~s 5189
Merc. Bobcat 10372
Merc. Cougar 4082
Merc. Marq~s 11385
Merc. Mona~h 14500
Merc. XR-7 15906
Merc. Zephyr 3299
Olds 98 5705
Total 7064.6
-------------------------

.
. * set3 has as FEWER levels/labels => -estout- omits some labels
. eststo set3: estpost tabstat price in 21/23, by(make)

Summary statistics: mean
for variables: price
by categories of: make

make | e(mean)
-------------+-----------
1 | 4010
2 | 5886
3 | 6342
-------------+-----------
Total | 5412.667

category labels saved in macro e(labels)

. estout set1, cells(mean) varlabels(`e(labels)')

-------------------------
set1
mean
-------------------------
Dodge Dipl~t 4099
Dodge Magnum 4749
Dodge.. Re~s 3799
4 4816
5 7827
6 5788
7 4453
8 5189
9 10372
10 4082
11 11385
12 14500
13 15906
14 3299
15 5705
Total 7064.6
-------------------------

.
.
. * It seems that -estout- overwrites the e(labels) that were attached to est1. It now only contains 3 entries.
. di `"`e(labels)'"'
1 `"Dodge Diplomat"' 2 `"Dodge Magnum"' 3 `"Dodge St. Regis"'

******************************

Last edited by Bert Jung; 16 Apr 2014, 13:38.
Tags: estout, factor variables
Nick Cox

Join Date: Mar 2014

Posts: 35210
#2

16 Apr 2014, 13:38

Also posted in old Statalist. Choose one or t'other please! (Preferably this forum.)

Alfonso Sánchez-Peñalver

Join Date: Mar 2014
Posts: 432

16 Apr 2014, 15:28

Hi,

Stata is not re-writing anything. The problem is that you are using the values for e(labels) in the estimates that are in memory, and those are the ones for set3. Consider the following:

Code:

set more off
sysuse auto, clear
quiet eststo set1: estpost tabstat price in 1/15, by(make)
estout set1, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')
quiet eststo set2: estpost tabstat price in 21/35, by(make)
estout set2, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')
quiet eststo set3: estpost tabstat price in 21/23, by(make)
estout set3, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')

* This will use the first three labels of set3 because they are the ones
* currently stored in e(labels)
estout set1, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')

* This will put set1 in memory and thus use the appropriate labels
est restore set1
estout set1, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')

So the problem is that you don't have set1 any longer in memory when you are calling e(labels) in the estout command with set1.
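The point above can be sketched another way: instead of restoring each set before calling -estout-, copy e(labels) into a local immediately after each -estpost- call, while that set is still the active estimation result. This is only a minimal sketch, assuming the user-written estout package is installed; the local names labels1 and labels3 are made up for illustration.

```stata
* Assumes the user-written estout package is installed (ssc install estout)
sysuse auto, clear
eststo clear

* Copy e(labels) into a local right after each estpost call,
* while that set is still the active estimation result
quietly eststo set1: estpost tabstat price in 1/15, by(make)
local labels1 `"`e(labels)'"'

quietly eststo set3: estpost tabstat price in 21/23, by(make)
local labels3 `"`e(labels)'"'

* e() now belongs to set3, but each local still holds its own set's labels
estout set1, cells(mean) varlabels(`labels1')
estout set3, cells(mean) varlabels(`labels3')
```

Locals only live for the duration of the do-file or interactive session in which they are defined, so the -estout- calls need to run in the same session as the -estpost- calls.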

Last edited by Alfonso Sánchez-Peñalver; 16 Apr 2014, 15:32. Reason: Formatting.

Alfonso Sanchez-Penalver


Bert Jung

Join Date: Apr 2014

Posts: 16
#4

16 Apr 2014, 16:32

Thanks Alfonso,

That helps me refine my question: -estout- successfully stores and recalls other components of the estimation output, such as e(mean). In the example below I output all three sets at the same time, and the means are correct. Those items can be recalled even after additional or different -eststo- calls have been made. What can I do to make e(labels) similarly persistent?

I should add that the problem with e(labels) also occurs with regressions, not just -tabstat-. In that case your -est restore- makes it possible to show e(labels) for one estimation set, but not for any of the others that might be included as columns in the -estout- table. So ultimately I suspect we need to make e(labels) permanent just like the other components.

Hope that makes sense, and thanks again,
Bert

PS: Incidentally, it seems that one needs to be careful when combining -estpost tabstat- results. The example below correctly shows the means for the three estimation sets, but the labels are clearly wrong. My guess is that this occurs because the column names in -tabstat- e(mean), and hence in -estout- r(coefs), are just a numbered sequence, which has a different meaning for each of the -tabstat- commands. Maybe that also messes up the labels, since the values 1, 2, ... have different labels in each estimation set.

Code:

set more off
sysuse auto, clear

eststo set1: estpost tabstat price in 1/15, by(make)
quiet eststo set2: estpost tabstat price in 21/35, by(make)
eststo set3: estpost tabstat price in 21/23, by(make)

* The means are correct, but the labels are only based on set3
estout set1 set2 set3, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')

* Fyi - the tabstat matrices e(mean) use columnnames 1, 2... n. Maybe that trips up estout when preparing the joint matrix r(coefs)?
qui eststo set3: estpost tabstat price in 21/23, by(make)
mat list e(mean)
qui estout set1 set2 set3, cells(mean(fmt(%10.0fc))) varlabels(`e(labels)')
mat list r(coefs)
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#5

17 Apr 2014, 07:26

Again, you are only using the e(labels) that is in memory when using `e(labels)' so don't be surprised it only uses the ones for set3, because those are the estimates loaded in memory. Having said that, it seems that estout doesn't handle label values well, which is how we label categories in Stata. Consider the following

Code:

sysuse lifeexp.dta, clear
reg popgrowth i.region gnppc
est store reg1
reg popgrowth i.region gnppc lexp
est store reg2
estout reg1 reg2, cells(b(star fmt(%10.2fc)) se(par fmt(%10.2fc))) label legend

Notice how in both regressions, since we're using factor variable notation (i.), the value labels (N.A. and S.A.) describe the variables. To use the variables' labels in the estout table we have the label option. You can see that for the non-categorical variables it picks up the variable label very well, but for the categorical variable it simply uses the variable label with the category prefix, not the value labels. Of course you can get around that by defining the labels in the estout statement, like

Code:

estout reg1 reg2, cells(b(star fmt(%10.2fc)) se(par fmt(%10.2fc))) label legend ///
    varlabel(1b.region "Eur & C. Asia" 2.region "N.A" 3.region "S.A")

but that kind of defeats the purpose. Having said that, given the way you're using estout, I'm not sure this would help either. The reason is that it is storing the observation number in the reduced set as the name, so when it presents the table it shows consecutive numbers from 1 up to the last observation number in each estimation. The label option does not work then, because there is no variable associated with the estimate to pull the label from.

Sorry I can't help any more.

Alfonso.

Alfonso Sanchez-Penalver

Bert Jung

Join Date: Apr 2014
Posts: 16

18 Apr 2014, 09:05

Thanks for your help Alfonso.

Below is my current solution in case it is useful for anyone else. Since I was unable to firmly attach the e(labels) to the estimation output, I am using a workaround that stores the labels in a macro and uses those in -estout-'s varlabels( ). It's a little clumsy, but it works and will save me the time of manually changing the labels for the factor variables.

Cheers Bert

Code:

* The program puts the levels of factor variables into sreturn; these can be placed into a local and used in -estout-'s varlabels( )
* This example works for basic levels, as well as baselevels and omitted levels

cap program drop estoutfvlabels
program estoutfvlabels, sclass

    foreach i in `: colnames e(b)'  {

        * Focus on factor variables in e(b), which always have the form number.varname
        
        * Add label to levels that were actually used
        if regexm("`i'", "([0-9]+)\.([a-zA-Z]+)")==1    {
        
            local level = substr("`i'", 1, strpos("`i'", ".")-1 )
            local vname = substr("`i'", strpos("`i'", ".")+1, .                    )
            
            local vlab: label( `vname' ) `level'
            local tmp  `tmp'  `i' `"`vlab'"'        
        }
        
        
        * Add label+suffix for baselevels (#b.varname) and omitted levels (#o.varname)
        if regexm("`i'", "([0-9]+[bo])\.([a-zA-Z]+)")==1    {
            
            * Because the convention is "b." or "o.", we have to subtract 1 additional character to get to the level
            local level = substr("`i'", 1, strpos("`i'", ".")-2 )
            local vname = substr("`i'", strpos("`i'", ".")+1, .                    )
            
            local vlab: label( `vname' ) `level'
            
            * Baselevel #b.varname
            if regexm("`i'", "([0-9]+[b])\.([a-zA-Z]+)")==1    {
                local tmp  `tmp'  `i' `"`vlab' [baselevel]"'        
            }        
            
            * Omitted #o.varname
            if regexm("`i'", "([0-9]+[o])\.([a-zA-Z]+)")==1    {
                local tmp  `tmp'  `i' `"`vlab' [omitted]"'        
            }
        }
    }

    sreturn local fvlabels = `"`tmp'"'
    
end


sysuse auto, clear
eststo clear

encode make, gen( m ) 

* Run the regression
* Immediately call the above program and put the results (the labels) into a local for later use
qui eststo set1: reg price mpg i.m in 1/10

estoutfvlabels
local lab1 = `"`s(fvlabels)'"'

* For a second regression, make sure to use a differently named local
qui eststo set2: reg price mpg i.foreign in 1/10

estoutfvlabels
local lab2 = `"`s(fvlabels)'"'


* Place the locals with the labels into varlabels( )
estout set1 set2, varlabels( `lab1' `lab2' _cons "Constant" ) label

Output

Code:

----------------------------------------------
                             set1         set2
                                b            b
----------------------------------------------
Mileage (mpg)            5.666667    -418.9326
AMC Concord [basel~]            0             
AMC Pacer                678.3333             
AMC Spirit                   -300             
Buick Century            728.3333             
Buick Electra            3767.667             
Buick LeSabre            1711.667             
Buick Opel               331.3333             
Buick Regal              1101.333             
Buick Riviera                6307             
Buick Skylark [omi~]            0             
Domestic [omitted]                           0
Constant                 3974.333     13686.59
----------------------------------------------


Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#7

18 Apr 2014, 12:29

Bert, I was wondering if there is any way of accessing the label values of a variable, once you know the variable name and work from there. It shouldn't be that difficult. I guess I can check one estimation command that allows for factor variables, and check how they do it. This can actually be fun.

Like your code!

Alfonso Sanchez-Penalver
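
On the question of reading a variable's value labels once the variable name is known, one possible route is Stata's extended macro functions. The following is a minimal sketch, not a tested addition to the program above; it uses foreign from the auto data purely as an example and assumes the variable has a value label attached.

```stata
sysuse auto, clear

* Name of the value label attached to the variable (empty if none is attached)
local vl : value label foreign

* Loop over the observed levels and look up the label text of each level
quietly levelsof foreign, local(levels)
foreach lev of local levels {
    local lab : label `vl' `lev'
    display "`lev' = `lab'"
}
```

The form `: label (varname) #`, which is what the program above relies on, skips the first step by looking up the value label through the variable itself.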
