Differences in results from fixed effects estimator and demeaned OLS

David Tsu

Join Date: Jan 2018

Posts: 57
#1

01 Feb 2018, 10:26

I compared results from using

(1) xtset id year
xtreg var1 var2 var3, fe

and OLS with demeaned (by id) versions of the same variables

(2) reg var1_demean var2_demean var3_demean

My prior was that the estimation results should be exactly the same. However, although they are fairly close, they are not identical. Why do these two commands give different results?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

02 Feb 2018, 04:13

David:
as per FAQ, please also post via CODE delimiters what Stata gave you back. Thanks.

Kind regards,
Carlo
(Stata 19.0)
daniel klein

Join Date: Mar 2014

Posts: 3850
#3

02 Feb 2018, 04:35

Originally posted by Carlo Lazzaro

David:
as per FAQ, please also post via CODE delimiters what Stata gave you back. Thanks.

and provide the exact code you used for demeaning, possibly with a reproducible example.

Best
Daniel
David Tsu

Join Date: Jan 2018

Posts: 57
#4

02 Feb 2018, 07:10

Hi Carlo,

My code:
(1) FE estimate:

xtreg ml_w prof_w tang_w loga_w mk2bk_w, fe robust

(2) Demeaning:

by id: egen ml_mean = mean(ml_w)
by id: egen prof_mean = mean(prof_w)
by id: egen tang_mean = mean(tang_w)
by id: egen loga_mean = mean(loga_w)
by id: egen mk2bk_mean = mean(mk2bk_w)
gen ml_demean = ml_w - ml_mean
gen prof_demean = prof_w - prof_mean
gen tang_demean = tang_w - tang_mean
gen loga_demean = loga_w - loga_mean
gen mk2bk_demean = mk2bk_w - mk2bk_mean
reg ml_demean prof_demean tang_demean loga_demean mk2bk_demean, robust

daniel klein

Join Date: Mar 2014
Posts: 3850
#5

02 Feb 2018, 07:22

Carlo actually asked for Stata output. My wild guess is that you have missing values and do not restrict the mean calculations and de-meaning to the estimation sample. Do something like this

Code:

// load example dataset
webuse nlswork , clear
xtset idcode

// (1) FE model
xtreg ln_wage age hours i.union , fe

// keep only complete cases
keep if e(sample)

// (2) de-mean based on same sample
foreach var of varlist ln_wage age hours union {
    bysort idcode : egen m_`var' = mean(`var')
    generate dm_`var' = `var'-m_`var'
}

// pooled OLS regression on de-meaned data
regress dm_ln_wage dm_age dm_hours dm_union

and see whether the results still differ.

Best
Daniel


David Tsu

Join Date: Jan 2018

Posts: 57
#6

02 Feb 2018, 07:44

Thanks! Here is the output from my original code:

. xtreg ml_w prof_w tang_w loga_w mk2bk_w, fe robust

Fixed-effects (within) regression               Number of obs     =    105,496
Group variable: id                              Number of groups  =     14,054

R-sq:                                           Obs per group:
     within  = 0.0657                                         min =          1
     between = 0.1449                                         avg =        7.5
     overall = 0.1011                                         max =         21

                                                F(4,14053)        =     393.59
corr(u_i, Xb)  = -0.0323                        Prob > F          =     0.0000

                                (Std. Err. adjusted for 14,054 clusters in id)
------------------------------------------------------------------------------
             |               Robust
        ml_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      prof_w |  -.0800309   .0030954   -25.85   0.000    -.0860983   -.0739635
      tang_w |   .2025415   .0108245    18.71   0.000     .1813241     .223759
      loga_w |   .0281374   .0016216    17.35   0.000     .0249589    .0313159
     mk2bk_w |   -.008481   .0003351   -25.31   0.000     -.009138   -.0078241
       _cons |   .0834798   .0077975    10.71   0.000     .0681957    .0987639
-------------+----------------------------------------------------------------
     sigma_u |  .20757192
     sigma_e |  .15878363
         rho |  .63085133   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. reg ml_demean prof_demean tang_demean loga_demean mk2bk_demean, robust

Linear regression                               Number of obs     =    105,496
                                                F(4, 105491)      =    1019.86
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0659
                                                Root MSE          =     .14848

------------------------------------------------------------------------------
             |               Robust
   ml_demean |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 prof_demean |  -.0801377   .0021951   -36.51   0.000    -.0844401   -.0758353
 tang_demean |     .20391   .0064021    31.85   0.000     .1913619     .216458
 loga_demean |   .0282449   .0008499    33.23   0.000     .0265791    .0299107
mk2bk_demean |  -.0084683   .0002193   -38.62   0.000    -.0088981   -.0080386
       _cons |   .0000729    .000457     0.16   0.873    -.0008229    .0009687
------------------------------------------------------------------------------

I tried using -keep if e(sample)- after the first regression, as in your "keep only complete cases" step. However, the estimates are still not identical.
daniel klein

Join Date: Mar 2014

Posts: 3850
#7

02 Feb 2018, 07:55

Can you show us the results when you keep the sample constant? Please also include the output you get from egen (i.e., how many missing values are generated). Also, please enclose output and code in code delimiters as in

[CODE]

here goes your code and output

[/CODE].

Best
Daniel
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#8

02 Feb 2018, 07:57

David:
to avoid formatting issues that make your post hard to read, you are kindly asked to put what you typed and what you got from Stata within CODE delimiters (just click on the hash button under the Advanced editor and you're there). Thanks.
That said, I do not have an answer about the difference in your point estimates, but, as far as your standard errors are concerned, the option -robust- under -xtreg, fe- produces standard errors clustered on the panel variable, which deal with heteroskedasticity and/or autocorrelation, whereas -robust- under -regress- accounts for heteroskedasticity only.
However, different standard error options cannot explain the difference in your point estimates.
You may want to re-run Daniel's helpful code (that works perfectly) and compare it with yours.
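
To compare like with like on the standard errors, the de-meaned regression can be clustered on the panel identifier as well. A minimal sketch using the nlswork example from earlier in this thread, assuming that -vce(cluster idcode)- is the appropriate counterpart to -robust- under -xtreg, fe-; the double storage type and the m_/dm_ variable names are illustrative choices, not something the original code requires:

```stata
// load example dataset and declare the panel identifier
webuse nlswork, clear
xtset idcode

// (1) FE model; under -xtreg, fe- this clusters on idcode
xtreg ln_wage age hours, fe vce(robust)

// keep the estimation sample so the means use the same observations
keep if e(sample)

// (2) de-mean within panels, storing results in double precision
foreach var of varlist ln_wage age hours {
    bysort idcode: egen double m_`var' = mean(`var')
    generate double dm_`var' = `var' - m_`var'
}

// pooled OLS on de-meaned data, clustering on the panel identifier
regress dm_ln_wage dm_age dm_hours, vce(cluster idcode)
```

The coefficients should match those from -xtreg-, and the clustered standard errors should be far closer to the -xtreg- ones than the plain -robust- ones are, although small differences from finite-sample adjustments may remain.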

PS: crossed in the cyberspace with Daniel's helpful reply.

Kind regards,
Carlo
(Stata 19.0)

David Tsu

Join Date: Jan 2018
Posts: 57
#9

02 Feb 2018, 08:20

Sorry Daniel, I should have used the CODE delimiters. Here is the full output:

Code:

. xtset id fyear
       panel variable:  id (unbalanced)
        time variable:  fyear, 1985 to 2005, but with gaps
                delta:  1 unit

. xtreg ml_w prof_w tang_w loga_w mk2bk_w, fe robust

Fixed-effects (within) regression               Number of obs     =    105,496
Group variable: id                              Number of groups  =     14,054

R-sq:                                           Obs per group:
     within  = 0.0657                                         min =          1
     between = 0.1449                                         avg =        7.5
     overall = 0.1011                                         max =         21

                                                F(4,14053)        =     393.59
corr(u_i, Xb)  = -0.0323                        Prob > F          =     0.0000

                                (Std. Err. adjusted for 14,054 clusters in id)
------------------------------------------------------------------------------
             |               Robust
        ml_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      prof_w |  -.0800309   .0030954   -25.85   0.000    -.0860983   -.0739635
      tang_w |   .2025415   .0108245    18.71   0.000     .1813241     .223759
      loga_w |   .0281374   .0016216    17.35   0.000     .0249589    .0313159
     mk2bk_w |   -.008481   .0003351   -25.31   0.000     -.009138   -.0078241
       _cons |   .0834798   .0077975    10.71   0.000     .0681957    .0987639
-------------+----------------------------------------------------------------
     sigma_u |  .20757192
     sigma_e |  .15878363
         rho |  .63085133   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. 
end of do-file

. keep if e(sample)
(24,323 observations deleted)

. do "C:\Users\ADMINI~1\AppData\Local\Temp\STD00000000.tmp"

. 
. by id: egen ml_mean = mean(ml_w)

. by id: egen prof_mean = mean(prof_w)

. by id: egen tang_mean = mean(tang_w)

. by id: egen loga_mean = mean(loga_w)

. by id: egen mk2bk_mean = mean(mk2bk_w)

. gen ml_demean = ml_w - ml_mean

. gen prof_demean = prof_w - prof_mean

. gen tang_demean = tang_w - tang_mean

. gen loga_demean = loga_w - loga_mean

. gen mk2bk_demean = mk2bk_w - mk2bk_mean

. reg ml_demean prof_demean tang_demean loga_demean mk2bk_demean, robust

Linear regression                               Number of obs     =    105,496
                                                F(4, 105491)      =    1009.02
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0657
                                                Root MSE          =     .14783

------------------------------------------------------------------------------
             |               Robust
   ml_demean |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 prof_demean |  -.0800309   .0021964   -36.44   0.000    -.0843358    -.075726
 tang_demean |   .2025415   .0064234    31.53   0.000     .1899518    .2151313
 loga_demean |   .0281374   .0008561    32.87   0.000     .0264595    .0298153
mk2bk_demean |   -.008481   .0002197   -38.61   0.000    -.0089116   -.0080505
       _cons |  -1.01e-10   .0004551    -0.00   1.000    -.0008921    .0008921
------------------------------------------------------------------------------

. 
end of do-file


daniel klein

Join Date: Mar 2014

Posts: 3850
#10

02 Feb 2018, 08:27

I cannot identify any (unexpected) differences ...
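
The point estimates in the last post agree to every printed digit; what still differs is the standard errors, and Root MSE versus sigma_e. The Root MSE gap looks consistent with a degrees-of-freedom difference: -regress- on the de-meaned data divides the residual sum of squares by 105,496 - 5 = 105,491, while -xtreg, fe- also subtracts the 14,054 group means, giving 105,496 - 14,054 - 4 = 91,438. A quick check using only the numbers printed above (a sketch, not part of the original posts):

```stata
// rescale the pooled-OLS Root MSE to the fixed-effects degrees of freedom
// regress:   residual df = 105496 - 5           (4 slopes + constant)
// xtreg, fe: residual df = 105496 - 14054 - 4   (group means + 4 slopes)
display .14783 * sqrt((105496 - 5) / (105496 - 14054 - 4))
```

This should display roughly .1588, in line with the sigma_e = .15878363 reported by -xtreg-.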

Best
Daniel