  • Differences in results from fixed effects estimator and demeaned OLS

    I compared results from using

    (1) xtset id year
    xtreg var1 var2 var3, fe

    and OLS with demeaned (by id) versions of the same variables

    (2) reg var1_demean var2_demean var3_demean

    My prior was that the estimation results should be exactly the same. However, though fairly close, they are not identical. Why do these two commands give different results?
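
    For reference, the reasoning behind that prior can be written out. This is a sketch, assuming the standard linear panel model with a unit effect $\alpha_i$; the symbols are notation introduced here, not taken from the posts:

    ```latex
    \begin{align*}
    y_{it} &= \alpha_i + x_{it}'\beta + \varepsilon_{it}, \\
    \bar{y}_i &= \alpha_i + \bar{x}_i'\beta + \bar{\varepsilon}_i,
      \qquad \bar{y}_i = \frac{1}{T_i}\sum_{t} y_{it}, \\
    y_{it} - \bar{y}_i &= (x_{it} - \bar{x}_i)'\beta
      + (\varepsilon_{it} - \bar{\varepsilon}_i).
    \end{align*}
    ```

    OLS on the last line gives the within estimator of $\beta$, so the slope estimates coincide as long as the group means are taken over exactly the observations used in the regression. The standard errors need not coincide, because OLS on demeaned data does not account for the estimated group means.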

  • #2
    David:
    as per FAQ, please also post via CODE delimiters what Stata gave you back. Thanks.
    Kind regards,
    Carlo
    (StataNow 18.5)


    • #3
      Originally posted by Carlo Lazzaro
      David:
      as per FAQ, please also post via CODE delimiters what Stata gave you back. Thanks.
      and provide the exact code you used for demeaning, possibly with a reproducible example.

      Best
      Daniel


      • #4
        Hi Carlo,

        My code:
        (1) FE estimate:

        xtreg ml_w prof_w tang_w loga_w mk2bk_w, fe robust

        (2) Demeaning:

        by id: egen ml_mean = mean(ml_w)
        by id: egen prof_mean = mean(prof_w)
        by id: egen tang_mean = mean(tang_w)
        by id: egen loga_mean = mean(loga_w)
        by id: egen mk2bk_mean = mean(mk2bk_w)
        gen ml_demean = ml_w - ml_mean
        gen prof_demean = prof_w - prof_mean
        gen tang_demean = tang_w - tang_mean
        gen loga_demean = loga_w - loga_mean
        gen mk2bk_demean = mk2bk_w - mk2bk_mean
        reg ml_demean prof_demean tang_demean loga_demean mk2bk_demean, robust


        • #5
          Carlo actually asked for Stata output. My wild guess is that you have missing values and do not restrict the mean calculations and de-meaning to the estimation sample. Do something like this

          Code:
          // load example dataset
          webuse nlswork , clear
          xtset idcode
          
          // (1) FE model
          xtreg ln_wage age hours i.union , fe
          
          // keep only complete cases
          keep if e(sample)
          
          // (2) de-mean based on same sample
          foreach var of varlist ln_wage age hours union {
              bysort idcode : egen m_`var' = mean(`var')
              generate dm_`var' = `var'-m_`var'
          }
          
          // pooled OLS regression on de-meaned data
          regress dm_ln_wage dm_age dm_hours dm_union
          and see whether the results still differ.

          Best
          Daniel
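
          A variant of the code above, in case dropping observations is undesirable: a minimal sketch that flags the estimation sample instead of using keep. The variable name touse and the double storage type are choices made here, not part of the original suggestion.

          ```stata
          // load example dataset
          webuse nlswork, clear
          xtset idcode

          // (1) FE model
          xtreg ln_wage age hours i.union, fe

          // flag the estimation sample (touse is a hypothetical name)
          generate byte touse = e(sample)

          // (2) group means and de-meaned variables, estimation sample only
          foreach var of varlist ln_wage age hours union {
              bysort idcode: egen double m_`var' = mean(`var') if touse
              generate double dm_`var' = `var' - m_`var' if touse
          }

          // pooled OLS on de-meaned data; observations outside the
          // estimation sample are missing and drop out automatically
          regress dm_ln_wage dm_age dm_hours dm_union
          ```

          Storing the means as double is meant to limit rounding differences between the two sets of estimates.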


          • #6
            Thanks! Here is the output from my original code:

            . xtreg ml_w prof_w tang_w loga_w mk2bk_w, fe robust

            Fixed-effects (within) regression Number of obs = 105,496
            Group variable: id Number of groups = 14,054

            R-sq: Obs per group:
            within = 0.0657 min = 1
            between = 0.1449 avg = 7.5
            overall = 0.1011 max = 21

            F(4,14053) = 393.59
            corr(u_i, Xb) = -0.0323 Prob > F = 0.0000

            (Std. Err. adjusted for 14,054 clusters in id)

            Robust
            ml_w Coef. Std. Err. t P>t [95% Conf. Interval]

            prof_w -.0800309 .0030954 -25.85 0.000 -.0860983 -.0739635
            tang_w .2025415 .0108245 18.71 0.000 .1813241 .223759
            loga_w .0281374 .0016216 17.35 0.000 .0249589 .0313159
            mk2bk_w -.008481 .0003351 -25.31 0.000 -.009138 -.0078241
            _cons .0834798 .0077975 10.71 0.000 .0681957 .0987639

            sigma_u .20757192
            sigma_e .15878363
            rho .63085133 (fraction of variance due to u_i)


            reg ml_demean prof_demean tang_demean loga_demean mk2bk_demean, robust

            Linear regression Number of obs = 105,496
            F(4, 105491) = 1019.86
            Prob > F = 0.0000
            R-squared = 0.0659
            Root MSE = .14848


            Robust
            ml_demean Coef. Std. Err. t P>t [95% Conf. Interval]

            prof_demean -.0801377 .0021951 -36.51 0.000 -.0844401 -.0758353
            tang_demean .20391 .0064021 31.85 0.000 .1913619 .216458
            loga_demean .0282449 .0008499 33.23 0.000 .0265791 .0299107
            mk2bk_demean -.0084683 .0002193 -38.62 0.000 -.0088981 -.0080386
            _cons .0000729 .000457 0.16 0.873 -.0008229 .0009687



            I tried using keep if e(sample) after the first regression. However, the estimates are still not identical.


            • #7
              Can you show us the results when you keep the sample constant? Please also include the output you get from egen (i.e. how many missing values are generated). Also, please enclose output and code in code delimiters as in

              [CODE]

              here goes your code and output

              [/CODE].

              Best
              Daniel


              • #8
                David:
                to avoid formatting issues that make your post hard to read, you are kindly asked to put what you typed and what you got from Stata within CODE delimiters (just click on the hash button under the Advanced editor and you're there). Thanks.
                That said, I do not have an answer about the difference in your point estimates, but, as far as your standard errors are concerned, the option -robust- under -xtreg- produces standard errors that deal with heteroskedasticity and/or autocorrelation, whereas -robust- under -regress- accounts for heteroskedasticity only.
                However, different standard errors options cannot explain the difference in your point estimates.
                You may want to re-run Daniel's helpful code (that works perfectly) and compare it with yours.

                PS: crossed in the cyberspace with Daniel's helpful reply.
                Kind regards,
                Carlo
                (StataNow 18.5)
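
                The point about standard errors can be checked on the example dataset. This is a sketch, assuming that clustering on the panel variable in -regress- is the closest analogue of -robust- under -xtreg, fe-. The two sets of standard errors should then be much closer than heteroskedasticity-only ones, though small degrees-of-freedom differences may remain.

                ```stata
                // load example dataset
                webuse nlswork, clear
                xtset idcode

                // FE model; -robust- here is equivalent to clustering on idcode
                xtreg ln_wage age hours, fe vce(cluster idcode)

                // keep only the estimation sample
                keep if e(sample)

                // de-mean within idcode
                foreach var of varlist ln_wage age hours {
                    bysort idcode: egen double m_`var' = mean(`var')
                    generate double dm_`var' = `var' - m_`var'
                }

                // heteroskedasticity-only standard errors
                regress dm_ln_wage dm_age dm_hours, vce(robust)

                // standard errors clustered on idcode, for comparison with -xtreg-
                regress dm_ln_wage dm_age dm_hours, vce(cluster idcode)
                ```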


                • #9
                  Sorry Daniel, I should have used the CODE delimiters. Here is the full output:

                  Code:
                  xtset id fyear
                         panel variable:  id (unbalanced)
                          time variable:  fyear, 1985 to 2005, but with gaps
                                  delta:  1 unit
                  
                  . xtreg ml_w prof_w tang_w loga_w mk2bk_w, fe robust
                  
                  Fixed-effects (within) regression               Number of obs     =    105,496
                  Group variable: id                              Number of groups  =     14,054
                  
                  R-sq:                                           Obs per group:
                       within  = 0.0657                                         min =          1
                       between = 0.1449                                         avg =        7.5
                       overall = 0.1011                                         max =         21
                  
                                                                  F(4,14053)        =     393.59
                  corr(u_i, Xb)  = -0.0323                        Prob > F          =     0.0000
                  
                                                  (Std. Err. adjusted for 14,054 clusters in id)
                  ------------------------------------------------------------------------------
                               |               Robust
                          ml_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                        prof_w |  -.0800309   .0030954   -25.85   0.000    -.0860983   -.0739635
                        tang_w |   .2025415   .0108245    18.71   0.000     .1813241     .223759
                        loga_w |   .0281374   .0016216    17.35   0.000     .0249589    .0313159
                       mk2bk_w |   -.008481   .0003351   -25.31   0.000     -.009138   -.0078241
                         _cons |   .0834798   .0077975    10.71   0.000     .0681957    .0987639
                  -------------+----------------------------------------------------------------
                       sigma_u |  .20757192
                       sigma_e |  .15878363
                           rho |  .63085133   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------
                  
                  . 
                  end of do-file
                  
                  . keep if e(sample)
                  (24,323 observations deleted)
                  
                  . do "C:\Users\ADMINI~1\AppData\Local\Temp\STD00000000.tmp"
                  
                  . 
                  . by id: egen ml_mean = mean(ml_w)
                  
                  . by id: egen prof_mean = mean(prof_w)
                  
                  . by id: egen tang_mean = mean(tang_w)
                  
                  . by id: egen loga_mean = mean(loga_w)
                  
                  . by id: egen mk2bk_mean = mean(mk2bk_w)
                  
                  . gen ml_demean = ml_w - ml_mean
                  
                  . gen prof_demean = prof_w - prof_mean
                  
                  . gen tang_demean = tang_w - tang_mean
                  
                  . gen loga_demean = loga_w - loga_mean
                  
                  . gen mk2bk_demean = mk2bk_w - mk2bk_mean
                  
                  . reg ml_demean prof_demean tang_demean loga_demean mk2bk_demean, robust
                  
                  Linear regression                               Number of obs     =    105,496
                                                                  F(4, 105491)      =    1009.02
                                                                  Prob > F          =     0.0000
                                                                  R-squared         =     0.0657
                                                                  Root MSE          =     .14783
                  
                  ------------------------------------------------------------------------------
                               |               Robust
                     ml_demean |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                   prof_demean |  -.0800309   .0021964   -36.44   0.000    -.0843358    -.075726
                   tang_demean |   .2025415   .0064234    31.53   0.000     .1899518    .2151313
                   loga_demean |   .0281374   .0008561    32.87   0.000     .0264595    .0298153
                  mk2bk_demean |   -.008481   .0002197   -38.61   0.000    -.0089116   -.0080505
                         _cons |  -1.01e-10   .0004551    -0.00   1.000    -.0008921    .0008921
                  ------------------------------------------------------------------------------
                  
                  . 
                  end of do-file


                  • #10
                    I cannot identify any (unexpected) differences ...

                    Best
                    Daniel
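
                    One of the expected differences can be verified from the output in #9. The sigma_e reported by -xtreg- and the Root MSE reported by -regress- rest on the same residual sum of squares but on different residual degrees of freedom, because -regress- does not know that 14,054 group means were estimated. The arithmetic below is a sketch, assuming -xtreg, fe- uses $N - n - K$ residual degrees of freedom with $K = 4$ slopes, while -regress- uses $N - k$ with $k = 5$ (four slopes plus the constant):

                    ```latex
                    \begin{align*}
                    \hat{\sigma}_e
                      &= \text{Root MSE} \times \sqrt{\frac{N - k}{N - n - K}} \\
                      &= 0.14783 \times \sqrt{\frac{105{,}496 - 5}{105{,}496 - 14{,}054 - 4}} \\
                      &= 0.14783 \times \sqrt{\frac{105{,}491}{91{,}438}}
                       \approx 0.14783 \times 1.0741 \approx 0.1588
                    \end{align*}
                    ```

                    This agrees with the reported sigma_e of .15878363 up to rounding.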
