First differencing vs Fixed effects interpretation

Dana Baade

Join Date: Jul 2022

Posts: 23
#16

11 Jul 2022, 06:57

Thank you Andrew
Comment
Dana Baade

Join Date: Jul 2022

Posts: 23
#17

13 Jul 2022, 06:54

Thank you again Andrew. My advisor has not responded yet regarding the literature. The difference between FE and FD got me thinking: could the difference between the two estimators on the same model be due not only to a unit root, but also to low within variation? The variables that get a higher p-value in the FD estimation sometimes stay constant over time. I know that this can lead to higher standard errors when using FE, and that an unbalanced panel can affect the results when using FD. But could it be that FD reports higher p-values or different coefficients because, for example, the interaction term stays constant over certain years (the binary and the continuous variable that are interacted might both sometimes stay constant)? Or could that not be one of the reasons?
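One way to inspect how much within variation a regressor has is sketched below. The Grunfeld data and the variable kstock are only stand-ins (assumptions for illustration), since the actual variables in the model are not shown in this thread.

Code:

```stata
* Stand-in data: the Grunfeld panel, not the data discussed in this thread
webuse grunfeld, clear
xtset company year

* Overall, between, and within standard deviations of each regressor
xtsum mvalue kstock

* Share of first differences that are exactly zero,
* i.e. how often the regressor does not change between adjacent years
count if !missing(D.kstock)
scalar n_fd = r(N)
count if D.kstock == 0
display "share of zero first differences in kstock: " r(N)/n_fd
```

A small within standard deviation relative to the between one, or a large share of zero first differences, would suggest that both FE and FD have little variation to work with for that variable.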
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10089
#18

13 Jul 2022, 08:29

There are good discussions here and here on the differences, but with \(T = 8\), my concern would be with the inefficiency of FD, as losing a cross-section due to differencing represents a loss of 1/8 = 12.5% of the data.
1 like
Comment
Dana Baade

Join Date: Jul 2022

Posts: 23
#19

13 Jul 2022, 09:26

Thank you Andrew. I read that thread before! I definitely share the concerns, but I was just wondering if there is some extra penalisation or something.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#20

13 Jul 2022, 09:29

I do not agree that FD is less efficient than FE (at least not for the reason given). With FD, 1 time period is lost due to differencing. With FE, effectively 1 time period is lost as well due to demeaning. For the latter, it just is not as obvious as for FD. To make the point, constrained FD-GLS (which accounts for the first-order serial correlation in the first-differenced errors, assuming the untransformed idiosyncratic errors are IID) is numerically equivalent to FE. Or, put differently, with FD you lose N degrees of freedom due to the lost time period; with FE you also lose N degrees of freedom due to the estimation of the fixed effects.
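As a brief sketch of the degrees-of-freedom count in the balanced case, assume \(N\) groups, \(T\) periods, and \(K\) time-varying regressors, with no constant in the FD regression (the symbols \(N\), \(T\), \(K\) follow the usual panel notation and are not defined elsewhere in this thread):

\[
\text{FD: } N(T-1) - K, \qquad \text{FE: } NT - N - K = N(T-1) - K .
\]

With \(T = 8\), both estimators are left with \(7N - K\) residual degrees of freedom.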

Last edited by Sebastian Kripfganz; 13 Jul 2022, 09:33.

https://www.kripfganz.de/stata/
Comment
Dana Baade

Join Date: Jul 2022

Posts: 23
#21

13 Jul 2022, 09:47

Thank you Andrew. I read that thread before! I definitely share the concerns, but I was just wondering if there is some extra penalisation or something.
Comment
Dana Baade

Join Date: Jul 2022

Posts: 23
#22

13 Jul 2022, 10:05

Thank you Sebastian. That is indeed not obvious at all when looking at the output. When I have serial correlation or a unit root, but low within variation in the affected variables, is the FD output then preferred? And why do we cluster with xtreg in the first place? Many posts, even on this forum, say to just cluster.
1 like
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10089
#23

13 Jul 2022, 10:50

Originally posted by Sebastian Kripfganz

I do not agree that FD is less efficient than FE (at least not for the reason given). With FD, 1 time period is lost due to differencing. With FE, effectively 1 time period is lost as well due to demeaning. For the latter, it just is not as obvious as for FD. To make the point, constrained FD-GLS (which accounts for the first-order serial correlation in the first-differenced errors, assuming the untransformed idiosyncratic errors are IID) is numerically equivalent to FE. Or, put differently, with FD you lose N degrees of freedom due to the lost time period; with FE you also lose N degrees of freedom due to the estimation of the fixed effects.

Agreed, if you can implement FD-GLS, you obtain efficiency. But that is not what we have here using regress.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#24

13 Jul 2022, 13:43

Correct. However, my main statement still holds. There is the widespread belief (evidenced by the two links you provided in your previous post) that FD is less efficient than FE because the former loses 1 observation per group. This is incorrect, because FE effectively also loses 1 observation per group by estimating the group means.

https://www.kripfganz.de/stata/
1 like
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10089

#25

13 Jul 2022, 15:34

Sebastian Kripfganz, this is a badly rigged example showing practically how inefficient the FD estimator can be. In severely unbalanced panels, you can lose all the data. The within-estimator will still stay alive as you can still calculate a mean with gaps. In general, the loss of a single cross-section is a best case scenario for FD.

Code:

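* Keep seven non-consecutive periods so that every panel has gaps in year
webuse grunfeld, clear
keep if inlist(time, 1, 3, 7, 10, 13, 17, 20)
xtset company year
* Within (FE) estimator: group means can still be computed despite the gaps
xtreg invest mvalue kstock, fe
* FD estimator: D. needs the previous year, which is never present here
regress D.(invest mvalue kstock), nocons robust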

Res.:

Code:

. xtreg invest mvalue kstock, fe

Fixed-effects (within) regression               Number of obs     =         70
Group variable: company                         Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.7854                                         min =          7
     between = 0.8007                                         avg =        7.0
     overall = 0.7906                                         max =          7

                                                F(2,58)           =     106.12
corr(u_i, Xb)  = 0.2607                         Prob > F          =     0.0000

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .0707405   .0226918     3.12   0.003      .025318     .116163
      kstock |   .3453041   .0305301    11.31   0.000     .2841915    .4064168
       _cons |  -19.92252   23.83114    -0.84   0.407    -67.62573     27.7807
-------------+----------------------------------------------------------------
     sigma_u |     99.688
     sigma_e |  62.777348
         rho |  .71603987   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(9, 58) = 12.51                      Prob > F = 0.0000

. regress D.(invest mvalue kstock), nocons robust
no observations
r(2000);

.
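The "no observations" error arises because D. looks for the value exactly one year earlier, and no two retained years are adjacent. The sketch below confirms this and then, purely as an illustration, differences across adjacent retained observations instead. The variable name obsno is hypothetical, and the resulting differences span unequal numbers of years, so this is a different specification from the one in the example, not a repair of it.

Code:

```stata
webuse grunfeld, clear
keep if inlist(time, 1, 3, 7, 10, 13, 17, 20)
xtset company year

* Every first difference is missing because the previous year is never present
count if !missing(D.invest)

* Assumption for illustration only: index the retained observations within
* each company and difference across that index (unequal gaps in calendar time)
bysort company (year): generate obsno = _n
xtset company obsno
regress D.(invest mvalue kstock), nocons vce(robust)
```

The first count should be zero, in line with the r(2000) error shown in the results.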

Comment

Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#26

14 Jul 2022, 00:52

Yes, with unbalanced data the FD estimator loses a lot more observations. I should have said earlier that my statement referred to the balanced case.
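For the balanced case, a minimal sketch on the full Grunfeld panel (10 companies, 20 years) compares the residual degrees of freedom of the two estimators. The scalar names df_fe and df_fd are introduced here for illustration, and default (non-robust) standard errors are used so that e(df_r) reflects the usual residual degrees of freedom.

Code:

```stata
webuse grunfeld, clear
xtset company year

* FE: 200 observations, minus 10 group effects, minus 2 slopes
xtreg invest mvalue kstock, fe
scalar df_fe = e(df_r)

* FD: 190 differenced observations, minus 2 slopes (no constant)
regress D.(invest mvalue kstock), noconstant
scalar df_fd = e(df_r)

display "FE residual df = " df_fe "   FD residual df = " df_fd
```

Both scalars should equal 188, which matches the argument that in a balanced panel each estimator gives up N degrees of freedom.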

https://www.kripfganz.de/stata/
1 like
Comment
Dana Baade

Join Date: Jul 2022

Posts: 23
#27

14 Jul 2022, 12:01

Andrew, coming back to one of my previous posts: my supervisor indeed meant a different model. I misinterpreted. Thank you everyone for the help, and I hope at least some others find this topic useful in the future.

Dana
Comment
