Pooled OLS vs. correlated random effects for time-invariant variables

Jiaman Xu

Join Date: Aug 2024

Posts: 13
#1


23 Nov 2024, 06:20

I have a panel dataset in which the variable of interest is time-invariant, while the dependent variable and the control variables are time-varying. I'm considering pooled OLS or correlated random effects (CRE) as the baseline regression model. My questions are: i) Is the CRE estimator always less biased than the pooled OLS estimator for the coefficient on the time-invariant variable? ii) If not, is there a way to test which model I should use, similar to the Hausman test for choosing between FE and RE? Thank you.
Tags: panel data, random effects, regression
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#2

23 Nov 2024, 08:14

Jiaman:
why not consider -xtreg, re- instead?

Kind regards,
Carlo
(Stata 19.0)
Jiaman Xu

Join Date: Aug 2024

Posts: 13
#3

23 Nov 2024, 11:14

Originally posted by Carlo Lazzaro

Jiaman:
why not consider -xtreg, re- instead?

Carlo, thank you so much for your response. I have a few follow-up questions.

Are you suggesting that the RE model is the most suitable for my case? If so, could you explain why?
My understanding was that CRE is generally superior to RE for estimating the coefficients on time-varying variables when those variables are correlated with unobserved heterogeneity. So I’ve chosen CRE over RE to reduce bias in the control variable estimates, but this isn’t my main concern.
What I don't understand is this: Is CRE (or RE) definitively better than pooled OLS for estimating the coefficients on time-invariant variables? If not, is there a way to choose between the different models?
Thank you!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#4

24 Nov 2024, 02:26

Jiaman:
1) I guessed that RE was better than FE in your case, as you're interested in obtaining the coefficient of a time-invariant variable (your predictor of interest, if I got your first post right);
2) https://blog.stata.com/2015/10/29/fi...dlak-approach/ can be useful;
3) I do not consider OLS my first choice tool when it comes to panel data regression.
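
One common way to apply the Mundlak approach mentioned in point 2 is to add the unit-level time averages to an RE regression and test them jointly. The following is a minimal sketch, assuming a balanced panel identified by id and year; the variable names y, x1, x2 (time-varying) and z1 (time-invariant) are hypothetical and do not come from this thread.

```stata
* Hypothetical names: y (outcome), x1 x2 (time-varying), z1 (time-invariant)
xtset id year

* Unit-level time averages of the time-varying regressors (Mundlak terms)
egen x1bar = mean(x1), by(id)
egen x2bar = mean(x2), by(id)

* RE with the Mundlak terms and cluster-robust standard errors
xtreg y x1 x2 x1bar x2bar z1 i.year, re vce(cluster id)

* Robust Wald test that the Mundlak terms are jointly zero
test x1bar x2bar
```

A rejection suggests that the time-varying regressors are correlated with the unit effect, so the averages should stay in the model. Note that this test speaks to the coefficients on x1 and x2; it does not by itself show that the coefficient on z1 is unbiased.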

Kind regards,
Carlo
(Stata 19.0)
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#5

24 Nov 2024, 08:57

If you use the CRE approach in the balanced panel case (that is, you include the time averages of all time-varying explanatory variables), then pooled OLS and RE produce exactly the same estimates. I show this in my 2019 Journal of Econometrics paper. Below are the commands that produce the same estimates. The xj variables are time-varying; the zj are not:

Code:

xtset id year
egen x1bar = mean(x1), by(id)
...
egen xKbar = mean(xK), by(id)
reg y x1 ... xK x1bar ... xKbar z1 ... zJ i.year, vce(cluster id)
xtreg y x1 ... xK x1bar ... xKbar z1 ... zJ i.year, re vce(cluster id)

If by pooled OLS you mean omitting the xjbar variables, then there will be a difference in biases. But it's tricky. If the time-constant variable z is correlated with the unobserved effect, adding the xjbar variables can mitigate the bias. But, typically, the xjbar are correlated with z, too. I think it can go either way.
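
The "it can go either way" point can be made concrete with a linear projection. The notation below is an assumption for illustration (it follows common textbook conventions and is not taken from this thread): c_i is the unobserved effect, \bar{x}_i is the vector of time averages, and a_i is a projection error that is uncorrelated with \bar{x}_i and z_i by construction.

```latex
\begin{align*}
y_{it} &= x_{it}\beta + z_i\gamma + c_i + u_{it}, \\
c_i    &= \psi + \bar{x}_i\xi + z_i\lambda + a_i, \\
y_{it} &= \psi + x_{it}\beta + \bar{x}_i\xi + z_i(\gamma + \lambda) + a_i + u_{it}.
\end{align*}
```

Under this sketch, the CRE coefficient on z_i estimates \gamma + \lambda, which equals \gamma only when z_i is unrelated to c_i after controlling for \bar{x}_i. Dropping \bar{x}_i changes the projection of c_i, so the z_i coefficient absorbs a different term, and nothing guarantees that this term is smaller in magnitude than \lambda.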

There's a slight modification that works in the unbalanced case.
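
The post does not spell out the modification. One plausible reading, offered here only as a hedged sketch, is to compute the time averages over the observations that actually enter the regression and to include the time averages of the year dummies as well. All variable names (y, x1, x2, z1, nmiss, insample, yr*) are hypothetical.

```stata
* Hypothetical names: y, x1 x2 (time-varying), z1 (time-invariant), id, year
xtset id year

* Flag complete cases so the averages use only rows that enter the regression
egen nmiss = rowmiss(y x1 x2 z1)
generate byte insample = (nmiss == 0)

* Time averages over the estimation sample only
egen x1bar = mean(x1) if insample, by(id)
egen x2bar = mean(x2) if insample, by(id)

* Year dummies and their unit-level averages (these vary across units
* when the panel is unbalanced)
tabulate year, generate(yr)
foreach v of varlist yr* {
    egen `v'bar = mean(`v') if insample, by(id)
}

* Pooled OLS with the Mundlak terms; Stata drops any collinear average
reg y x1 x2 x1bar x2bar yr*bar z1 i.year if insample, vce(cluster id)
```

In a balanced panel the year-dummy averages are the same constant for every unit, which is why they only matter once units are observed in different sets of years.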
Jiaman Xu

Join Date: Aug 2024

Posts: 13
#6

25 Nov 2024, 05:11

Originally posted by Jeff Wooldridge

If by pooled OLS you mean omitting the xjbar variables, then there will be a difference in biases. But it's tricky. If the time-constant variable z is correlated with the unobserved effect, adding the xjbar variables can mitigate the bias. But, typically, the xjbar are correlated with z, too. I think it can go either way.

There's a slight modification that works in the unbalanced case.

This is extremely helpful, thank you so much!
I do have a few follow-up questions if you don't mind.

In my original post, pooled OLS means omitting the xjbar variables.
I work with an unbalanced panel, so year dummies are treated as time-varying variables in CRE, as you emphasized in the 2019 JoE paper.

The reason I'm not sure whether to use CRE in my case is that the time-invariant variable, zj, is the main variable of interest. So I have to take its interpretation seriously.

You used the airfare example in your slides introducing the CRE model: https://conference.iza.org/conferenc...linear_iza.pdf
The time-invariant 'distance_i' variable in the airfare example is a control variable, and p. 27 notes that "we must use caution in interpreting the estimated coefficient on zi".
So my understanding is that adding 'distance_i' in CRE improves the estimate for the time-varying variable of interest (concen_it), but the estimated coefficient on the time-invariant variable is not reliable and shouldn't be taken seriously. Is this right?
And why should we be cautious when interpreting the coefficient on zi? Is it because typically the xjbars are correlated with zi?

Thank you!