
  • Fixed effects panel data model and OLS assumptions

    Hello everybody,

    I am running a fixed effects panel model for my master's thesis. My supervisor told me to also discuss the Gauss-Markov theorem and the general OLS assumptions in the thesis: run OLS first, discuss the tests, and then the switch to the panel data model. So what I'm looking at are especially the following assumptions:

    (1) \(E(u_t) = 0\)

    (2) \(\text{var}(u_t) = \sigma^2 < \infty\)

    (3) \(\text{cov}(u_i, u_j) = 0\) for \(i \neq j\)

    (4) \(\text{cov}(u_t, x_t) = 0\)

    (5) \(u_t \sim N(0, \sigma^2)\)

    1. Question: For me that means, roughly: no autocorrelation, no heteroskedasticity, and normality of the residuals. Did I miss anything? Also, I guess you always have to test for omitted variables and multicollinearity, right?

    So I start by running some tests after an initial regress:

    estat imtest
    estat ovtest
    xtserial
    estat vif
    sktest
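
    Put together, a rough sketch of how I run these checks (depvar, indepvars, id, and year are just placeholders for my actual variable names):

    Code:
    xtset id year
    regress depvar indepvars
    estat imtest               // heteroskedasticity (Cameron-Trivedi decomposition of the IM test)
    estat ovtest               // Ramsey RESET test for omitted variables / functional form
    estat vif                  // variance inflation factors for multicollinearity
    predict double res, residuals
    sktest res                 // skewness-kurtosis test for normality of the residuals
    xtserial depvar indepvars  // Wooldridge test for serial correlation (user-written; see -search xtserial- to install)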

    2. Question: Which other tests should I use on the initial OLS regression? Also, I include i.years in the fixed effects model to control for time effects; do I then test the initial regress models with i.years already included?

    Also, after I switch to panel data I use the command:

    xtreg depvar indepvars, fe vce(cluster ID)

    3. Question: As far as I understand, using the fe option with vce(cluster ID) already corrects for heteroscedasticity and autocorrelation, but what about the other assumptions? Do they still have to be met?
    Also, I understood that as long as I have a constant in my regression, assumption (1) is always met, but the fixed effects transformation drops my constant, no?
    And what about normality of the residuals: is there a test I can use after xtreg?

    4. Question: I have a lot of different models to test and choose from, interaction terms to add after first testing without them, etc. Do I actually have to perform all of those tests on every single one of my models separately?

    5. Question: What can I actually do if, in the end, I still have non-normality of residuals and omitted variables? I guess not much?

    Thank you so much for your help!! I've been at this for days and have googled a lot, but most tutorials I find only cover plain OLS, and often only a single independent variable, so that you can use plots etc...

    Best,

    Sabrina
    Last edited by Sabrina Muller; 05 Feb 2019, 15:54.

  • #2
    You didn't get a quick answer. You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Being able to replicate your problem makes it easier for us to help you.

    This kind of long list of questions is unlikely to get a response. If you don't know and can't find the answers to these questions (which are about statistics rather than Stata per se), then ask your adviser.

    • #3
      Sabrina:
      Phi is right; you can't hope to get a full reply to all your queries.
      Let me pick up some of them:
      2) and 3): the -fe- and (pooled) OLS estimators work under different assumptions (for instance, -fe- allows the u_i error components to be correlated with the regressors, which would be a violation of the OLS assumptions). Conversely, it is right that cluster (or robust) standard errors account for heteroskedasticity and/or autocorrelation under -xtreg- (but not under -regress-).
      I do not follow your statement about the constant; moreover, the -fe- estimator wipes out time-invariant predictors, not the constant of the regression model.
      You can plot the residuals after -xtreg- and see their behaviour (a short sketch follows the example below).
      4) you should fit the model that gives the truest and fairest view of the data-generating process according to the literature in your research field.
      5) if you invoke robust or clustered standard errors under -xtreg-, you should not worry about residual heteroskedasticity.
      Conversely, omitted-variable bias can conceal a non-linear relationship between predictors and regressand, or a more severe endogeneity issue.
      You can test for misspecification via a Pregibon test:
      Code:
      . use "http://www.stata-press.com/data/r15/nlswork.dta"
      (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
      
      . xtreg ln_wage age, fe
      
      Fixed-effects (within) regression               Number of obs     =     28,510
      Group variable: idcode                          Number of groups  =      4,710
      
      R-sq:                                           Obs per group:
           within  = 0.1026                                         min =          1
           between = 0.0877                                         avg =        6.1
           overall = 0.0774                                         max =         15
      
                                                      F(1,23799)        =    2720.20
      corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000
      
      ------------------------------------------------------------------------------
           ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               age |   .0181349   .0003477    52.16   0.000     .0174534    .0188164
             _cons |   1.148214   .0102579   111.93   0.000     1.128107     1.16832
      -------------+----------------------------------------------------------------
           sigma_u |  .40635023
           sigma_e |  .30349389
               rho |  .64192015   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      F test that all u_i=0: F(4709, 23799) = 8.81                 Prob > F = 0.0000
      
      . predict fitted, xb
      (24 missing values generated)
      
      . g square_fitted=fitted^2
      (24 missing values generated)
      
      . xtreg ln_wage fitted square_fitted
      
      Random-effects GLS regression                   Number of obs     =     28,510
      Group variable: idcode                          Number of groups  =      4,710
      
      R-sq:                                           Obs per group:
           within  = 0.1087                                         min =          1
           between = 0.1015                                         avg =        6.1
           overall = 0.0870                                         max =         15
      
                                                      Wald chi2(2)      =    3388.51
      corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
      
      -------------------------------------------------------------------------------
            ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      --------------+----------------------------------------------------------------
             fitted |   7.974516   .4636544    17.20   0.000      7.06577    8.883262
      square_fitted |  -2.055037   .1369835   -15.00   0.000     -2.32352   -1.786555
              _cons |  -5.899127   .3907481   -15.10   0.000     -6.66498   -5.133275
      --------------+----------------------------------------------------------------
            sigma_u |  .36540489
            sigma_e |  .30245467
                rho |  .59342665   (fraction of variance due to u_i)
      -------------------------------------------------------------------------------
      
      . test square_fitted
      
       ( 1)  square_fitted = 0
      
                 chi2(  1) =  225.06
               Prob > chi2 =    0.0000
      
      .
      In the toy example reported above, according to the -test- outcome, the regression model is clearly misspecified.
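      As for looking at the residuals after -xtreg, fe-, a minimal sketch continuing the same nlswork example (variable names are purely illustrative):
      Code:
      xtreg ln_wage age, fe
      predict double res_fe, e      // e_it component of the combined residual
      histogram res_fe, normal      // compare the distribution against a normal overlay
      qnorm res_fe                  // quantile-normal plot of the residuals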
      Kind regards,
      Carlo
      (Stata 19.0)

      • #4
        I will follow Carlo (although I respectfully disagree with some of his statements) and pick up on some selected issues.

        Concerning the listed assumptions in #1, (5) is not part of the Gauss-Markov theorem and it is not required for OLS to be BLUE (best linear unbiased estimator). However, rewriting (5) as

        \( \epsilon \sim (0, \sigma^{2}I) \)

        where \(I\) is the identity matrix, combines assumptions (1), (2), and (3).

        Concerning the constant/intercept: including one in the model does not guarantee that (1) holds; however, a violation of (1) affects the estimate of the constant/intercept term. Also, the -fe- estimator does wipe out the constant/intercept. This is because the constant/intercept is indeed a constant: subtracting the within-panel mean from a constant (or first-differencing it) eliminates the term from the equation. What Stata reports in the output of xtreg, fe is the average fixed effect.
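
        For instance, writing the fixed-effects model as \( y_{it} = \alpha + \beta x_{it} + u_i + \epsilon_{it} \), averaging within each panel over \(t\) and subtracting gives

        \( y_{it} - \bar{y}_i = \beta\,(x_{it} - \bar{x}_i) + (\epsilon_{it} - \bar{\epsilon}_i) \)

        so both \(\alpha\) and \(u_i\) (and any other time-invariant term) drop out of the within-transformed equation.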

        About clustered standard errors: they do correct for heteroscedasticity and autocorrelation under regress as well. It is just that vce(robust), when used with xtreg, is automatically interpreted as vce(cluster id), where id is the panel identifier.

        I am happy to be corrected in the above should any of my statements not be accurate.

        Best
        Daniel
        Last edited by daniel klein; 07 Feb 2019, 06:27. Reason: LaTeX

        • #5
          Sabrina:
          reading Daniel's helpful reply makes me think that I should have been clearer in my previous reply:
          - it's true that the -fe- estimator wipes out the constant/intercept. I had interpreted Sabrina's point as saying that the -xtreg, fe- output table does not report a constant. It does, but it's true that the -xtreg, fe- constant is not the constant we find in the -regress- output table; as Daniel says, it is the average fixed effect (see https://www.stata.com/support/faqs/s...fects-model/);
          - invoking cluster or robust standard errors under -xtreg- produces the same standard errors for the reason Daniel provides:
          Code:
          use "http://www.stata-press.com/data/r15/nlswork.dta"
          . xtreg ln_wage age, fe robust
          
          Fixed-effects (within) regression               Number of obs     =     28,510
          Group variable: idcode                          Number of groups  =      4,710
          
          R-sq:                                           Obs per group:
               within  = 0.1026                                         min =          1
               between = 0.0877                                         avg =        6.1
               overall = 0.0774                                         max =         15
          
                                                          F(1,4709)         =     884.05
          corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000
          
                                       (Std. Err. adjusted for 4,710 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
                 _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
          -------------+----------------------------------------------------------------
               sigma_u |  .40635023
               sigma_e |  .30349389
                   rho |  .64192015   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          . xtreg ln_wage age, fe vce(cluster idcode)
          
          Fixed-effects (within) regression               Number of obs     =     28,510
          Group variable: idcode                          Number of groups  =      4,710
          
          R-sq:                                           Obs per group:
               within  = 0.1026                                         min =          1
               between = 0.0877                                         avg =        6.1
               overall = 0.0774                                         max =         15
          
                                                          F(1,4709)         =     884.05
          corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000
          
                                       (Std. Err. adjusted for 4,710 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
                 _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
          -------------+----------------------------------------------------------------
               sigma_u |  .40635023
               sigma_e |  .30349389
                   rho |  .64192015   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          .
          but, as I meant in my previous reply, the two options produce different standard errors under -regress-:
          Code:
          . sysuse auto.dta
          (1978 Automobile Data)
          
          . regress price mpg, robust
          
          Linear regression                               Number of obs     =         74
                                                          F(1, 72)          =      17.28
                                                          Prob > F          =     0.0001
                                                          R-squared         =     0.2196
                                                          Root MSE          =     2623.7
          
          ------------------------------------------------------------------------------
                       |               Robust
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   mpg |  -238.8943   57.47701    -4.16   0.000    -353.4727    -124.316
                 _cons |   11253.06   1376.393     8.18   0.000     8509.272    13996.85
          ------------------------------------------------------------------------------
          
          . regress price mpg, vce(cluster foreign)
          
          Linear regression                               Number of obs     =         74
                                                          F(0, 1)           =          .
                                                          Prob > F          =          .
                                                          R-squared         =     0.2196
                                                          Root MSE          =     2623.7
          
                                          (Std. Err. adjusted for 2 clusters in foreign)
          ------------------------------------------------------------------------------
                       |               Robust
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   mpg |  -238.8943   57.46835    -4.16   0.150     -969.099    491.3103
                 _cons |   11253.06   595.4638    18.90   0.034     3686.976    18819.15
          ------------------------------------------------------------------------------
          
          .
          Kind regards,
          Carlo
          (Stata 19.0)

          • #6
            Originally posted by Carlo Lazzaro View Post
            but, as I meant in my previous reply, the two options produce different standard errors under -regress-:
            Code:
            ...
            . regress price mpg, vce(cluster foreign)
            Carlo, foreign is not the correct variable to cluster on. With xtreg, you would also get different results from vce(robust) if you clustered on something other than the panel identifier. In the auto dataset, each observation is its own panel. Try

            Code:
            . sysuse auto , clear
            (1978 Automobile Data)
            
            . generate id = _n
            
            . regress price mpg , vce(robust)
            
            Linear regression                               Number of obs     =         74
                                                            F(1, 72)          =      17.28
                                                            Prob > F          =     0.0001
                                                            R-squared         =     0.2196
                                                            Root MSE          =     2623.7
            
            ------------------------------------------------------------------------------
                         |               Robust
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |  -238.8943   57.47701    -4.16   0.000    -353.4727    -124.316
                   _cons |   11253.06   1376.393     8.18   0.000     8509.272    13996.85
            ------------------------------------------------------------------------------
            
            . regress price mpg , vce(cluster id)
            
            Linear regression                               Number of obs     =         74
                                                            F(1, 73)          =      17.28
                                                            Prob > F          =     0.0001
                                                            R-squared         =     0.2196
                                                            Root MSE          =     2623.7
            
                                                (Std. Err. adjusted for 74 clusters in id)
            ------------------------------------------------------------------------------
                         |               Robust
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |  -238.8943   57.47701    -4.16   0.000    -353.4459   -124.3428
                   _cons |   11253.06   1376.393     8.18   0.000     8509.914    13996.21
            ------------------------------------------------------------------------------
            
            .
            end of do-file
            to see that the standard errors match exactly.

            Best
            Daniel
            Last edited by daniel klein; 07 Feb 2019, 07:52.

            • #7
              Daniel:
              I do agree with you: in a panel data setting the -panelid- is the right variable for -cluster()-.
              I see your point highlighted in your toy-example: the standard errors match perfectly indeed.
              However, in a -regress- setting, with one wave of data only, I would have clustered on some other variable, such as -foreign- or -manufacturer-, just to check whether any interesting differences emerge.
              Interesting thread indeed (for me, at least). Thanks.
              Kind regards,
              Carlo
              (Stata 19.0)

              • #8
                Hello everybody,

                first of all thank you so much for your answers!!

                daniel klein, I'm sorry my theoretical statistics knowledge is so limited, but I do need normally distributed residuals in order to be able to work with regress and xtreg, fe, no? But it is not part of the Gauss-Markov theorem, or rather it is only implicitly implied by (1)-(4)?

                I've read in some blog entries that you don't actually need normality to be able to use OLS and interpret coefficients, but only for testing hypotheses. I don't fully understand that statement, since I need the p-value of a variable first to even be able to interpret its coefficient, no? I wouldn't interpret an insignificant coefficient, and that is a hypothesis test in itself, isn't it?

                I have now managed to make some of my regressions look like they have normally distributed residuals by taking the ln of the dependent variable and of some other variables. Tests like sktest still do not indicate normality, though, but I've read that this is due to the large sample size (my data consists of 20,000+ observations)? Is there anything I can do about that, or can I rely on the graphics in this case?

                However, since some of my dependent variables can also take on negative values and zeros (they are performance variables like ROS, etc.), I can't use the ln there. Are there any other good ways to transform the data?

                And yes, what you are discussing above is what I meant about the intercept in fe models. I actually learned at university that including an intercept guarantees (1), and it's funny that this is apparently not true. But I guess if the residuals are normally distributed, (1) is also met.

                Thank you so much for your help!

                • #9
                  Originally posted by Sabrina Muller View Post
                  [...] but I do need normally distributed residuals in order to be able to work with regress and xtreg, fe, no? But it is not part of the Gauss-Markov theorem, or rather it is only implicitly implied by (1)-(4)?
                  Again, normal residuals are neither implied by nor part of the Gauss-Markov theorem. The theorem says that, if assumptions (1)-(4) hold, the OLS estimator is the best (minimum-variance) linear unbiased estimator; normality plays no role in that result. I do not know what you mean by "work with" in this context; perhaps this is related to apparent misunderstandings below.

                  Originally posted by Sabrina Muller View Post
                  I've read in some blog entries that you don't actually need normality to be able to use OLS and interpret coefficients, but only for testing hypotheses.
                  You should find that statement in any decent introductory book on econometrics. Usually, you will also find an elaboration stating that normality is only required to study the small-sample properties of the OLS estimator. In other words: with large samples, you do not even have to assume normality for testing coefficients.

                  Originally posted by Sabrina Muller View Post
                  I don't fully understand that statement, since I need the p-value of a variable first to even be able to interpret its coefficient, no? I wouldn't interpret an insignificant coefficient, and that is a hypothesis test in itself, isn't it?
                  The coefficients express the direction and magnitude of associations; the p-value is more a statement about the precision of the estimated coefficients. Both are important to consider, but your substantive interest probably lies in the coefficient first.

                  Originally posted by Sabrina Muller View Post
                  I have now managed to make some of my regressions look like they have normally distributed residuals by taking the ln of the dependent variable and of some other variables. Tests like sktest still do not indicate normality, though, but I've read that this is due to the large sample size (my data consists of 20,000+ observations)? Is there anything I can do about that, or can I rely on the graphics in this case?
                  With large samples, I would not be too concerned about normal errors; something bell-shaped will do. Note that tests for normality can never confirm that there is a normal distribution. The best you can hope for is that the evidence against a normal distribution is so weak that you cannot reject the null at conventional levels.
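
                  As an illustration (re-using the nlswork data from earlier in this thread; the exact test result will of course depend on your data), you could compare the formal test with the graphs:

                  Code:
                  use "http://www.stata-press.com/data/r15/nlswork.dta", clear
                  regress ln_wage age grade
                  predict double res, residuals
                  sktest res                // with roughly 28,000 observations, even mild departures from normality tend to be flagged
                  histogram res, normal     // graphical check against a normal overlay
                  qnorm res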

                  Originally posted by Sabrina Muller View Post
                  However, since some of my dependent variables can also take on negative values and zeros (they are performance variables like ROS, etc.), I can't use the ln there. Are there any other good ways to transform the data?
                  In general, you do not transform the data to get normally distributed residuals; you transform the data to get a linear relationship between your variables, and normal residuals might be a side effect. I will just point you to generalized linear models, which are often preferable to transforming the data (especially in non-linear ways), but I will not elaborate on this since you seem to be required to use linear regression anyway.

                  Originally posted by Sabrina Muller View Post
                  I actually learned at university that including an intercept guarantees (1), and it's funny that this is apparently not true. But I guess if the residuals are normally distributed, (1) is also met.
                  Perhaps you are mixing things up here. The empirical residuals, that is, the differences between the values predicted by your model and the values observed in your data, will have mean 0 (as long as the model includes a constant) even if the theoretical/"true" errors do not have mean 0. Also, the empirical residuals will always be uncorrelated with the predictor variables in the model. But that is just how OLS works, mechanically; it by no means implies that the respective assumptions hold.
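
                  A quick way to see this mechanical property, using the auto data from earlier in the thread:

                  Code:
                  sysuse auto, clear
                  regress price mpg weight
                  predict double res, residuals
                  summarize res             // the in-sample residuals average to (numerically) zero because the model includes a constant
                  correlate res mpg weight  // their sample correlations with the included regressors are also (numerically) zero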

                  Your last statement is inaccurate. A normal distribution does not imply a mean of 0. If the residuals follow a normal distribution with mean 42, then obviously (1) does not hold.

                  Best
                  Daniel

                  • #10
                    Thank you very, very much for your help. With "work with" I mean that I have to make sure the assumptions are met before using OLS for my thesis. And I meant to ask whether I actually need to show that I cannot reject the null of normality, or whether I do not have to do this if my sample is large enough (say 20,000 observations).
                    About the coefficients: my hypotheses are phrased as "positively moderates", "has a positive influence", etc., so mostly I just need the sign and the significance, and I was wondering whether I can rely on those in case of non-normality of the residuals. Again, thank you very much!
