Panel data: Pooled v. FE v. RE v. GLS

Sebastian Kruk

Join Date: Jul 2017
Posts: 72


11 Feb 2022, 18:56

I am writing my thesis on the determinants of CO2 emissions: lagged CO2, GDP, energy intensity, and the share of renewable energy in primary energy.

I proceed as follows:

1 - Pooled OLS

Code:

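reg ln_co2pc_gr l.ln_co2pc_gr ln_gdppc_gr ei_ch res_share_ch
estimates store pooled

      Source |       SS           df       MS      Number of obs   =       987
-------------+----------------------------------   F(4, 982)       =    271.95
       Model |  5.83456066         4  1.45864016   Prob > F        =    0.0000
    Residual |  5.26702695       982  .005363571   R-squared       =    0.5256
-------------+----------------------------------   Adj R-squared   =    0.5236
       Total |  11.1015876       986  .011259217   Root MSE        =    .07324

------------------------------------------------------------------------------
 ln_co2pc_gr | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
 ln_co2pc_gr |
         L1. |   -.1760077   .0223423  -7.88    0.000    -.2198518   -.1321636
             |
 ln_gdppc_gr |    1.173666   .0611717  19.19    0.000     1.053623    1.293708
       ei_ch |    6.246011   .4014166  15.56    0.000     5.458278    7.033744
res_share_ch |   -.0167874   .0008761 -19.16    0.000    -.0185066   -.0150682
       _cons |    .0000834   .0024518   0.03    0.973    -.0047279    .0048946
------------------------------------------------------------------------------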

2 - Chow test from the FE estimation (the F test that all u_i = 0) - With this test I verified that pooled OLS is better than FE.

Code:

xtreg ln_co2pc_gr l.ln_co2pc_gr ln_gdppc_gr ei_ch res_share_ch, fe
estimates store fixed

Fixed-effects (within) regression               Number of obs     =        987
Group variable: pais                            Number of groups  =         21

R-squared:                                      Obs per group:
     Within  = 0.5277                                         min =         47
     Between = 0.3574                                         avg =       47.0
     Overall = 0.5254                                         max =         47

                                                F(4,962)          =     268.67
corr(u_i, Xb) = -0.0369                         Prob > F          =     0.0000

------------------------------------------------------------------------------
 ln_co2pc_gr | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
 ln_co2pc_gr |
         L1. |    -.182082   .0225374  -8.08    0.000    -.2263102   -.1378538
             |
 ln_gdppc_gr |    1.202518   .0628348  19.14    0.000     1.079209    1.325827
       ei_ch |    6.177734   .4055494  15.23    0.000     5.381871    6.973597
res_share_ch |   -.0166086   .0008886 -18.69    0.000    -.0183525   -.0148648
       _cons |   -.0001908    .002465  -0.08    0.938    -.0050283    .0046467
-------------+----------------------------------------------------------------
     sigma_u |   .00889845
     sigma_e |   .07348315
         rho |   .01445209   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(20, 962) = 0.67                    Prob > F = 0.8574
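
The following checks two numbers in the FE output using only values reported above. The notation is mine: N = 21 countries, n = 987 observations, k = 4 slope coefficients. rho is the share of the composite error variance due to u_i. The F test that all u_i = 0 has N - 1 numerator and n - N - k denominator degrees of freedom:

$$
\begin{aligned}
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
      = \frac{0.00889845^2}{0.00889845^2 + 0.07348315^2}
      \approx \frac{0.0000792}{0.0054790} \approx 0.0145 \\
F &\sim F(N - 1,\; n - N - k) = F(21 - 1,\; 987 - 21 - 4) = F(20,\; 962)
\end{aligned}
$$

Both the small rho and F = 0.67 (Prob > F = 0.8574) are consistent with the conclusion drawn in step 2.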

3 - Modified Wald test for groupwise heteroskedasticity - The result indicates that I have to reject H0, so I have heteroskedasticity.

Code:

xttest3

Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (21) = 5626.37
Prob>chi2 = 0.0000
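
For reference, -xttest3- (a user-written command) computes a modified Wald statistic from the panel-specific residual variances of the FE model. The form below is my reading of the modified Wald test it implements, not the command's source code. Here \hat e_{it} are the FE residuals, T_i is the number of observations in panel i, and \hat\sigma^2 is the pooled estimate of the common variance under H0. The degrees of freedom equal the number of panels, N = 21, which matches chi2 (21) above.

$$
W = \sum_{i=1}^{N} \frac{(\hat\sigma_i^2 - \hat\sigma^2)^2}{\hat V_i},
\qquad
\hat\sigma_i^2 = \frac{1}{T_i}\sum_{t=1}^{T_i} \hat e_{it}^2,
\qquad
\hat V_i = \frac{1}{T_i (T_i - 1)} \sum_{t=1}^{T_i} \left(\hat e_{it}^2 - \hat\sigma_i^2\right)^2,
\qquad
W \overset{H_0}{\sim} \chi^2(N) = \chi^2(21)
$$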

3' - Following the Stata FAQ "Testing for panel-level heteroskedasticity and autocorrelation", I tested for heteroskedasticity with a likelihood-ratio test - The result indicates that I have to reject H0, so I have heteroskedasticity.

Code:

xtgls ln_co2pc_gr l.ln_co2pc_gr ln_gdppc_gr ei_ch res_share_ch, igls panels(heteroskedastic)
estimates store hetero

Iteration 1: tolerance = .01253158
Iteration 2: tolerance = .00224603
Iteration 3: tolerance = .00018464
Iteration 4: tolerance = .00008188
Iteration 5: tolerance = .00006792
Iteration 6: tolerance = .00003587
Iteration 7: tolerance = .0000168
Iteration 8: tolerance = 7.513e-06
Iteration 9: tolerance = 3.295e-06
Iteration 10: tolerance = 1.432e-06
Iteration 11: tolerance = 6.200e-07
Iteration 12: tolerance = 2.679e-07
Iteration 13: tolerance = 1.157e-07
Iteration 14: tolerance = 4.992e-08

Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        heteroskedastic
Correlation:   no autocorrelation

Estimated covariances      =        21          Number of obs     =        987
Estimated autocorrelations =         0          Number of groups  =         21
Estimated coefficients     =         5          Time periods      =         47
                                                Wald chi2(4)      =    3209.74
Log likelihood             =  1666.573          Prob > chi2       =     0.0000

------------------------------------------------------------------------------
 ln_co2pc_gr | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
 ln_co2pc_gr |
         L1. |   -.0365769   .0157667  -2.32    0.020     -.067479   -.0056748
             |
 ln_gdppc_gr |     .966336   .0285631  33.83    0.000     .9103533    1.022319
       ei_ch |    6.084885   .1788927  34.01    0.000     5.734262    6.435508
res_share_ch |   -.0150411   .0005018 -29.97    0.000    -.0160246   -.0140576
       _cons |   -.0009792   .0011706  -0.84    0.403    -.0032736    .0013152
------------------------------------------------------------------------------

xtgls ln_co2pc_gr l.ln_co2pc_gr ln_gdppc_gr ei_ch res_share_ch, igls

Iteration 1: tolerance = 0

Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        homoskedastic
Correlation:   no autocorrelation

Estimated covariances      =         1          Number of obs     =        987
Estimated autocorrelations =         0          Number of groups  =         21
Estimated coefficients     =         5          Time periods      =         47
                                                Wald chi2(4)      =    1093.35
Log likelihood             =  1182.094          Prob > chi2       =     0.0000

------------------------------------------------------------------------------
 ln_co2pc_gr | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
 ln_co2pc_gr |
         L1. |   -.1760077   .0222856  -7.90    0.000    -.2196867   -.1323287
             |
 ln_gdppc_gr |    1.173666   .0610166  19.24    0.000     1.054075    1.293256
       ei_ch |    6.246011   .4003985  15.60    0.000     5.461244    7.030778
res_share_ch |   -.0167874   .0008738 -19.21    0.000    -.0185001   -.0150747
       _cons |    .0000834   .0024455   0.03    0.973    -.0047098    .0048765
------------------------------------------------------------------------------

local df = e(N_g) - 1
lrtest hetero . , df(`df')

Likelihood-ratio test
Assumption: . nested within hetero

LR chi2(20) = 968.96
Prob > chi2 = 0.0000
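
The LR statistic can be reproduced by hand from the two log likelihoods reported by -xtgls-. It is twice the gain in log likelihood from letting each panel have its own variance. The degrees of freedom are the extra variance parameters: 21 panel variances versus 1 common variance, which is what local df = e(N_g) - 1 computes.

$$
\begin{aligned}
LR &= 2\,(\ell_{\text{hetero}} - \ell_{\text{homo}}) = 2\,(1666.573 - 1182.094) = 2 \times 484.479 = 968.958 \approx 968.96 \\
df &= N_g - 1 = 21 - 1 = 20
\end{aligned}
$$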

4 - Breusch-Pagan LM test for cross-sectional correlation in the fixed-effects model - The result indicates that I cannot reject H0, so I do not have cross-sectional correlation.

Code:

xttest2

Correlation matrix of residuals:

          __e1    __e4    __e5    __e6    __e7    __e8   __e10   __e11   __e13   __e14   __e15   __e16   __e17   __e18
  __e1  1.0000
  __e4  0.0230  1.0000
  __e5  0.1774 -0.2663  1.0000
  __e6 -0.0815 -0.1596  0.3337  1.0000
  __e7  0.0378 -0.0339  0.0931 -0.4827  1.0000
  __e8 -0.0775  0.0884 -0.1391  0.0130 -0.2160  1.0000
 __e10  0.1728 -0.1214  0.3537 -0.0791  0.0612 -0.0255  1.0000
 __e11 -0.0227 -0.1197  0.2107  0.1892  0.0045  0.1437  0.1316  1.0000
 __e13 -0.0605 -0.2207 -0.0571  0.0261  0.0191  0.0425  0.0010  0.0492  1.0000
 __e14  0.0869 -0.0064  0.0060 -0.1281 -0.0103  0.0488  0.1306  0.0719 -0.0029  1.0000
 __e15  0.1708 -0.1080  0.0993  0.0243  0.0373 -0.2299  0.1401 -0.0315 -0.1551  0.2435  1.0000
 __e16  0.0628  0.0825  0.0666  0.2075 -0.0526  0.1230 -0.0705  0.0390 -0.0794  0.2468 -0.0093  1.0000
 __e17  0.0355 -0.0747  0.2266 -0.0418 -0.0541 -0.2315  0.2137 -0.0571  0.1571  0.0463 -0.1197 -0.0884  1.0000
 __e18  0.1185  0.1001  0.2537  0.1797 -0.1182  0.2911 -0.0325  0.1856 -0.2174  0.1771  0.1690  0.1377 -0.2270  1.0000
 __e19 -0.1699 -0.2237  0.1343  0.0290  0.0353  0.2445 -0.0164  0.3058  0.1293  0.0199 -0.0604 -0.0287 -0.3070  0.0850
 __e20 -0.2560 -0.0242  0.0560 -0.0847  0.3125 -0.0103 -0.0130  0.1775  0.0206 -0.1220  0.0793  0.2789  0.0166  0.0722
 __e21  0.1885 -0.1850  0.2959  0.0675  0.1458  0.0512  0.2397  0.1864  0.3013  0.0005 -0.0682 -0.0539  0.0846  0.2271
 __e22  0.1250  0.0499 -0.1973 -0.0485  0.1092  0.0224 -0.0751 -0.2712  0.0114  0.0947  0.2086  0.0161 -0.0104  0.0202
 __e23  0.0544  0.0267  0.1008  0.0286 -0.1474  0.0368  0.1464  0.2453  0.0411  0.0314 -0.0384  0.0662  0.0609  0.0481
 __e24 -0.0303  0.1201 -0.0874  0.0920 -0.0274 -0.0487 -0.0419  0.0871  0.0408  0.0059  0.0166  0.0027  0.1309 -0.0986
 __e26  0.4876 -0.0495  0.4233  0.3092  0.0435 -0.0122  0.3230  0.3330 -0.0773 -0.0259  0.2067  0.1298  0.1149  0.1722

         __e19   __e20   __e21   __e22   __e23   __e24   __e26
 __e19  1.0000
 __e20  0.0276  1.0000
 __e21  0.2877  0.1445  1.0000
 __e22  0.0194  0.0504  0.1057  1.0000
 __e23  0.2771  0.0291  0.2581 -0.0727  1.0000
 __e24 -0.0341  0.0722 -0.1132  0.1107 -0.0321  1.0000
 __e26 -0.0919  0.0470  0.2584  0.0817  0.1381 -0.0107  1.0000

Breusch-Pagan LM test of independence: chi2(210) = 224.533, Pr = 0.2340
Based on 46 complete observations over panel units
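
The Breusch-Pagan LM statistic from -xttest2- sums the squared pairwise residual correlations in the matrix above over all distinct pairs of panels. It then scales that sum by the number of complete time periods. With N = 21 panels there are N(N-1)/2 = 210 pairs, which gives the degrees of freedom. With T = 46 complete observations, the reported statistic implies an average squared correlation of roughly 0.023.

$$
\begin{aligned}
LM &= T \sum_{i=2}^{N} \sum_{j=1}^{i-1} \hat\rho_{ij}^2 \overset{H_0}{\sim} \chi^2\!\left(\tfrac{N(N-1)}{2}\right),
\qquad \tfrac{N(N-1)}{2} = \tfrac{21 \cdot 20}{2} = 210 \\
\sum_{i<j} \hat\rho_{ij}^2 &= \frac{224.533}{46} \approx 4.881,
\qquad \frac{4.881}{210} \approx 0.0232
\end{aligned}
$$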

5 - Estimation with RE

Code:

xtreg ln_co2pc_gr l.ln_co2pc_gr ln_gdppc_gr ei_ch res_share_ch, re
estimates store random

Random-effects GLS regression                   Number of obs     =        987
Group variable: pais                            Number of groups  =         21

R-squared:                                      Obs per group:
     Within  = 0.5275                                         min =         47
     Between = 0.3787                                         avg =       47.0
     Overall = 0.5256                                         max =         47

                                                Wald chi2(4)      =    1087.81
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
 ln_co2pc_gr | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
 ln_co2pc_gr |
         L1. |   -.1760077   .0223423  -7.88    0.000    -.2197977   -.1322176
             |
 ln_gdppc_gr |    1.173666   .0611717  19.19    0.000     1.053771     1.29356
       ei_ch |    6.246011   .4014166  15.56    0.000     5.459249    7.032773
res_share_ch |   -.0167874   .0008761 -19.16    0.000    -.0185044   -.0150703
       _cons |    .0000834   .0024518   0.03    0.973     -.004722    .0048887
-------------+----------------------------------------------------------------
     sigma_u |           0
     sigma_e |   .07348315
         rho |           0   (fraction of variance due to u_i)
------------------------------------------------------------------------------

6 - Test of overidentifying restrictions (xtoverid) - Why does it fail?

Code:

xtoverid
Error - saved RE estimates are degenerate (sigma_u=0) and equivalent to pooled OLS

r(198);
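
This is a short way to see why -xtoverid- refuses to run. RE GLS quasi-demeans the data with the weight theta below, written here for a balanced panel with T observations per panel. The RE output reports sigma_u = 0, so theta = 0. The transformed regression is then just the original pooled regression, which is why the RE coefficients and standard errors above coincide with pooled OLS. It is also why the command treats the saved RE estimates as degenerate.

$$
\begin{aligned}
y_{it} - \theta\,\bar y_i &= (1-\theta)\,\alpha + (x_{it} - \theta\,\bar x_i)'\beta + \left[(1-\theta)\,u_i + (e_{it} - \theta\,\bar e_i)\right],
\qquad
\theta = 1 - \sqrt{\frac{\sigma_e^2}{T\sigma_u^2 + \sigma_e^2}} \\
\sigma_u = 0 \;&\Rightarrow\; \theta = 1 - \sqrt{\frac{\sigma_e^2}{\sigma_e^2}} = 0
\;\Rightarrow\; y_{it} = \alpha + x_{it}'\beta + (u_i + e_{it}) \text{ estimated by pooled OLS}
\end{aligned}
$$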

7 - Breusch-Pagan LM test for random effects - With this test I verified that pooled OLS is better than RE.

Code:

xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        ln_co2pc_gr[pais,t] = Xb + u[pais] + e[pais,t]

        Estimated results:
                         |       Var     SD = sqrt(Var)
                ---------+-----------------------------
               ln_co2p~r |   .0112592       .1061095
                       e |   .0053998       .0734831
                       u |          0              0

        Test: Var(u) = 0
                             chibar2(01) =     0.00
                          Prob > chibar2 =   1.0000

8 - Hausman Test - With this test I verified that FE is better than RE

Code:

hausman fixed random, sigmamore

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed        random       Difference       Std. err.
-------------+----------------------------------------------------------------
 ln_co2pc_gr |
         L1. |    -.182082    -.1760077       -.0060743        .0023138
             |
 ln_gdppc_gr |    1.202518     1.173666        .0288517        .0134072
       ei_ch |    6.177734     6.246011       -.0682771        .0472479
res_share_ch |   -.0166086    -.0167874        .0001788        .0001298
------------------------------------------------------------------------------
                          b = Consistent under H0 and Ha; obtained from xtreg.
           B = Inconsistent under Ha, efficient under H0; obtained from xtreg.

Test of H0: Difference in coefficients not systematic

    chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B)
            = 12.28
Prob > chi2 = 0.0154
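
With 4 degrees of freedom the chi-squared tail probability has a closed form, so the reported p-value can be checked by hand from chi2(4) = 12.28:

$$
P\left(\chi^2_4 > x\right) = e^{-x/2}\left(1 + \frac{x}{2}\right)
\;\Rightarrow\;
P\left(\chi^2_4 > 12.28\right) = e^{-6.14}\,(1 + 6.14) \approx 0.002155 \times 7.14 \approx 0.0154
$$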

9 - Wooldridge test for autocorrelation in panel data - I can reject H0, so I have first-order autocorrelation.

Code:

xtserial ln_co2pc_gr ln_co2pc_gr_1 ln_gdppc_gr ei_ch res_share_ch

Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation

F(1, 20) = 46.802
Prob > F = 0.0000
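
This is the logic behind the Wooldridge test run by -xtserial-, as I understand it. The model is first-differenced to remove the panel effects, and the residuals of the differenced regression are regressed on their own lag. If the idiosyncratic errors e_it are serially uncorrelated with constant variance, the differenced errors have a first-order correlation of exactly -0.5. H0 is therefore a restriction on that slope. The F(1, 20) has N - 1 = 20 denominator degrees of freedom, consistent with standard errors clustered on the 21 panels. The derivation of the -0.5 is:

$$
\begin{aligned}
\operatorname{Cov}(\Delta e_{it}, \Delta e_{i,t-1}) &= \operatorname{Cov}(e_{it} - e_{i,t-1},\; e_{i,t-1} - e_{i,t-2}) = -\operatorname{Var}(e_{i,t-1}) = -\sigma_e^2 \\
\operatorname{Var}(\Delta e_{it}) &= 2\sigma_e^2
\;\Rightarrow\;
\operatorname{Corr}(\Delta e_{it}, \Delta e_{i,t-1}) = \frac{-\sigma_e^2}{2\sigma_e^2} = -0.5
\end{aligned}
$$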

So I decided to do the following:

10 - The last step was the estimation with xtgls with the options panels(heteroskedastic) and corr(ar1).

Code:

xtgls ln_co2pc_gr ln_co2pc_gr_1 ln_gdppc_gr ei_ch res_share_ch, panels(heteroskedastic) corr(ar1)

Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        heteroskedastic
Correlation:   common AR(1) coefficient for all panels  (-0.0149)

Estimated covariances      =        21          Number of obs     =      1,008
Estimated autocorrelations =         1          Number of groups  =         21
Estimated coefficients     =         5          Time periods      =         48
                                                Wald chi2(4)      =    2909.76
                                                Prob > chi2       =     0.0000

-------------------------------------------------------------------------------
  ln_co2pc_gr | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
ln_co2pc_gr_1 |   -.0412811   .0166903  -2.47    0.013    -.0739934   -.0085687
  ln_gdppc_gr |    .9839163   .0317807  30.96    0.000     .9216273    1.046205
        ei_ch |    6.192212   .1930138  32.08    0.000     5.813912    6.570512
 res_share_ch |   -.0155283   .0005325 -29.16    0.000    -.0165719   -.0144847
        _cons |   -.0007851   .0012703  -0.62    0.537    -.0032749    .0017046
-------------------------------------------------------------------------------
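
For reference, the error structure that panels(heteroskedastic) corr(ar1) imposes in -xtgls- can be sketched as below. Each panel has its own variance, and all panels share one AR(1) coefficient (estimated above as -0.0149). Errors are assumed uncorrelated across panels, since the panels() option does not request cross-sectional correlation. The symbols (varepsilon, eta, sigma_i) are my own notation, not Stata's.

$$
\begin{aligned}
y_{it} &= x_{it}'\beta + \varepsilon_{it}, \qquad \varepsilon_{it} = \rho\,\varepsilon_{i,t-1} + \eta_{it}, \qquad \hat\rho = -0.0149 \\
\operatorname{Var}(\eta_{it}) &= \sigma_i^2 \;\Rightarrow\; \operatorname{Var}(\varepsilon_{it}) = \frac{\sigma_i^2}{1 - \rho^2} \quad (i = 1, \dots, 21),
\qquad \operatorname{Cov}(\varepsilon_{it}, \varepsilon_{js}) = 0 \;\text{ for } i \neq j
\end{aligned}
$$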

Is it wise to use xtgls, or are there better options?

Thanks in advance,

Sebastián.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17603
#2

12 Feb 2022, 03:20

Sebastián:
I'm under the impression that yours is not the most fruitful way to tackle this issue.
That said:
1) if you actually have panel data (with a continuous regressand), why start from -regress-?
2) the usual approach is to compare the -fe- with the -re- specification (that is, using -xtreg- if you have an N>T panel dataset);
3) what does the literature in your research field suggest?
4) last but not least, as this is not your first message on this forum, please use CODE delimiters to share what you typed and what Stata gave you back (as per the FAQ). Thanks.

Kind regards,
Carlo
(StataNow 18.5)
Sebastian Kruk

Join Date: Jul 2017

Posts: 72
#3

12 Feb 2022, 13:59

Carlo:

Thank you for your fast reply.

1) I started from -regress- because I would like to have all the alternatives.

2) Sorry, I should have explained more. I have 21 countries and annual data for 1971-2019, so N = 21 and T = 48.

3) You can read: Barrera-Santana, J., Marrero, G. A., Puch, L. A., & Díaz, A. (2021). CO2 emissions and energy technologies in Western Europe. SERIEs, 12(2), 105-150.

4) Do you mean that only what I type goes between the CODE delimiters, and what Stata gives me back goes outside them?

Sorry for my English.

Regards,

Sebastián.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17603
#5

13 Feb 2022, 03:36

Sebastian:
1) and 2): I do not think starting from (pooled) OLS is the way to go here, as you are dealing with a T>N panel dataset. Pooled OLS is usually the last resort, when the data do not support the evidence of a panel-wise effect.
3) the paper you quoted focuses on a dynamic panel data model (see -help xtabond-), which is a different (and much more demanding) inferential procedure than -xtgls- or -xtregar- (which you should consider when dealing with long panel datasets). In addition,
https://www.stata.com/bookstore/environmental-econometrics-using-stata might be interesting to read.
4)

Code:

I mean that you should include both what you typed and what Stata gave you back within CODE delimiters. Just click on the #-shaped toggle in the toolbar at the top of the post.

Many (most) of the listers are not native speakers of American or British English (I'm clearly a case in point); I think a very useful by-product of participating in this forum is reading and understanding how native English-speaking listers phrase things (and think), which is, in my case, really different from Italian.
Once, when I apologized for my far-from-Oxonian English, Nick Cox humorously replied: "Don't worry, I've studied at Cambridge!"

Last edited by Carlo Lazzaro; 13 Feb 2022, 03:39.

Kind regards,
Carlo
(StataNow 18.5)
Sebastian Kruk

Join Date: Jul 2017

Posts: 72
#6

13 Feb 2022, 17:23

Carlo:

Many thanks! Thank you so much!

I like Environmental Econometrics Using Stata very much.

I read that xtabond can be applied when N>T, but I have T>N, and the fixed effects are lost if I use xtgls.

My Stata version is 17 BE.

Greetings,

Sebastián.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17603
#7

14 Feb 2022, 02:16

Sebastian:
you may want to take a look at: http://fmwww.bc.edu/EC-C/S2013/823/E...n05.slides.pdf

Kind regards,
Carlo
(StataNow 18.5)
Sebastian Kruk

Join Date: Jul 2017

Posts: 72
#8

21 Feb 2022, 20:27

Carlo,

Thank you again, Carlo!

I reformulated the problem in a new post at https://www.statalist.org/forums/for...-or-panel-ardl.

Regards,

Sebastián.