Pooled OLS, fixed & random effects: Panel Data

Walther Larsen

Join Date: Apr 2020
Posts: 2


28 Apr 2020, 02:13

Hey everyone, (the data description is posted at the bottom) I'm currently writing my bachelor's thesis at Copenhagen Business School, and ran into an issue with Stata that I haven't been able to solve on my own.

Since it is a university assignment, the normal approach (as I have been taught, and as recommended in https://www.iuj.ac.jp/faculty/kucc62...blq5Qmk7KvdJLg) would be to start off with a simple model like pooled OLS and, if that isn't sufficient or its assumptions don't seem to hold, move on to fixed or random effects models. Please correct me if this approach isn't optimal.

My first issue with pooled OLS is figuring out whether I have done it correctly (I have seen different approaches in different sources). From what I can tell, you do this by using clustered standard errors.

Code:

reg Covid19_cases x1 x2 x3 Country, vce(cluster Country)

Question 1. Is this approach to pooled OLS correct, and how should I include my time variable in -reg-?

Question 2. How do I test for heteroskedasticity and autocorrelation when using clustered standard errors? Clustering seems to make it impossible to run a Breusch-Pagan test.

Code:

. hettest
hettest not appropriate after robust cluster()
r(498);

Furthermore, I know that -xtreg- usually outperforms -reg- (with clustered standard errors) when it comes to panel data regression.

So my Question 3 (see the output from pooled OLS and random effects below) is: how do I determine from the Stata output whether I should use pooled OLS, a fixed effects model or a random effects model? (As almost all my variables are static, I know that I'll probably end up with an -re- model. I just haven't been able to argue for this statistically, as I can't even test for things like heteroskedasticity and autocorrelation.)

output from Pooled OLS:

Code:

Linear regression                               Number of obs     =      4,592
                                                F(19, 41)         =     303.69
                                                Prob > F          =     0.0000
                                                R-squared         =     0.7294
                                                Root MSE          =       1242

                                                (Std. Err. adjusted for 42 clusters in Country)
-----------------------------------------------------------------------------------------------
                              |               Robust
                Covid19_cases |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------+----------------------------------------------------------------
                  Ages0_14Pct |   8123.495   11156.99     0.73   0.471     -14408.5    30655.49
                 Ages65_99Pct |   3038.334   11320.49     0.27   0.790    -19823.86    25900.53
                 Ages15_64Pct |   7851.408   11239.37     0.70   0.489    -14846.96    30549.78
               Covid19_deaths |   9.630924   1.715408     5.61   0.000     6.166587    13.09526
                   CrimeIndex |  -8.226521   5.282897    -1.56   0.127    -18.89555    2.442507
                  DAI_B_index |   1999.881   1276.736     1.57   0.125    -578.5393    4578.301
                  DAI_G_index |   290.8351   454.5301     0.64   0.526     -627.107    1208.777
                  DAI_P_index |   3.885746    690.839     0.01   0.996    -1391.292    1399.064
                      Gdp2018 |   .1692471   .0260085     6.51   0.000     .1167219    .2217724
           GdpAgriculturalPct |    1772.46   2558.133     0.69   0.492    -3393.794    6938.715
             GdpIndustrialPct |  -35.28193   2158.263    -0.02   0.987    -4393.983    4323.419
                GdpServicePct |   150.5583   2216.818     0.07   0.946    -4326.396    4627.512
         InternetUsage2014Pct |  -283.9534   701.2951    -0.40   0.688    -1700.248    1132.341
                  popData2018 |  -9.46e-07   4.60e-07    -2.06   0.046    -1.87e-06   -1.64e-08
pop_AnnualGrowthPct_2010_2018 |  -15230.27   11371.27    -1.34   0.188    -38195.02    7734.475
                pop_density18 |  -.3554776   .4322918    -0.82   0.416    -1.228509    .5175533
          SocialMobilityIndex |  -9.681271    21.7794    -0.44   0.659    -53.66567    34.30313
              StringencyIndex |   3.398476   1.255605     2.71   0.010     .8627302    5.934222
                      Country |   .3061796     1.2523     0.24   0.808    -2.222892    2.835251
                        _cons |  -7908.159   12120.52    -0.65   0.518    -32386.04    16569.72
-----------------------------------------------------------------------------------------------

output from Random effects:

Code:

xtset Country Date

Code:

Random-effects GLS regression                   Number of obs     =      4,592
Group variable: Country                         Number of groups  =         42

R-sq:                                           Obs per group:
     within  = 0.6600                                         min =         51
     between = 0.9450                                         avg =      109.3
     overall = 0.7293                                         max =        113

                                                Wald chi2(18)     =    9283.67
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

-----------------------------------------------------------------------------------------------
                Covid19_cases |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------+----------------------------------------------------------------
                  Ages0_14Pct |   7719.581   17073.57     0.45   0.651       -25744    41183.17
                 Ages65_99Pct |   2600.686   16826.48     0.15   0.877    -30378.61    35579.98
                 Ages15_64Pct |   7477.483   17549.24     0.43   0.670    -26918.39    41873.36
               Covid19_deaths |   9.728414   .1100511    88.40   0.000     9.512718     9.94411
                   CrimeIndex |   -8.30544   6.737789    -1.23   0.218    -21.51126    4.900383
                  DAI_B_index |   1956.444   1257.037     1.56   0.120    -507.3033    4420.191
                  DAI_G_index |   308.9836   463.2871     0.67   0.505    -599.0424     1217.01
                  DAI_P_index |   64.28808   1023.292     0.06   0.950    -1941.327    2069.903
                      Gdp2018 |   .1685496   .0209757     8.04   0.000      .127438    .2096611
           GdpAgriculturalPct |   1816.435   4797.581     0.38   0.705    -7586.651    11219.52
             GdpIndustrialPct |  -124.5615   4119.166    -0.03   0.976    -8197.979    7948.856
                GdpServicePct |   135.0255   4126.775     0.03   0.974    -7953.304    8223.356
         InternetUsage2014Pct |  -392.6615   1069.255    -0.37   0.713    -2488.362    1703.039
                  popData2018 |  -9.59e-07   3.03e-07    -3.16   0.002    -1.55e-06   -3.64e-07
pop_AnnualGrowthPct_2010_2018 |  -15717.27   16056.61    -0.98   0.328    -47187.65    15753.11
                pop_density18 |  -.3674106   .5401536    -0.68   0.496    -1.426092     .691271
          SocialMobilityIndex |  -8.118353   21.31227    -0.38   0.703    -49.88963    33.65293
              StringencyIndex |   3.613653   .5180906     6.97   0.000     2.598214    4.629092
                        _cons |  -7511.384    18162.4    -0.41   0.679    -43109.03    28086.26
------------------------------+----------------------------------------------------------------
                      sigma_u |   319.2514
                      sigma_e |  1214.6213
                          rho |  .06462069   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------------

Data description:
21 variables, and 4592 observations. (unbalanced dataset)

Variable	Description
Date	Time indicator (in days)
StringencyIndex	Index measuring the government response to Covid19: 100 is the most severe response, 0 the loosest
Covid19_cases	Dependent variable: number of recorded Covid19 cases
Covid19_deaths	Number of recorded deaths caused by Covid19
popData2018	2018 country population data
DAI_index	Digital adoption index: a country's digital adoption across three dimensions of the economy (people, government, and business)
DAI_B_index	A country's digital adoption across business
DAI_P_index	A country's digital adoption across people
DAI_G_index	A country's digital adoption across government
pop_AnnualGrowthPct_2010_2018	A country's annual population growth from 2010 to 2018, in pct.
Ages0_14Pct	Pct. of a country's population between 0 and 14 years of age
Ages15_64Pct	Pct. of a country's population between 15 and 64 years of age
Ages65_99Pct	Pct. of a country's population between 65 and 99 years of age
Ages0_99Pct	Pct. of a country's population between 0 and 99 years of age
CrimeIndex	Index measuring crime rates by country: 100 is the highest crime rate, 0 the lowest
SocialMobilityIndex	Index measuring social mobility by country: 100 is the highest social mobility, 0 the lowest
Gdp2018	Country GDP, 2018 figures
GdpAgriculturalPct	Pct. of a country's GDP that comes from the agricultural sector
GdpIndustrialPct	Pct. of a country's GDP that comes from the industrial sector
GdpServicePct	Pct. of a country's GDP that comes from the service sector
InternetUsage2014Pct	Pct. of a country's population that uses the internet, 2014 figures
Country	Entity indicator
Continent	Continent
pop_density18	Population density by country, 2018 figures

I hope I have been as precise and informative as possible.

Best regards, Walther Larsen

Last edited by Walther Larsen; 28 Apr 2020, 02:15.

Tags: panel data, regression

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#2

28 Apr 2020, 02:51

Walther:
welcome to this forum.
Some comments about your queries:
1) usually, -xtreg- outperforms pooled OLS, regardless of non-default standard errors. That said, I would have started off with -xtreg- and switched to pooled OLS only in the absence of a panel-wise effect.
2) If you actually have 109 dates and 42 panels, you should consider estimators developed for long panels (see -xtgls- and -xtregar, fe-);
3) you have sky-rocketing R-sq (-regress-) and R-sq between (-xtreg,re-), but most of your coefficients do not reach statistical significance: you may have a quasi-extreme multicollinearity issue to deal with;
4) I'm not clear about the reason underlying non-default standard errors in -xtreg,re-. Did you detect heteroskedasticity and/or autocorrelation?
5) Stata does not allow re-checking for heteroskedasticity after imposing non-default standard errors in -reg-. This saves you time, as these options already change the way standard errors are calculated to take heteroskedasticity into account.
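
Point 5, together with the earlier question about the time variable, can be sketched as follows. This is only an illustration on the nlswork toy data, not the Covid panel from the thread: the regressors are picked arbitrarily, and entering time as a set of dummies via i.year is one common option rather than something prescribed above.

```stata
* Toy data only: nlswork stands in for the Covid panel discussed above
webuse nlswork, clear

* Step 1: pooled OLS with default standard errors, so that the
* Breusch-Pagan test (-estat hettest-) is allowed to run
regress ln_wage age tenure
estat hettest

* Step 2: re-estimate with cluster-robust standard errors,
* adding time dummies through factor-variable notation
regress ln_wage age tenure i.year, vce(cluster idcode)
```

Running -estat hettest- after the second regression would reproduce the r(498) error quoted in the first post, which is why the test has to come before the switch to clustered standard errors.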

Kind regards,
Carlo
(Stata 19.0)

Walther Larsen

Join Date: Apr 2020
Posts: 2

28 Apr 2020, 03:10

Hey Carlo, thanks for the quick response.

1) Alright, and how would one go about checking whether or not there is a panel-wise effect? (This might sound dumb, as I'm nowhere near a statistics expert.)

2) I have 137 countries, and a varying number of dates for each country, since some countries started tracking Covid19 later or earlier than others. Does this change anything?

3) I didn't want to drop insignificant variables before knowing whether the approach was actually correct. (In university the usual approach has been to drop the most insignificant variable until your model consists only of significant variables.)

I did check for multicollinearity using the -correlate- command, and even though some correlations are high, there shouldn't be any perfect linear correlation.

Code:

(obs=4,592)

             | Covid~es Ag~14Pct Ages65~t Ages15~t Covid~hs CrimeI~x DAI_B_~x DAI_G_~x DAI_P_~x  Gdp2018 GdpAgr~t
-------------+---------------------------------------------------------------------------------------------------
Covid19_ca~s |   1.0000
 Ages0_14Pct |  -0.0650   1.0000
Ages65_99Pct |   0.0729  -0.8413   1.0000
Ages15_64Pct |  -0.0192  -0.2351  -0.3243   1.0000
Covid19_de~s |   0.8284  -0.0804   0.1027  -0.0471   1.0000
  CrimeIndex |   0.0735   0.5069  -0.5786   0.1635   0.0773   1.0000
 DAI_B_index |   0.0462  -0.6566   0.7053  -0.1302   0.0556  -0.4926   1.0000
 DAI_G_index |   0.0803  -0.3036   0.2144   0.1369   0.0817  -0.1012   0.1430   1.0000
 DAI_P_index |   0.0529  -0.7608   0.7580  -0.0227   0.0562  -0.4781   0.8669   0.2378   1.0000
     Gdp2018 |   0.4099  -0.1009   0.0619   0.0760   0.2912   0.1024  -0.0623   0.1350   0.0212   1.0000
GdpAgricul~t |  -0.1049   0.7472  -0.7381   0.0206  -0.1109   0.3785  -0.7905  -0.3257  -0.8314  -0.1010   1.0000
GdpIndustr~t |  -0.1235   0.2016  -0.2698   0.1609  -0.1442   0.1165  -0.3897  -0.0471  -0.2602  -0.0313   0.1847
GdpService~t |   0.1549  -0.5576   0.5992  -0.1236   0.1727  -0.2731   0.7056   0.1864   0.6379   0.0874  -0.6770
InternetUs~t |   0.0828  -0.6886   0.7333  -0.1119   0.0803  -0.4299   0.9325   0.1832   0.9251   0.0672  -0.8192
 popData2018 |   0.0705   0.2050  -0.3260   0.2435   0.0339   0.1451  -0.4350   0.0730  -0.4413   0.5231   0.3975
pop_Ann~2018 |  -0.0481   0.6871  -0.7264   0.0985  -0.0540   0.4207  -0.2593  -0.1416  -0.3387  -0.1141   0.4084
pop_densi~18 |  -0.0435   0.2362  -0.1174  -0.2076  -0.0102  -0.0853  -0.2068   0.1238  -0.2611  -0.0327   0.2311
SocialMobi~x |   0.0406  -0.7323   0.7647  -0.0932   0.0517  -0.5304   0.9203   0.1618   0.9260   0.0043  -0.8167
Stringency~x |   0.2341   0.0336  -0.0450   0.0214   0.2466   0.0006  -0.1070   0.0318  -0.0860   0.0192   0.0762
     Country |   0.1387   0.0778  -0.1600   0.1357   0.1142   0.1075  -0.1323  -0.0079  -0.0756   0.1096   0.1351
        Date |   0.2247  -0.0155   0.0158  -0.0019   0.2402  -0.0086   0.0046   0.0155   0.0070  -0.0020  -0.0085

             | GdpInd~t GdpSer~t Intern~t popDat~8 pop_An~8 pop_d~18 Social~x String~x  Country     Date
-------------+------------------------------------------------------------------------------------------
GdpIndustr~t |   1.0000
GdpService~t |  -0.8280   1.0000
InternetUs~t |  -0.3244   0.6751   1.0000
 popData2018 |   0.1418  -0.3188  -0.4196   1.0000
pop_Ann~2018 |  -0.0456  -0.1789  -0.2812   0.0664   1.0000
pop_densi~18 |  -0.1784  -0.0728  -0.2572   0.2385   0.0846   1.0000
SocialMobi~x |  -0.2180   0.5891   0.9548  -0.3984  -0.3351  -0.1964   1.0000
Stringency~x |   0.0249  -0.0585  -0.0945   0.0747   0.0100   0.0837  -0.0713   1.0000
     Country |  -0.0081  -0.0843  -0.1189  -0.1015   0.0637   0.1292  -0.1378   0.0133   1.0000
        Date |  -0.0092   0.0120   0.0019  -0.0032  -0.0146  -0.0031   0.0068   0.8703   0.0075   1.0000
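
Pairwise correlations only show two-variable relationships, so variance inflation factors after -regress- are a common complement. A minimal sketch on the nlswork toy data, with regressors chosen purely for illustration (they are not the variables in the table above):

```stata
* Toy data only: variable names come from nlswork, not from the thread
webuse nlswork, clear

* -estat vif- is available after -regress- (not after -xtreg-)
regress ln_wage age ttl_exp tenure
estat vif
```

What counts as a worrying VIF is a rule of thumb rather than a formal test, so the figures are best read alongside the correlation matrix.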

4) Before using the clustered standard errors I found heteroskedasticity. The reason why I wanted to use -re- is that almost all of my variables are static over time, and from what I understand -re- is appropriate in those cases. I might be wrong though.
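
The concern about variables that are static over time can be illustrated with a small sketch. It assumes the nlswork toy data, where race does not change within a person; the model itself is hypothetical and chosen only to show the mechanics.

```stata
* Toy data only: race is constant within idcode in nlswork
webuse nlswork, clear
xtset idcode year

* Fixed effects: the within transformation removes anything constant
* within a panel, so Stata should flag the race indicators as omitted
xtreg ln_wage age i.race, fe

* Random effects: coefficients on time-invariant regressors are estimated
xtreg ln_wage age i.race, re
```

That -re- can estimate these coefficients does not by itself make it the right specification; it still relies on the panel effect being uncorrelated with the regressors.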


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17673

28 Apr 2020, 03:41

Walther:
1) see -xttest0- to test whether there's evidence of a panel-wise effect after -xtreg,re-;
2) the output from -xtreg,re- tells you that you actually have 42 countries, and this detail is confirmed by (Std. Err. adjusted for 42 clusters in Country). I therefore cannot follow you on 137 countries as the cross-sectional dimension of your panel dataset.
3) If perfect multicollinearity had been detected, Stata would have omitted one of the variables involved. I would have checked the -vce- matrix after -regress- and -xtreg,re- via -estat vce-.
4) if by static (which means something different in a panel data regression setting) you mean time-invariant, I follow you. However, it may well be that the -fe- specification fits your data better. You can check this via the user-written command -xtoverid- (see -search xtoverid-). Please note that, unlike -hausman-, -xtoverid- needs only the -re- regression to work properly, as you can see from the following toy-example:

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
. xtreg ln_wage age, vce(cluster idcode)

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.0877                                         avg =        6.1
     overall = 0.0774                                         max =         15

                                                Wald chi2(1)      =    1064.91
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0185667    .000569    32.63   0.000     .0174516    .0196819
       _cons |   1.120439   .0159154    70.40   0.000     1.089245    1.151632
-------------+----------------------------------------------------------------
     sigma_u |  .36972456
     sigma_e |  .30349389
         rho |  .59743613   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(idcode)
Sargan-Hansen statistic  14.529  Chi-sq(1)    P-value = 0.0001

.
*The -xtoverid- output points to the -fe- specification, the null being, loosely speaking, that the -re- specification is OK for your data*

As an aside, I cannot help wondering what support you get from your supervisor, as you seem a bit lost with these actually demanding statistical methods.
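
Points 1 and 3 above can be sketched in the same toy setting. The regressors are again arbitrary picks from nlswork, and default standard errors are used here for simplicity; that is an assumption of this sketch, not a recommendation from the thread.

```stata
* Toy data only, as in the -xtoverid- example
webuse nlswork, clear
xtset idcode year

* Random-effects model with default standard errors
xtreg ln_wage age tenure, re

* Breusch-Pagan Lagrange multiplier test for a panel-wise effect;
* rejecting the null (Var(u) = 0) speaks against pooled OLS
xttest0

* Correlations between the coefficient estimates; entries close to
* +1 or -1 hint at near-collinear regressors
estat vce, corr
```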

Kind regards,
Carlo
(Stata 19.0)
