
  • Significant time dummies

    Hello!

    I am doing research on corruption and have a balanced panel data set: 363 observations covering 33 countries over an 11-year span. I was not sure which model to use, but after tests that rejected pooled OLS (fair enough, it is a panel) and RE, I decided on a fixed effects model. Tests for heteroskedasticity and autocorrelation showed the presence of both. Looking for a remedy, I found a thread on this forum where Carlo said that fixed effects with cluster-robust standard errors adjusts for both problems, so I settled on an FE model with clustered standard errors.
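
    For reference, the tests were along these lines (a sketch from memory; -xttest3- and -xtserial- are user-written commands from SSC, and the exact commands I ran may have differed slightly):

    Code:
    * Hausman test: FE vs. RE
    xtreg corr fdipct faidpct loggdppc trade rule natres, fe
    estimates store fe
    xtreg corr fdipct faidpct loggdppc trade rule natres, re
    estimates store re
    hausman fe re
    
    * modified Wald test for groupwise heteroskedasticity (ssc install xttest3)
    xtreg corr fdipct faidpct loggdppc trade rule natres, fe
    xttest3
    
    * Wooldridge test for serial correlation (ssc install xtserial)
    xtserial corr fdipct faidpct loggdppc trade rule natres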

    Thereafter I ran the model with time dummies (i.year), and the last six years came out significant at the 0.000 level, which I guess means there is severe (?) autocorrelation? Now I am stuck and unsure how to go on from here. Should I forget about the time dummies and report the clustered FE results without them, or should I report the results with the time dummies? The difference matters because one of my main independent variables, which was significant (and is in most previous research on corruption), is no longer significant. Or should I go for another model that might fit better, or is this model good enough to stick with?

    I am very new to Stata and fairly new to econometrics, so any tips would help and mean a lot!
    Thank you heaps in advance!

    Code:
    . xtreg corr fdipct faidpct loggdppc trade rule natres ethnic i.year, fe vce(cluster c_id)
    note: ethnic omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs     =        363
    Group variable: c_id                            Number of groups  =         33
    
    R-sq:                                           Obs per group:
         within  = 0.5078                                         min =         11
         between = 0.6435                                         avg =       11.0
         overall = 0.5920                                         max =         11
    
                                                    F(16,32)          =      18.55
    corr(u_i, Xb)  = 0.3889                         Prob > F          =     0.0000
    
                                      (Std. Err. adjusted for 33 clusters in c_id)
    ------------------------------------------------------------------------------
                 |               Robust
            corr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          fdipct |  -.0361876   .0147794    -2.45   0.020    -.0662922    -.006083
         faidpct |  -.0209003   .0459333    -0.46   0.652    -.1144634    .0726628
        loggdppc |  -2.380897   3.963144    -0.60   0.552    -10.45356    5.691764
           trade |   .0236044   .0208849     1.13   0.267    -.0189367    .0661455
            rule |  -9.406962   2.827707    -3.33   0.002    -15.16681   -3.647111
          natres |  -.1399976   .0717829    -1.95   0.060    -.2862145    .0062194
          ethnic |          0  (omitted)
                 |
            year |
           2008  |   .3452493   .5045135     0.68   0.499    -.6824111     1.37291
           2009  |  -.6404473   .5801911    -1.10   0.278    -1.822258    .5413634
           2010  |  -1.131243   .8078917    -1.40   0.171    -2.776865    .5143785
           2011  |  -1.485195   1.013161    -1.47   0.152    -3.548936    .5785452
           2012  |  -5.358887   1.232563    -4.35   0.000    -7.869536   -2.848238
           2013  |  -5.224643   1.229671    -4.25   0.000    -7.729402   -2.719885
           2014  |  -4.625518   1.182611    -3.91   0.000    -7.034417   -2.216619
           2015  |  -4.594838   1.304793    -3.52   0.001    -7.252614   -1.937063
           2016  |  -5.051435   1.520751    -3.32   0.002    -8.149104   -1.953766
           2017  |  -5.483272   1.687407    -3.25   0.003    -8.920407   -2.046136
                 |
           _cons |   86.17475   30.16134     2.86   0.007     24.73811    147.6114
    -------------+----------------------------------------------------------------
         sigma_u |   5.983148
         sigma_e |  3.0626503
             rho |  .79238014   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . testparm(i.year)
    
     ( 1)  2008.year = 0
     ( 2)  2009.year = 0
     ( 3)  2010.year = 0
     ( 4)  2011.year = 0
     ( 5)  2012.year = 0
     ( 6)  2013.year = 0
     ( 7)  2014.year = 0
     ( 8)  2015.year = 0
     ( 9)  2016.year = 0
     (10)  2017.year = 0
    
           F( 10,    32) =    6.21
                Prob > F =    0.0000
    Last edited by Edgar Larson; 04 May 2020, 20:21.

  • #2
    Edgar,
    The time dummies coming in significant isn't, in itself, a problem. The high -testparm- F statistic suggests that the time fixed effects serve an important purpose in your regression. The "shape" of the time-dummy coefficients is interesting, but that is a rather roundabout way of thinking about autocorrelation.
    If you want to take a closer look at this, examine your residuals by year. Postestimation code could be something like:
    Code:
    predict rsidl, ue             // combined residual u_i + e_it after -xtreg, fe-
    scatter rsidl year            // plot residuals against year
    graph box rsidl, over(year)   // distribution of residuals by year
    The box plot helps visualize the residuals from the panel regression by year.
    Ideally, you won't see a pattern here; the residuals should look like white noise. Looking at the residuals from both of your regressions might give you a hint as to which is the better specification, or whether you need a different specification altogether.



    • #3
      You need to be careful about the distinction between time dummies, which belong to the structural model, and serial correlation, which is normally a characteristic of the error terms. There is no problem with having time dummies for the years; your results suggest that findings of corruption have decreased over time. That is surprising to me, but the results are what the results are.

      With data that show high serial correlation, I would seriously consider -xtregar-. While it doesn't allow for heteroskedasticity, it does correct the parameter estimates for serial correlation. Given a choice between a less biased estimate of beta and a less problematic estimate of the standard error, I would generally go for the beta. You also do not have an enormous sample, so some of the nice results from consistency proofs may not really apply in your case.
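
      A minimal sketch of what that might look like, assuming the panel is -xtset- as below and dropping the time-invariant ethnic (variable names are taken from your output; treat this as illustrative rather than a definitive specification):

      Code:
      xtset c_id year
      * FE estimator with AR(1) disturbances; -lbi- additionally reports the
      * Baltagi-Wu LBI and modified Durbin-Watson tests for serial correlation
      xtregar corr fdipct faidpct loggdppc trade rule natres i.year, fe lbi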



      • #4


        Thank you for your responses, Jacob and Phil! They are greatly appreciated!

        [Two images attached: the residuals-vs-year scatter plot and the residuals-by-year box plot produced by the code in #2.]

        This is the output I got. I have never looked at residuals like this before, so correct me if I am wrong, but do the long whiskers in the second image point to strong autocorrelation, meaning that the error term's conditional mean is not zero? That would violate an assumption of the fixed effects model, so the estimates would not be correct?

        What are the possible solutions? Would dropping some variables or some years make the model more precise, or should I go for the GLS estimation as you suggest, Phil?

        Update:
        I dropped the early years and kept 2012-2017, so now T = 6 with five time dummies. The time dummies are now insignificant except for 2015, which would mean they do not necessarily improve the model and could be left out? Now log GDP per capita is significant, a variable that is significant in almost all research in the field. Should I stick with this model or try my luck with the one above?
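
        For reference, I restricted the sample with something like this before re-running the model (a sketch; the exact command may have differed):

        Code:
        * drop the early years, keeping the 2012-2017 subsample
        keep if year >= 2012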

        Code:
        . xtreg corr fdipct faidpct loggdppc trade rule natres ethnic i.year, fe vce(cluster c_id)
        note: ethnic omitted because of collinearity
        
        Fixed-effects (within) regression               Number of obs     =        198
        Group variable: c_id                            Number of groups  =         33
        
        R-sq:                                           Obs per group:
             within  = 0.2075                                         min =          6
             between = 0.2569                                         avg =        6.0
             overall = 0.2544                                         max =          6
        
                                                        F(11,32)          =       9.41
        corr(u_i, Xb)  = -0.1394                        Prob > F          =     0.0000
        
                                          (Std. Err. adjusted for 33 clusters in c_id)
        ------------------------------------------------------------------------------
                     |               Robust
                corr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              fdipct |  -.0645351   .0235702    -2.74   0.010     -.112546   -.0165243
             faidpct |  -.0490388    .074639    -0.66   0.516    -.2010735    .1029959
            loggdppc |  -8.227668   4.063324    -2.02   0.051    -16.50439    .0490524
               trade |  -.0054194   .0270917    -0.20   0.843    -.0606033    .0497645
                rule |  -5.833429   2.596523    -2.25   0.032    -11.12237   -.5444847
              natres |   .0458662   .0660814     0.69   0.493    -.0887373    .1804697
              ethnic |          0  (omitted)
                     |
                year |
               2013  |    .298473   .4915347     0.61   0.548    -.7027505    1.299697
               2014  |   .9862322   .5905877     1.67   0.105    -.2167555     2.18922
               2015  |   1.462624   .6984068     2.09   0.044     .0400156    2.885232
               2016  |   1.197786   .8332912     1.44   0.160    -.4995723    2.895145
               2017  |   .7862924   1.017996     0.77   0.446    -1.287297    2.859882
                     |
               _cons |   128.3729   31.84454     4.03   0.000     63.50768    193.2381
        -------------+----------------------------------------------------------------
             sigma_u |  8.9744688
             sigma_e |    2.18788
                 rho |   .9439009   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        Last edited by Edgar Larson; 05 May 2020, 14:45.



        • #5
          Edgar,

          I think your error terms look pretty much as they should. The widening of the error distributions around 2013-2015 doesn't suggest that your errors are serially correlated. Again, there's no problem with having significant time dummies, and I'm not sure I would take them out of your last regression here, especially with 2015 significant at 0.05. If -testparm i.year- shows that the dummies are not jointly significant, it's generally safe to take them out, unless there's a theory-based reason to keep them in (a sketch of the joint test is below).
          As Phil mentioned, you might try -xtregar- if you're still concerned about serial correlation, but your T is pretty short, especially if you cut the early years of your sample.
          I would be careful about removing years from your regression, though, unless you have good reason to believe that corruption behaves differently in different time periods. It's reassuring to find that your regressor behaves as expected, but it's not worth making data cuts just to make that happen.
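
          Something like this, run right after your second regression (a large Prob > F would mean the dummies are jointly insignificant):

          Code:
          * joint test that all time-dummy coefficients are zero (after -xtreg, fe-)
          testparm i.year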



          • #6
            Thank you so much Jacob! I truly appreciate the help!

            The testparm F statistic is 0.37, which I guess means it is safe to take them out of the five-year model.
            I would like to use the full dataset and the full regression model; would you advise me to keep the time dummies in or leave them out? If I leave them in, I am not sure I quite understand what the finding means. Does "findings of corruption decrease over time" mean that the explanatory power of the variables decreases over time, or have I misunderstood it? Because if we include time dummies, does that mean the model has time fixed effects that absorb country-invariant factors and identify effects from variables that change over time? And if I leave them out, would I have to mention anything about it in my research paper?

            My apologies for the many questions!



            • #7
              If we're looking at the original regression results that you posted, I would definitely keep the time dummy variables; they appear to be serving an important purpose in that regression. What the time-dummy estimates are probably pointing to is a negative relationship between your dependent variable (corr) and year. However, I don't think you really need to analyze these estimates in your discussion of the regression. The time fixed effects are there to account for omitted variable bias from differences in time-period characteristics, so they're mostly tools to help ensure the accuracy of your other coefficient estimates. If you use the regression without time dummies, you should include a discussion of that choice. The relationship between corr and loggdppc may be a result of the two variables trending differently over time (corruption goes down, GDP goes up), and that "spurious" relationship may be what the time dummies are picking up.
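
              One way to see this, sketched with -estimates store- and -estimates table- (the stored names are placeholders):

              Code:
              * full-sample FE with and without time dummies; compare key coefficients
              xtreg corr fdipct faidpct loggdppc trade rule natres i.year, fe vce(cluster c_id)
              estimates store with_td
              xtreg corr fdipct faidpct loggdppc trade rule natres, fe vce(cluster c_id)
              estimates store no_td
              estimates table with_td no_td, b(%9.3f) se keep(fdipct loggdppc rule)

              If loggdppc changes sharply between the two columns, that is consistent with the time dummies absorbing a common trend.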

