Panel Data Fixed Effect Model, Heteroskedasticity, and Autocorrelation Correction Procedures

Hyunjin Cha

Join Date: Dec 2024
Posts: 13

13 Dec 2024, 21:24

Dear Members,

I am conducting a study on the impact of Government R&D program evaluation results on budget decisions in my country.

My independent variables are evaluation result dummies for year t-1, namely dum_Grade2 (indicating "Excellent") and dum_Grade3 (indicating "Insufficient").

My dependent variable is the government-proposed budget for year t, log-transformed as ln_GBUDGETt.

Control variables include:

ln_Period: the log-transformed program duration,
ln_BUDGETt_1: the log-transformed congressional confirmed budget for year t-1,
dum_Scale2: a dummy variable for large-scale programs,
dum_NationalProject2: a dummy variable for the president's key projects, which changes with the presidential term (5 years in my country; the study spans three terms),
dum_Congress2: a dummy variable for programs of congressional interest, which varies annually, and
ln_GDPgrowth: the log-transformed GDP growth rate for year t-1, transformed as log(GDP growth + 1) due to the presence of negative values.
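As a rough sketch, the transformed variables listed above could be generated along these lines. The raw variable names (year, GBUDGET, BUDGET, Period, GDPgrowth) are hypothetical placeholders, because the untransformed data are not shown in this thread; only ID and the ln_ names appear in the posted output.

```stata
* Hypothetical raw variables: year, GBUDGET, BUDGET, Period, GDPgrowth (in percent)
xtset ID year                                  // declare the panel so the lag operator L. works

generate ln_GBUDGETt  = ln(GBUDGET)            // government-proposed budget, year t
generate ln_BUDGETt_1 = ln(L.BUDGET)           // congress-confirmed budget, year t-1
generate ln_Period    = ln(Period)             // program duration
generate ln_GDPgrowth = ln(L.GDPgrowth + 1)    // missing whenever lagged growth <= -1

* Check whether the log(x + 1) transform silently drops any observations
count if missing(ln_GDPgrowth) & !missing(L.GDPgrowth)
```

Since ln() returns missing for non-positive arguments, the final count is a quick way to confirm that the +1 shift was enough to keep every year in the sample.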

I have a balanced panel dataset with 77 programs (n) over 11 years (t). To determine the appropriate model and address potential issues with heteroskedasticity and autocorrelation, I followed the steps below:

Step 1. Model Selection

(1) I compared Pooled OLS with the Fixed Effect Model using the F-test that all u_i = 0 (F(76, 762) = 1.76, p = 0.0001) and rejected the null hypothesis, thereby selecting the Fixed Effect Model.
(2) I compared Pooled OLS with the Random Effect Model using the Breusch-Pagan LM test (xttest0) but failed to reject the null hypothesis that Var(u) = 0 (chibar2(01) = 0.98, p = 0.1610), indicating that the Random Effect Model was not preferred over Pooled OLS.
(3) Since the LM test did not support the RE model, I omitted the Hausman test and chose the FE model for further analysis.
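The three comparisons in Step 1 can be sketched as follows. The time variable name year and the local macro xvars are assumptions introduced here for brevity. The Hausman test is included only to show where it would normally sit in this sequence; it relies on the conventional (non-robust) variance estimates.

```stata
xtset ID year    // 'year' is a hypothetical name for the time variable

local xvars dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 ///
    dum_NationalProject2 dum_Congress2 ln_GDPgrowth

* (1) Pooled OLS vs FE: the F test that all u_i = 0 is printed below the FE table
xtreg ln_GBUDGETt `xvars', fe
estimates store fe

* (2) Pooled OLS vs RE: Breusch-Pagan LM test, run directly after xtreg, re
xtreg ln_GBUDGETt `xvars', re
xttest0
estimates store re

* (3) FE vs RE: Hausman test (assumes conventional standard errors)
hausman fe re
```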

Step 2. Testing for Heteroskedasticity and Autocorrelation

To address issues of heteroskedasticity and autocorrelation:

I ran xttest3 after the Fixed Effect Model to test for groupwise heteroskedasticity and confirmed its presence (chi2(77) = 30407.79, p < 0.0001).
I conducted the Wooldridge test for autocorrelation (xtserial) and found evidence of first-order autocorrelation (F(1, 76) = 8.138, p = 0.0056).
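Both tests are community-contributed commands, so they may need to be installed before use. A minimal sketch, assuming the panel has already been declared with xtset; the local macro xvars is introduced here only to shorten the commands.

```stata
ssc install xttest3, replace     // modified Wald test for groupwise heteroskedasticity
search xtserial, all             // xtserial is distributed via the Stata Journal; install from the link shown

local xvars dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 ///
    dum_NationalProject2 dum_Congress2 ln_GDPgrowth

* xttest3 uses the results of the most recent xtreg, fe
quietly xtreg ln_GBUDGETt `xvars', fe
xttest3

* xtserial runs its own first-difference regression, so it takes the variable list directly
xtserial ln_GBUDGETt `xvars'
```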

Step 3. Final Model Selection

Given the above results, I applied the Fixed Effect Model with cluster-robust standard errors (xtreg, fe vce(cluster ID)) to account for heteroskedasticity and autocorrelation.

Code:

. xtsum ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth

Variable         |      Mean   Std. dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
ln_GBU~t overall |  10.53161   1.327408    5.32301   14.25384 |     N =     847
         between |             1.249814   7.779367    13.7978 |     n =      77
         within  |             .4673756   6.761568   11.85269 |     T =      11
                 |                                            |
dum_Gr~2 overall |  .2857143   .4520209          0          1 |     N =     847
         between |             .2616217          0   .9090909 |     n =      77
         within  |             .3697107  -.6233766   1.194805 |     T =      11
                 |                                            |
dum_Gr~3 overall |  .0932704   .2909828          0          1 |     N =     847
         between |             .1510972          0   .6363636 |     n =      77
         within  |             .2492197  -.5430933   1.002361 |     T =      11
                 |                                            |
ln_Per~d overall |  2.805034   .6446103          0   4.644391 |     N =     847
         between |             .5950932   1.591119   4.594609 |     n =      77
         within  |             .2560713   1.213915   3.611811 |     T =      11
                 |                                            |
ln_BUD~1 overall |  10.53581   1.298766    5.32301   14.25384 |     N =     847
         between |             1.241565   7.852146   13.70951 |     n =      77
         within  |             .4043875   6.935485   11.78642 |     T =      11
                 |                                            |
dum_Sc~2 overall |  .2597403   .4387511          0          1 |     N =     847
         between |             .3858959          0          1 |     n =      77
         within  |             .2129486  -.6493506   1.077922 |     T =      11
                 |                                            |
dum_Na~2 overall |  .8004723   .3998815          0          1 |     N =     847
         between |             .3173396          0          1 |     n =      77
         within  |             .2457461  -.0177096   1.618654 |     T =      11
                 |                                            |
dum_Co~2 overall |  .3730815   .4839092          0          1 |     N =     847
         between |             .2594807          0          1 |     n =      77
         within  |              .409431  -.5360094   1.282172 |     T =      11
                 |                                            |
ln_GDP~h overall |  1.100366   .7633282  -1.234432    1.66865 |     N =     847
         between |                    0   1.100366   1.100366 |     n =      77
         within  |             .7633282  -1.234432    1.66865 |     T =      11

Code:

. pwcorr ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, star(0.05)

             | ln_GBU~t dum_Gr~2 dum_Gr~3 ln_Per~d ln_BUD~1 dum_Sc~2 dum_Na~2
-------------+---------------------------------------------------------------
 ln_GBUDGETt |   1.0000
  dum_Grade2 |  -0.0094   1.0000
  dum_Grade3 |   0.0981* -0.2028*  1.0000
   ln_Period |   0.1209*  0.1911* -0.0440   1.0000
ln_BUDGETt_1 |   0.9642* -0.0017   0.1148*  0.1381*  1.0000
  dum_Scale2 |   0.7134* -0.0409   0.1433* -0.0606   0.7235*  1.0000
dum_Nation~2 |   0.2925* -0.0047   0.0585   0.0328   0.3036*  0.2688*  1.0000
dum_Congre~2 |   0.3300* -0.0826*  0.0800* -0.1461*  0.3269*  0.3392*  0.1408*
ln_GDPgrowth |  -0.0242  -0.1343*  0.0239  -0.1083* -0.0115  -0.0000  -0.0371

             | dum_Co~2 ln_GDP~h
-------------+------------------
dum_Congre~2 |   1.0000
ln_GDPgrowth |   0.0714*  1.0000

Code:

. regress ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth

      Source |       SS           df       MS      Number of obs   =       847
-------------+----------------------------------   F(8, 838)       =   1412.35
       Model |  1387.73691         8  173.467114   Prob > F        =    0.0000
    Residual |   102.92487       838   .12282204   R-squared       =    0.9310
-------------+----------------------------------   Adj R-squared   =    0.9303
       Total |  1490.66178       846  1.76201156   Root MSE        =    .35046

--------------------------------------------------------------------------------------
         ln_GBUDGETt | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
          dum_Grade2 |  -.0290667    .027911    -1.04   0.298    -.0838504    .0257171
          dum_Grade3 |  -.0777321   .0427223    -1.82   0.069    -.1615874    .0061233
           ln_Period |  -.0120277   .0200842    -0.60   0.549    -.0514489    .0273934
        ln_BUDGETt_1 |   .9622544   .0142967    67.31   0.000      .934193    .9903159
          dum_Scale2 |   .0910479   .0414141     2.20   0.028     .0097605    .1723354
dum_NationalProject2 |  -.0088824   .0317637    -0.28   0.780    -.0712282    .0534634
       dum_Congress2 |   .0363053   .0271923     1.34   0.182    -.0170677    .0896783
        ln_GDPgrowth |  -.0276368   .0160256    -1.72   0.085    -.0590917    .0038181
               _cons |   .4431037   .1380401     3.21   0.001     .1721588    .7140487
--------------------------------------------------------------------------------------

. estat vif

    Variable |       VIF       1/VIF  
-------------+----------------------
ln_BUDGETt_1 |      2.37    0.421091
  dum_Scale2 |      2.27    0.439718
dum_Congre~2 |      1.19    0.838468
   ln_Period |      1.15    0.866172
dum_Nation~2 |      1.11    0.899870
  dum_Grade2 |      1.10    0.912088
  dum_Grade3 |      1.06    0.939424
ln_GDPgrowth |      1.03    0.970191
-------------+----------------------
    Mean VIF |      1.41

Code:

. // F Test
.
. xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe

Fixed-effects (within) regression               Number of obs     =        847
Group variable: ID                              Number of groups  =         77

R-squared:                                      Obs per group:
     Within  = 0.5262                                         min =         11
     Between = 0.9848                                         avg =       11.0
     Overall = 0.9257                                         max =         11

                                                F(8, 762)         =     105.77
corr(u_i, Xb) = 0.7558                          Prob > F          =     0.0000

--------------------------------------------------------------------------------------
         ln_GBUDGETt | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
          dum_Grade2 |  -.0177626    .033814    -0.53   0.600    -.0841423    .0486171
          dum_Grade3 |  -.1330524   .0480334    -2.77   0.006    -.2273459   -.0387588
           ln_Period |  -.1041668   .0521856    -2.00   0.046    -.2066115   -.0017221
        ln_BUDGETt_1 |   .7838549   .0332889    23.55   0.000      .718506    .8492038
          dum_Scale2 |   .2111513   .0630147     3.35   0.001     .0874483    .3348544
dum_NationalProject2 |   .0330602   .0481965     0.69   0.493    -.0615535    .1276739
       dum_Congress2 |     .02627   .0292156     0.90   0.369    -.0310825    .0836225
        ln_GDPgrowth |  -.0368959   .0159432    -2.31   0.021    -.0681936   -.0055982
               _cons |   2.532233   .3416635     7.41   0.000     1.861519    3.202946
---------------------+----------------------------------------------------------------
             sigma_u |  .25471816
             sigma_e |  .33899387
                 rho |  .36085649   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------
F test that all u_i=0: F(76, 762) = 1.76                     Prob > F = 0.0001


. // LM Test
.
. xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, re

Random-effects GLS regression                   Number of obs     =        847
Group variable: ID                              Number of groups  =         77

R-squared:                                      Obs per group:
     Within  = 0.5201                                         min =         11
     Between = 0.9919                                         avg =       11.0
     Overall = 0.9309                                         max =         11

                                                Wald chi2(8)      =    9521.32
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

--------------------------------------------------------------------------------------
         ln_GBUDGETt | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------------+----------------------------------------------------------------
          dum_Grade2 |  -.0296086   .0284956    -1.04   0.299    -.0854589    .0262417
          dum_Grade3 |  -.0832005   .0432639    -1.92   0.054    -.1679962    .0015953
           ln_Period |  -.0144255   .0216639    -0.67   0.505     -.056886    .0280349
        ln_BUDGETt_1 |   .9562089   .0152102    62.87   0.000     .9263974    .9860204
          dum_Scale2 |   .1009814   .0434738     2.32   0.020     .0157744    .1861884
dum_NationalProject2 |  -.0040338   .0332985    -0.12   0.904    -.0692977      .06123
       dum_Congress2 |   .0362844   .0273873     1.32   0.185    -.0173938    .0899625
        ln_GDPgrowth |   -.027873   .0158873    -1.75   0.079    -.0590116    .0032655
               _cons |   .5079956   .1475616     3.44   0.001     .2187801    .7972111
---------------------+----------------------------------------------------------------
             sigma_u |  .04893557
             sigma_e |  .33899387
                 rho |  .02041309   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------


.
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        ln_GBUDGETt[ID,t] = Xb + u[ID] + e[ID,t]

        Estimated results:
                         |       Var     SD = sqrt(Var)
                ---------+-----------------------------
               ln_GBUD~t |   1.762012       1.327408
                       e |   .1149168       .3389939
                       u |   .0023947       .0489356

        Test: Var(u) = 0
                             chibar2(01) =     0.98
                          Prob > chibar2 =   0.1610

Code:

. qui xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe

.
. xttest3   // heteroskedasticity

Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (77)  =   30407.79
Prob>chi2 =      0.0000


.
.
. // 8. autocorrelation
. xtserial ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth

Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation
    F(  1,      76) =      8.138
           Prob > F =      0.0056

Code:

. xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)

Fixed-effects (within) regression               Number of obs     =        847
Group variable: ID                              Number of groups  =         77

R-squared:                                      Obs per group:
     Within  = 0.5262                                         min =         11
     Between = 0.9848                                         avg =       11.0
     Overall = 0.9257                                         max =         11

                                                F(8, 76)          =      46.95
corr(u_i, Xb) = 0.7558                          Prob > F          =     0.0000

                                            (Std. err. adjusted for 77 clusters in ID)
--------------------------------------------------------------------------------------
                     |               Robust
         ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
          dum_Grade2 |  -.0177626   .0323326    -0.55   0.584    -.0821585    .0466333
          dum_Grade3 |  -.1330524   .0646188    -2.06   0.043    -.2617518   -.0043529
           ln_Period |  -.1041668   .0853683    -1.22   0.226    -.2741925     .065859
        ln_BUDGETt_1 |   .7838549   .1440891     5.44   0.000     .4968766    1.070833
          dum_Scale2 |   .2111513    .122003     1.73   0.088    -.0318388    .4541414
dum_NationalProject2 |   .0330602   .0639943     0.52   0.607    -.0943954    .1605158
       dum_Congress2 |     .02627    .026613     0.99   0.327    -.0267344    .0792744
        ln_GDPgrowth |  -.0368959    .019917    -1.85   0.068    -.0765641    .0027723
               _cons |   2.532233   1.351934     1.87   0.065    -.1603775    5.224843
---------------------+----------------------------------------------------------------
             sigma_u |  .25471816
             sigma_e |  .33899387
                 rho |  .36085649   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------

I have the following questions regarding my approach and results:

Question 1: Model Selection Process

Are the steps I followed to determine the appropriate model (F-test for Fixed Effect, LM test for Random Effect, and heteroskedasticity and autocorrelation testing) correct for a balanced panel with n=77, t=11?

Question 2: Correlation vs. Causation

From my correlation analysis, Grade2 ("Excellent") shows a negative correlation with ln_GBUDGETt, while Grade3 ("Insufficient") shows a positive correlation. However, in the panel analysis results, Grade3 exhibits a negative coefficient.
Does this discrepancy between correlation and causation indicate an issue with the analysis, or is it sufficient to explain the inconsistency during interpretation?
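One way to see where the sign change might come from is to compare the bivariate relationship with one that conditions on last year's budget. The correlation table shows that dum_Grade3 is positively correlated with ln_BUDGETt_1 (0.1148), which is itself strongly correlated with the outcome (0.9642), so the raw correlation may largely reflect program size. The commands below are an illustrative sketch and were not part of the original analysis.

```stata
* Bivariate slope: has the same sign as the pairwise correlation (0.0981)
regress ln_GBUDGETt dum_Grade3

* Conditioning on last year's budget: the sign may change once program size is held fixed
regress ln_GBUDGETt dum_Grade3 ln_BUDGETt_1

* Partial correlations of the outcome with each listed variable, holding the others constant
pcorr ln_GBUDGETt dum_Grade2 dum_Grade3 ln_BUDGETt_1
```

A pairwise correlation ignores every other regressor, whereas a regression coefficient is a partial effect, so the two need not share a sign.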

Question 3: Heteroskedasticity and Autocorrelation Correction

In many Statalist discussions, cluster-robust standard errors (vce(cluster ID)) are commonly recommended to handle heteroskedasticity and autocorrelation. Would this approach be sufficient for my data (n = 77, t = 11)?
Or would alternative methods such as xtgls, xtscc or xtpcse be more appropriate given the detected heteroskedasticity and autocorrelation?
Some of my independent variables, such as dum_NationalProject2, change every five years (presidential term), while others, like dum_Congress2, vary annually. Do these characteristics affect the appropriateness of using cluster ID in my analysis?
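For comparison, the cluster-robust fit and one of the alternatives named above could be placed side by side as sketched below. The time variable name year and the macro xvars are assumptions. Driscoll-Kraay standard errors (xtscc) rest on large-T asymptotics, so with T = 11 this is better read as a robustness check than as a replacement.

```stata
xtset ID year   // 'year' is a hypothetical time variable name

local xvars dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 ///
    dum_NationalProject2 dum_Congress2 ln_GDPgrowth

* Cluster-robust SEs: allow heteroskedasticity and arbitrary serial correlation within each program
xtreg ln_GBUDGETt `xvars', fe vce(cluster ID)
estimates store fe_cluster

* Driscoll-Kraay SEs (community-contributed): also allow cross-sectional dependence
* ssc install xtscc
xtscc ln_GBUDGETt `xvars', fe
estimates store fe_dk

* Point estimates are identical; only the standard errors differ
estimates table fe_cluster fe_dk, b(%9.4f) se(%9.4f)
```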

Thank you in advance for your insights and guidance on these issues. I look forward to your advice.

Best regards,

Last edited by Hyunjin Cha; 13 Dec 2024, 22:19.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#2

14 Dec 2024, 00:24

Hyunjin:
1) I am not clear why you are not comparing -fe- vs. -re- directly (without considering POLS);
2) your panel data regression says nothing about causal inference (see -xtdidregress- on this topic);
3) go with -vce(cluster panelid)- or -robust- standard errors and then check whether -re- (only) is the way to go via the community-contributed module -xtoverid-.
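A minimal sketch of point 3, using the variable names from the original post. -xtoverid- is community-contributed and may in turn need other community-contributed packages (such as -ivreg2- and -ranktest-) to be installed; treat that dependency as an assumption to verify against its help file.

```stata
* ssc install xtoverid     // community-contributed; check its help file for dependencies

local xvars dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 ///
    dum_NationalProject2 dum_Congress2 ln_GDPgrowth

* Random effects with cluster-robust standard errors
xtreg ln_GBUDGETt `xvars', re vce(cluster ID)

* Robust test of the extra RE orthogonality conditions; rejection points toward -fe-
xtoverid
```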

Kind regards,
Carlo
(Stata 19.0)

Hyunjin Cha
#3

14 Dec 2024, 06:43

Dear Carlo Lazzaro, It is an honor to receive your response and guidance regarding my research. Thank you for taking the time to provide valuable insights.

1) Regarding your question on why I compared -fe- vs. -re- indirectly (via POLS), my intention was to test whether the unobserved characteristics of each program are distinct enough to justify estimating individual fixed effects, as opposed to assuming that these characteristics are negligible and pooling the data into a single regression line. I wanted to ensure that my approach for model selection was methodologically sound. Please let me know if my reasoning is flawed or could be improved.

2) Regarding your point on causal inference, I understand that xtdidregress is designed to estimate causal effects in a Difference-in-Differences (DiD) framework, which typically requires a clear intervention point, a treated group, and a control group. However, my study does not analyze a specific policy intervention or treatment event. Instead, it examines the continuous impact of annual evaluation results (Grade2 and Grade3) on budget decisions over time using a balanced panel dataset.

Given this context, I believe my focus is more on identifying associations or relationships, controlling for program-level and time-varying factors. While I acknowledge that my current approach (xtreg, fe vce(cluster ID)) may not fully establish causality, do you think it is sufficient for analyzing this type of continuous panel data? Alternatively, are there causal inference techniques that can be adapted to my study design, which lacks a clear intervention point?

3) I greatly appreciate your recommendation to use -xtoverid- to compare robust standard errors (-vce(cluster panelid)- or -robust-) and check whether the Random Effects model could be appropriate. I will make sure to explore this approach and apply it to my data.

Thank you again for your guidance and expertise. I deeply value your input, and it has provided me with a clearer direction to refine my analysis.

Carlo Lazzaro
#4

14 Dec 2024, 07:29

Hyunjin:
thanks for clarifying.
As far as your point #2 is concerned, -xtreg,fe- is enough for your data, provided that you're not interested in a causal inference exercise.

Kind regards,
Carlo
(Stata 19.0)

Hyunjin Cha

14 Dec 2024, 08:00

Dear Carlo Lazzaro, It seems I initially phrased my second question incorrectly. I would like to revise and restate my question.

In the correlation analysis, Grade2 ("Excellent") shows a negative correlation with ln_GBUDGETt, while Grade3 ("Insufficient") shows a positive correlation. However, in the results of the -xtreg, fe- analysis, Grade3 exhibits a negative coefficient.

Does this discrepancy between the correlation analysis and the -xtreg, fe- results indicate an issue with the analysis, or is it sufficient to address this discrepancy through interpretation?

Code:

. pwcorr ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, star(0.05)

             | ln_GBU~t dum_Gr~2 dum_Gr~3 ln_Per~d ln_BUD~1 dum_Sc~2 dum_Na~2
-------------+---------------------------------------------------------------
 ln_GBUDGETt |   1.0000 
  dum_Grade2 |  -0.0094   1.0000 
  dum_Grade3 |   0.0981* -0.2028*  1.0000 
   ln_Period |   0.1209*  0.1911* -0.0440   1.0000 
ln_BUDGETt_1 |   0.9642* -0.0017   0.1148*  0.1381*  1.0000 
  dum_Scale2 |   0.7134* -0.0409   0.1433* -0.0606   0.7235*  1.0000 
dum_Nation~2 |   0.2925* -0.0047   0.0585   0.0328   0.3036*  0.2688*  1.0000 
dum_Congre~2 |   0.3300* -0.0826*  0.0800* -0.1461*  0.3269*  0.3392*  0.1408*
ln_GDPgrowth |  -0.0242  -0.1343*  0.0239  -0.1083* -0.0115  -0.0000  -0.0371 

             | dum_Co~2 ln_GDP~h
-------------+------------------
dum_Congre~2 |   1.0000 
ln_GDPgrowth |   0.0714*  1.0000

Code:

. xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)

Fixed-effects (within) regression               Number of obs     =        847
Group variable: ID                              Number of groups  =         77

R-squared:                                      Obs per group:
     Within  = 0.5262                                         min =         11
     Between = 0.9848                                         avg =       11.0
     Overall = 0.9257                                         max =         11

                                                F(8, 76)          =      46.95
corr(u_i, Xb) = 0.7558                          Prob > F          =     0.0000

                                            (Std. err. adjusted for 77 clusters in ID)
--------------------------------------------------------------------------------------
                     |               Robust
         ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
          dum_Grade2 |  -.0177626   .0323326    -0.55   0.584    -.0821585    .0466333
          dum_Grade3 |  -.1330524   .0646188    -2.06   0.043    -.2617518   -.0043529
           ln_Period |  -.1041668   .0853683    -1.22   0.226    -.2741925     .065859
        ln_BUDGETt_1 |   .7838549   .1440891     5.44   0.000     .4968766    1.070833
          dum_Scale2 |   .2111513    .122003     1.73   0.088    -.0318388    .4541414
dum_NationalProject2 |   .0330602   .0639943     0.52   0.607    -.0943954    .1605158
       dum_Congress2 |     .02627    .026613     0.99   0.327    -.0267344    .0792744
        ln_GDPgrowth |  -.0368959    .019917    -1.85   0.068    -.0765641    .0027723
               _cons |   2.532233   1.351934     1.87   0.065    -.1603775    5.224843
---------------------+----------------------------------------------------------------
             sigma_u |  .25471816
             sigma_e |  .33899387
                 rho |  .36085649   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------

Carlo Lazzaro
#6

14 Dec 2024, 10:03

Hyunjin:
thanks for this further clarification.
I got your previous post wrong.
What does -estat vce, corr- after -xtreg,fe- give you back?

Kind regards,
Carlo
(Stata 19.0)

Hyunjin Cha

14 Dec 2024, 10:23

Dear Carlo Lazzaro, Thank you for your thoughtful follow-up question.
I have run estat vce, corr as you suggested, and the correlation matrix of the coefficients is as follows:

Code:

. xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)

Fixed-effects (within) regression               Number of obs     =        847
Group variable: ID                              Number of groups  =         77

R-squared:                                      Obs per group:
     Within  = 0.5262                                         min =         11
     Between = 0.9848                                         avg =       11.0
     Overall = 0.9257                                         max =         11

                                                F(8, 76)          =      46.95
corr(u_i, Xb) = 0.7558                          Prob > F          =     0.0000

                                            (Std. err. adjusted for 77 clusters in ID)
--------------------------------------------------------------------------------------
                     |               Robust
         ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
          dum_Grade2 |  -.0177626   .0323326    -0.55   0.584    -.0821585    .0466333
          dum_Grade3 |  -.1330524   .0646188    -2.06   0.043    -.2617518   -.0043529
           ln_Period |  -.1041668   .0853683    -1.22   0.226    -.2741925     .065859
        ln_BUDGETt_1 |   .7838549   .1440891     5.44   0.000     .4968766    1.070833
          dum_Scale2 |   .2111513    .122003     1.73   0.088    -.0318388    .4541414
dum_NationalProject2 |   .0330602   .0639943     0.52   0.607    -.0943954    .1605158
       dum_Congress2 |     .02627    .026613     0.99   0.327    -.0267344    .0792744
        ln_GDPgrowth |  -.0368959    .019917    -1.85   0.068    -.0765641    .0027723
               _cons |   2.532233   1.351934     1.87   0.065    -.1603775    5.224843
---------------------+----------------------------------------------------------------
             sigma_u |  .25471816
             sigma_e |  .33899387
                 rho |  .36085649   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------

Code:

. estat vce, corr

Correlation matrix of coefficients of xtreg model

        e(V) | dum_Gr~2  dum_Gr~3  ln_Per~d  ln_BUD~1  dum_Sc~2  dum_Na~2  dum_Co~2  ln_GDP~h     _cons 
-------------+-----------------------------------------------------------------------------------------
  dum_Grade2 |   1.0000                                                                                 
  dum_Grade3 |   0.0835    1.0000                                                                       
   ln_Period |  -0.3234    0.3226    1.0000                                                             
ln_BUDGETt_1 |  -0.0114   -0.1681   -0.6358    1.0000                                                   
  dum_Scale2 |  -0.2988   -0.0788    0.5396   -0.7225    1.0000                                         
dum_Nation~2 |   0.3595    0.0575    0.0140   -0.3630   -0.1237    1.0000                               
dum_Congre~2 |  -0.3819   -0.1735    0.1494   -0.0580    0.3258   -0.2617    1.0000                     
ln_GDPgrowth |   0.0442    0.1701   -0.3150    0.7669   -0.7166   -0.1316   -0.1493    1.0000           
       _cons |   0.0583    0.1248    0.5284   -0.9908    0.7086    0.3715    0.0394   -0.7997    1.0000 

.

Carlo Lazzaro
#8

14 Dec 2024, 11:42

Hyunjin:
the only concern here is the overall R-squared, which is really sky-high.
As the overall R-squared is the squared correlation of y and yhat, and given that many coefficients do not reach statistical significance, you might be overfitting the model.
What happens if you go with a more parsimonious model?
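The definition of the overall R-squared mentioned above can be checked by hand with a short sketch like the one below. The variable name yhat_xb and the macro xvars are introduced here for illustration only.

```stata
local xvars dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 ///
    dum_NationalProject2 dum_Congress2 ln_GDPgrowth

quietly xtreg ln_GBUDGETt `xvars', fe

* Linear prediction a + xb, which excludes the estimated fixed effects u_i
predict double yhat_xb, xb

* Squared correlation between y and the linear prediction
quietly correlate ln_GBUDGETt yhat_xb if e(sample)
display "squared corr(y, xb) = " %6.4f r(rho)^2
display "e(r2_o)             = " %6.4f e(r2_o)
```

The two displayed numbers should agree, which makes it easier to see how much of the overall fit is driven by any single regressor.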

Kind regards,
Carlo
(Stata 19.0)

Hyunjin Cha

14 Dec 2024, 23:59

Dear Carlo Lazzaro, Thank you very much for your thoughtful review and suggestions. As you recommended, I tried removing control variables one by one to simplify the model. Removing most of the control variables had minimal impact on the overall R-squared. Only when I excluded ln_BUDGETt_1 did the overall R-squared drop substantially, by about 0.5.

The variable ln_BUDGETt_1 represents the budget confirmed by the legislature in year t-1 for each program. It aligns with the incremental budget decision-making theory proposed by A. Wildavsky, which suggests that "this year's budget is primarily influenced by last year's budget," as well as prior empirical studies supporting this theory.

In my country, the control variable ln_BUDGETt_1 reflects the budget determined by the legislature, while the dependent variable ln_GBUDGETt refers to the budget proposed by the government to the legislature.

When I exclude ln_BUDGETt_1, the variable of interest in my study, dum_Grade, becomes statistically insignificant.

In this case, should I conclude that dum_Grade is not statistically significant and exclude ln_BUDGETt_1 from the model? Or, given that ln_BUDGETt_1 is theoretically justified by budget decision-making theories and prior studies, and considering its high correlation with the dependent variable and its significant contribution to the overall R-squared, is it still valid to retain ln_BUDGETt_1 in the model?

I would greatly appreciate your insights on this matter.
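The comparison I describe could be laid out side by side roughly as follows. The stored-estimate names with_lag and no_lag and the macro xvars_nolag are hypothetical labels chosen for this sketch.

```stata
local xvars_nolag dum_Grade2 dum_Grade3 ln_Period dum_Scale2 ///
    dum_NationalProject2 dum_Congress2 ln_GDPgrowth

* Model including last year's confirmed budget
quietly xtreg ln_GBUDGETt `xvars_nolag' ln_BUDGETt_1, fe vce(cluster ID)
estimates store with_lag

* Model excluding last year's confirmed budget
quietly xtreg ln_GBUDGETt `xvars_nolag', fe vce(cluster ID)
estimates store no_lag

* Coefficients, standard errors, and within/overall R-squared for both models
estimates table with_lag no_lag, b(%9.4f) se(%9.4f) stats(N r2_w r2_o)
```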

Code:

. pwcorr ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 ln_GDPgrowth, star(0.05)

             | ln_GBU~t dum_Gr~2 dum_Gr~3 ln_Per~d ln_BUD~1 dum_Sc~2 dum_Na~2
-------------+---------------------------------------------------------------
 ln_GBUDGETt |   1.0000 
  dum_Grade2 |  -0.0094   1.0000 
  dum_Grade3 |   0.0981* -0.2028*  1.0000 
   ln_Period |   0.1209*  0.1911* -0.0440   1.0000 
ln_BUDGETt_1 |   0.9642* -0.0017   0.1148*  0.1381*  1.0000 
  dum_Scale2 |   0.7134* -0.0409   0.1433* -0.0606   0.7235*  1.0000 
dum_Nation~2 |   0.2925* -0.0047   0.0585   0.0328   0.3036*  0.2688*  1.0000 
ln_GDPgrowth |  -0.0242  -0.1343*  0.0239  -0.1083* -0.0115  -0.0000  -0.0371 

             | ln_GDP~h
-------------+---------
ln_GDPgrowth |   1.0000

Code:

. regress ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 ln_GDPgrowth

      Source |       SS           df       MS      Number of obs   =       847
-------------+----------------------------------   F(7, 839)       =   1612.35
       Model |  1387.51797         7  198.216853   Prob > F        =    0.0000
    Residual |  103.143809       839  .122936602   R-squared       =    0.9308
-------------+----------------------------------   Adj R-squared   =    0.9302
       Total |  1490.66178       846  1.76201156   Root MSE        =    .35062

--------------------------------------------------------------------------------------
         ln_GBUDGETt | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
          dum_Grade2 |  -.0304609   .0279045    -1.09   0.275    -.0852317      .02431
          dum_Grade3 |  -.0768521   .0427372    -1.80   0.072    -.1607364    .0070322
           ln_Period |  -.0162763   .0198397    -0.82   0.412    -.0552176    .0226649
        ln_BUDGETt_1 |   .9652985   .0141203    68.36   0.000     .9375833    .9930137
          dum_Scale2 |   .0971869   .0411772     2.36   0.018     .0163645    .1780093
dum_NationalProject2 |  -.0072418   .0317548    -0.23   0.820    -.0695699    .0550863
        ln_GDPgrowth |  -.0264102   .0160067    -1.65   0.099     -.057828    .0050076
               _cons |   .4325534    .137878     3.14   0.002      .161927    .7031797
--------------------------------------------------------------------------------------

. estat vif

    Variable |       VIF       1/VIF  
-------------+----------------------
ln_BUDGETt_1 |      2.31    0.432079
  dum_Scale2 |      2.25    0.445206
   ln_Period |      1.13    0.888476
dum_Nation~2 |      1.11    0.901219
  dum_Grade2 |      1.09    0.913367
  dum_Grade3 |      1.06    0.939647
ln_GDPgrowth |      1.03    0.973390
-------------+----------------------
    Mean VIF |      1.43

Code:

. xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 ln_GDPgrowth, fe vce(cluster ID)

Fixed-effects (within) regression               Number of obs     =        847
Group variable: ID                              Number of groups  =         77

R-squared:                                      Obs per group:
     Within  = 0.5257                                         min =         11
     Between = 0.9844                                         avg =       11.0
     Overall = 0.9253                                         max =         11

                                                F(7, 76)          =      54.55
corr(u_i, Xb) = 0.7536                          Prob > F          =     0.0000

                                            (Std. err. adjusted for 77 clusters in ID)
--------------------------------------------------------------------------------------
                     |               Robust
         ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
          dum_Grade2 |  -.0187521   .0329575    -0.57   0.571    -.0843927    .0468885
          dum_Grade3 |  -.1333652    .064617    -2.06   0.042    -.2620611   -.0046693
           ln_Period |  -.1110027   .0848705    -1.31   0.195    -.2800371    .0580316
        ln_BUDGETt_1 |   .7856016   .1438449     5.46   0.000     .4991097    1.072093
          dum_Scale2 |   .2155172   .1230524     1.75   0.084    -.0295629    .4605973
dum_NationalProject2 |   .0324699   .0644762     0.50   0.616    -.0959456    .1608854
        ln_GDPgrowth |  -.0363856   .0198447    -1.83   0.071    -.0759098    .0031386
               _cons |   2.541895   1.350217     1.88   0.064    -.1472946    5.231085
---------------------+----------------------------------------------------------------
             sigma_u |  .25650139
             sigma_e |  .33895133
                 rho |  .36413886   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------

. 
. predict fitted, xb     

. gen sq_fitted = fitted^2 

. 
. xtreg ln_GBUDGETt fitted sq_fitted, fe vce(cluster ID)

Fixed-effects (within) regression               Number of obs     =        847
Group variable: ID                              Number of groups  =         77

R-squared:                                      Obs per group:
     Within  = 0.5460                                         min =         11
     Between = 0.9712                                         avg =       11.0
     Overall = 0.9171                                         max =         11

                                                F(2, 76)          =      61.94
corr(u_i, Xb) = 0.5756                          Prob > F          =     0.0000

                                    (Std. err. adjusted for 77 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
 ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      fitted |  -.6212747   1.250379    -0.50   0.621     -3.11162     1.86907
   sq_fitted |   .0780771   .0570936     1.37   0.175    -.0356346    .1917889
       _cons |   8.322973   6.807135     1.22   0.225    -5.234611    21.88056
-------------+----------------------------------------------------------------
     sigma_u |  .26648214
     sigma_e |  .33053229
         rho |  .39393667   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Hyunjin Cha

Join Date: Dec 2024
Posts: 13

#10

15 Dec 2024, 01:11

The code above removes the control variable dum_Congress2, while the code below removes ln_BUDGETt_1.

Code:

. xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)

Fixed-effects (within) regression               Number of obs     =        847
Group variable: ID                              Number of groups  =         77

R-squared:                                      Obs per group:
     Within  = 0.1814                                         min =         11
     Between = 0.6548                                         avg =       11.0
     Overall = 0.5429                                         max =         11

                                                F(7, 76)          =       6.29
corr(u_i, Xb) = 0.5789                          Prob > F          =     0.0000

                                            (Std. err. adjusted for 77 clusters in ID)
--------------------------------------------------------------------------------------
                     |               Robust
         ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
          dum_Grade2 |  -.0467292   .0598614    -0.78   0.437    -.1659534    .0724951
          dum_Grade3 |  -.1572548   .1106507    -1.42   0.159    -.3776348    .0631251
           ln_Period |   .1889261   .1123134     1.68   0.097    -.0347655    .4126177
          dum_Scale2 |   .8865106   .1913068     4.63   0.000       .50549    1.267531
dum_NationalProject2 |   .0643002   .1141213     0.56   0.575    -.1629921    .2915925
       dum_Congress2 |   .0664125   .0507169     1.31   0.194     -.034599     .167424
        ln_GDPgrowth |  -.0287897    .013549    -2.12   0.037    -.0557748   -.0018046
               _cons |   9.754854   .3145342    31.01   0.000     9.128405     10.3813
---------------------+----------------------------------------------------------------
             sigma_u |  .97658389
             sigma_e |  .44528062
                 rho |  .82788508   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------

.
. predict fitted, xb        

. gen sq_fitted = fitted^2  

.
. xtreg ln_GBUDGETt fitted sq_fitted, fe vce(cluster ID)

Fixed-effects (within) regression               Number of obs     =        847
Group variable: ID                              Number of groups  =         77

R-squared:                                      Obs per group:
     Within  = 0.1932                                         min =         11
     Between = 0.6370                                         avg =       11.0
     Overall = 0.5346                                         max =         11

                                                F(2, 76)          =      17.36
corr(u_i, Xb) = 0.5562                          Prob > F          =     0.0000

                                    (Std. err. adjusted for 77 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
 ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      fitted |  -13.22706   10.06177    -1.31   0.193    -33.26681    6.812693
   sq_fitted |    .664144   .4741009     1.40   0.165    -.2801097    1.608398
       _cons |   76.05559    53.3117     1.43   0.158    -30.12387    182.2351
-------------+----------------------------------------------------------------
     sigma_u |  .97132238
     sigma_e |  .44062131
         rho |  .82933835   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.

Last edited by Hyunjin Cha; 15 Dec 2024, 01:14.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#11

15 Dec 2024, 01:57

Hyunjin:
I would go:

Code:

xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)

Kind regards,
Carlo
(Stata 19.0)
Hyunjin Cha

Join Date: Dec 2024

Posts: 13
#12

15 Dec 2024, 06:19

Originally posted by Carlo Lazzaro

Hyunjin:
I would go:

Code:

xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)

Dear Carlo Lazzaro, I sincerely appreciate your detailed and thoughtful responses. Your advice has been invaluable, and I am grateful for the time and expertise you have shared.

Thank you once again for your kind support.