  • Panel Data Fixed Effect Model, Heteroskedasticity, and Autocorrelation Correction Procedures

    Dear Members,

    I am conducting a study on the impact of Government R&D program evaluation results on budget decisions in my country.

    My independent variables are evaluation result dummies for year t-1, namely dum_Grade2 (indicating "Excellent") and dum_Grade3 (indicating "Insufficient").

    My dependent variable is the government-proposed budget for year t, log-transformed as ln_GBUDGETt.

    Control variables include:
    • ln_Period: the log-transformed program duration,
    • ln_BUDGETt_1: the log-transformed congressional confirmed budget for year t-1,
    • dum_Scale2: a dummy variable for large-scale programs,
    • dum_NationalProject2: a dummy variable for the president's key projects, which changes depending on the presidential term (5 years in my country; the study spans three terms)
    • dum_Congress2: a dummy variable for programs of congressional interest, which varies annually, and
    • ln_GDPgrowth: the log-transformed GDP growth rate for year t-1, transformed as log(GDP growth + 1) due to the presence of negative values.
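    A minimal sketch of how the lagged and log-transformed variables described above might be constructed. The raw variable names (year, GBUDGET, BUDGET, GDPgrowth) are assumptions made for illustration and are not taken from the actual dataset; only ID and the ln_ names appear in the output further down.

    Code:
    * Hypothetical raw variable names (assumed, not from the original data):
    * year, GBUDGET, BUDGET, GDPgrowth
    xtset ID year

    * Dependent variable: log of the government-proposed budget in year t
    generate ln_GBUDGETt = ln(GBUDGET)

    * Lagged controls: L. takes the previous year's value within each program,
    * so the first observed year of every program becomes missing
    generate ln_BUDGETt_1 = ln(L.BUDGET)
    generate ln_GDPgrowth = ln(L.GDPgrowth + 1)   // defined only where lagged GDPgrowth > -1

    * Inspect the panel pattern (the post reports a balanced panel, n = 77, T = 11)
    xtdescribe
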
    I have a balanced panel dataset with 77 programs (n) over 11 years (t). To determine the appropriate model and address potential issues with heteroskedasticity and autocorrelation, I followed the steps below:

    Step 1. Model Selection

    (1) I compared Pooled OLS with the Fixed Effect Model using an F-test (F(76, 762) = 1.76, p = 0.0001) and rejected the null hypothesis that all u_i = 0, thereby selecting the Fixed Effect Model over Pooled OLS.
    (2) I compared Pooled OLS with the Random Effect Model using the Breusch-Pagan LM test (xttest0; chibar2(01) = 0.98, p = 0.1610) and failed to reject the null hypothesis Var(u) = 0, indicating that the Random Effect Model offered no improvement over Pooled OLS.
    (3) Since the RE model was deemed inappropriate based on the LM test, the Hausman test was omitted, and the FE model was chosen for further analysis.

    Step 2. Testing for Heteroskedasticity and Autocorrelation

    To address issues of heteroskedasticity and autocorrelation:
    • I ran xttest3 after the Fixed Effect Model to test for heteroskedasticity and confirmed its presence.
    • I conducted the Wooldridge test for autocorrelation (xtserial) and found evidence of first-order autocorrelation.
    Step 3. Final Model Selection

    Given the above results, I applied the Fixed Effect Model with cluster-robust standard errors (fe vce(cluster ID)) to account for heteroskedasticity and autocorrelation.



    Code:
    . xtsum ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth
    
    Variable         |      Mean   Std. dev.       Min        Max |    Observations
    -----------------+--------------------------------------------+----------------
    ln_GBU~t overall |  10.53161   1.327408    5.32301   14.25384 |     N =     847
             between |             1.249814   7.779367    13.7978 |     n =      77
             within  |             .4673756   6.761568   11.85269 |     T =      11
                     |                                            |
    dum_Gr~2 overall |  .2857143   .4520209          0          1 |     N =     847
             between |             .2616217          0   .9090909 |     n =      77
             within  |             .3697107  -.6233766   1.194805 |     T =      11
                     |                                            |
    dum_Gr~3 overall |  .0932704   .2909828          0          1 |     N =     847
             between |             .1510972          0   .6363636 |     n =      77
             within  |             .2492197  -.5430933   1.002361 |     T =      11
                     |                                            |
    ln_Per~d overall |  2.805034   .6446103          0   4.644391 |     N =     847
             between |             .5950932   1.591119   4.594609 |     n =      77
             within  |             .2560713   1.213915   3.611811 |     T =      11
                     |                                            |
    ln_BUD~1 overall |  10.53581   1.298766    5.32301   14.25384 |     N =     847
             between |             1.241565   7.852146   13.70951 |     n =      77
             within  |             .4043875   6.935485   11.78642 |     T =      11
                     |                                            |
    dum_Sc~2 overall |  .2597403   .4387511          0          1 |     N =     847
             between |             .3858959          0          1 |     n =      77
             within  |             .2129486  -.6493506   1.077922 |     T =      11
                     |                                            |
    dum_Na~2 overall |  .8004723   .3998815          0          1 |     N =     847
             between |             .3173396          0          1 |     n =      77
             within  |             .2457461  -.0177096   1.618654 |     T =      11
                     |                                            |
    dum_Co~2 overall |  .3730815   .4839092          0          1 |     N =     847
             between |             .2594807          0          1 |     n =      77
             within  |              .409431  -.5360094   1.282172 |     T =      11
                     |                                            |
    ln_GDP~h overall |  1.100366   .7633282  -1.234432    1.66865 |     N =     847
             between |                    0   1.100366   1.100366 |     n =      77
             within  |             .7633282  -1.234432    1.66865 |     T =      11

    Code:
    . pwcorr ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, star(0.05)
    
                 | ln_GBU~t dum_Gr~2 dum_Gr~3 ln_Per~d ln_BUD~1 dum_Sc~2 dum_Na~2
    -------------+---------------------------------------------------------------
     ln_GBUDGETt |   1.0000
      dum_Grade2 |  -0.0094   1.0000
      dum_Grade3 |   0.0981* -0.2028*  1.0000
       ln_Period |   0.1209*  0.1911* -0.0440   1.0000
    ln_BUDGETt_1 |   0.9642* -0.0017   0.1148*  0.1381*  1.0000
      dum_Scale2 |   0.7134* -0.0409   0.1433* -0.0606   0.7235*  1.0000
    dum_Nation~2 |   0.2925* -0.0047   0.0585   0.0328   0.3036*  0.2688*  1.0000
    dum_Congre~2 |   0.3300* -0.0826*  0.0800* -0.1461*  0.3269*  0.3392*  0.1408*
    ln_GDPgrowth |  -0.0242  -0.1343*  0.0239  -0.1083* -0.0115  -0.0000  -0.0371
    
                 | dum_Co~2 ln_GDP~h
    -------------+------------------
    dum_Congre~2 |   1.0000
    ln_GDPgrowth |   0.0714*  1.0000

    Code:
    . regress ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth
    
          Source |       SS           df       MS      Number of obs   =       847
    -------------+----------------------------------   F(8, 838)       =   1412.35
           Model |  1387.73691         8  173.467114   Prob > F        =    0.0000
        Residual |   102.92487       838   .12282204   R-squared       =    0.9310
    -------------+----------------------------------   Adj R-squared   =    0.9303
           Total |  1490.66178       846  1.76201156   Root MSE        =    .35046
    
    --------------------------------------------------------------------------------------
             ln_GBUDGETt | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    ---------------------+----------------------------------------------------------------
              dum_Grade2 |  -.0290667    .027911    -1.04   0.298    -.0838504    .0257171
              dum_Grade3 |  -.0777321   .0427223    -1.82   0.069    -.1615874    .0061233
               ln_Period |  -.0120277   .0200842    -0.60   0.549    -.0514489    .0273934
            ln_BUDGETt_1 |   .9622544   .0142967    67.31   0.000      .934193    .9903159
              dum_Scale2 |   .0910479   .0414141     2.20   0.028     .0097605    .1723354
    dum_NationalProject2 |  -.0088824   .0317637    -0.28   0.780    -.0712282    .0534634
           dum_Congress2 |   .0363053   .0271923     1.34   0.182    -.0170677    .0896783
            ln_GDPgrowth |  -.0276368   .0160256    -1.72   0.085    -.0590917    .0038181
                   _cons |   .4431037   .1380401     3.21   0.001     .1721588    .7140487
    --------------------------------------------------------------------------------------
    
    . estat vif
    
        Variable |       VIF       1/VIF  
    -------------+----------------------
    ln_BUDGETt_1 |      2.37    0.421091
      dum_Scale2 |      2.27    0.439718
    dum_Congre~2 |      1.19    0.838468
       ln_Period |      1.15    0.866172
    dum_Nation~2 |      1.11    0.899870
      dum_Grade2 |      1.10    0.912088
      dum_Grade3 |      1.06    0.939424
    ln_GDPgrowth |      1.03    0.970191
    -------------+----------------------
        Mean VIF |      1.41

    Code:
    . // F Test
    .
    . xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe
    
    Fixed-effects (within) regression               Number of obs     =        847
    Group variable: ID                              Number of groups  =         77
    
    R-squared:                                      Obs per group:
         Within  = 0.5262                                         min =         11
         Between = 0.9848                                         avg =       11.0
         Overall = 0.9257                                         max =         11
    
                                                    F(8, 762)         =     105.77
    corr(u_i, Xb) = 0.7558                          Prob > F          =     0.0000
    
    --------------------------------------------------------------------------------------
             ln_GBUDGETt | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    ---------------------+----------------------------------------------------------------
              dum_Grade2 |  -.0177626    .033814    -0.53   0.600    -.0841423    .0486171
              dum_Grade3 |  -.1330524   .0480334    -2.77   0.006    -.2273459   -.0387588
               ln_Period |  -.1041668   .0521856    -2.00   0.046    -.2066115   -.0017221
            ln_BUDGETt_1 |   .7838549   .0332889    23.55   0.000      .718506    .8492038
              dum_Scale2 |   .2111513   .0630147     3.35   0.001     .0874483    .3348544
    dum_NationalProject2 |   .0330602   .0481965     0.69   0.493    -.0615535    .1276739
           dum_Congress2 |     .02627   .0292156     0.90   0.369    -.0310825    .0836225
            ln_GDPgrowth |  -.0368959   .0159432    -2.31   0.021    -.0681936   -.0055982
                   _cons |   2.532233   .3416635     7.41   0.000     1.861519    3.202946
    ---------------------+----------------------------------------------------------------
                 sigma_u |  .25471816
                 sigma_e |  .33899387
                     rho |  .36085649   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------------
    F test that all u_i=0: F(76, 762) = 1.76                     Prob > F = 0.0001
    
    
    . // LM Test
    .
    . xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, re
    
    Random-effects GLS regression                   Number of obs     =        847
    Group variable: ID                              Number of groups  =         77
    
    R-squared:                                      Obs per group:
         Within  = 0.5201                                         min =         11
         Between = 0.9919                                         avg =       11.0
         Overall = 0.9309                                         max =         11
    
                                                    Wald chi2(8)      =    9521.32
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
    
    --------------------------------------------------------------------------------------
             ln_GBUDGETt | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    ---------------------+----------------------------------------------------------------
              dum_Grade2 |  -.0296086   .0284956    -1.04   0.299    -.0854589    .0262417
              dum_Grade3 |  -.0832005   .0432639    -1.92   0.054    -.1679962    .0015953
               ln_Period |  -.0144255   .0216639    -0.67   0.505     -.056886    .0280349
            ln_BUDGETt_1 |   .9562089   .0152102    62.87   0.000     .9263974    .9860204
              dum_Scale2 |   .1009814   .0434738     2.32   0.020     .0157744    .1861884
    dum_NationalProject2 |  -.0040338   .0332985    -0.12   0.904    -.0692977      .06123
           dum_Congress2 |   .0362844   .0273873     1.32   0.185    -.0173938    .0899625
            ln_GDPgrowth |   -.027873   .0158873    -1.75   0.079    -.0590116    .0032655
                   _cons |   .5079956   .1475616     3.44   0.001     .2187801    .7972111
    ---------------------+----------------------------------------------------------------
                 sigma_u |  .04893557
                 sigma_e |  .33899387
                     rho |  .02041309   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------------
    
    
    .
    . xttest0
    
    Breusch and Pagan Lagrangian multiplier test for random effects
    
            ln_GBUDGETt[ID,t] = Xb + u[ID] + e[ID,t]
    
            Estimated results:
                             |       Var     SD = sqrt(Var)
                    ---------+-----------------------------
                   ln_GBUD~t |   1.762012       1.327408
                           e |   .1149168       .3389939
                           u |   .0023947       .0489356
    
            Test: Var(u) = 0
                                 chibar2(01) =     0.98
                              Prob > chibar2 =   0.1610
    Code:
    . qui xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe
    
    .
    . xttest3   // heteroskedasticity
    
    Modified Wald test for groupwise heteroskedasticity
    in fixed effect regression model
    
    H0: sigma(i)^2 = sigma^2 for all i
    
    chi2 (77)  =   30407.79
    Prob>chi2 =      0.0000
    
    
    .
    .
    . // 8. autocorrelation
    . xtserial ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth
    
    Wooldridge test for autocorrelation in panel data
    H0: no first-order autocorrelation
        F(  1,      76) =      8.138
               Prob > F =      0.0056
    Code:
    . xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)
    
    Fixed-effects (within) regression               Number of obs     =        847
    Group variable: ID                              Number of groups  =         77
    
    R-squared:                                      Obs per group:
         Within  = 0.5262                                         min =         11
         Between = 0.9848                                         avg =       11.0
         Overall = 0.9257                                         max =         11
    
                                                    F(8, 76)          =      46.95
    corr(u_i, Xb) = 0.7558                          Prob > F          =     0.0000
    
                                                (Std. err. adjusted for 77 clusters in ID)
    --------------------------------------------------------------------------------------
                         |               Robust
             ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ---------------------+----------------------------------------------------------------
              dum_Grade2 |  -.0177626   .0323326    -0.55   0.584    -.0821585    .0466333
              dum_Grade3 |  -.1330524   .0646188    -2.06   0.043    -.2617518   -.0043529
               ln_Period |  -.1041668   .0853683    -1.22   0.226    -.2741925     .065859
            ln_BUDGETt_1 |   .7838549   .1440891     5.44   0.000     .4968766    1.070833
              dum_Scale2 |   .2111513    .122003     1.73   0.088    -.0318388    .4541414
    dum_NationalProject2 |   .0330602   .0639943     0.52   0.607    -.0943954    .1605158
           dum_Congress2 |     .02627    .026613     0.99   0.327    -.0267344    .0792744
            ln_GDPgrowth |  -.0368959    .019917    -1.85   0.068    -.0765641    .0027723
                   _cons |   2.532233   1.351934     1.87   0.065    -.1603775    5.224843
    ---------------------+----------------------------------------------------------------
                 sigma_u |  .25471816
                 sigma_e |  .33899387
                     rho |  .36085649   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------------


    I have the following questions regarding my approach and results:
    Question 1: Model Selection Process

    Are the steps I followed to determine the appropriate model (F-test for Fixed Effect, LM test for Random Effect, and heteroskedasticity and autocorrelation testing) correct for a balanced panel with n=77, t=11?


    Question 2: Correlation vs. Causation

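    From my correlation analysis, dum_Grade2 ("Excellent") shows a negative pairwise correlation with ln_GBUDGETt (-0.0094, not significant at the 5% level), while dum_Grade3 ("Insufficient") shows a positive one (0.0981, significant at the 5% level). However, in the panel analysis results, dum_Grade3 has a negative coefficient (-0.133 in the Fixed Effect Model).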
    Does this discrepancy between correlation and causation indicate an issue with the analysis, or is it sufficient to explain the inconsistency during interpretation?
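    One way to see where the sign changes is to move from the unconditional association to the conditional one in steps. A pairwise correlation pools between-program and within-program variation and holds nothing else constant, whereas the fixed-effects coefficient is a partial, within-program association. The sketch below uses only variables already in the model; the particular sequence of specifications is an illustrative assumption, not a formal test.

    Code:
    * Step 1: bivariate pooled regression; the slope has the same sign as the pairwise correlation
    regress ln_GBUDGETt dum_Grade3

    * Step 2: add last year's budget; the slope is now conditional on ln_BUDGETt_1
    regress ln_GBUDGETt dum_Grade3 ln_BUDGETt_1

    * Step 3: within-program variation only, still without the other controls
    xtreg ln_GBUDGETt dum_Grade3, fe vce(cluster ID)

    * Step 4: within-program variation, conditional on last year's budget
    xtreg ln_GBUDGETt dum_Grade3 ln_BUDGETt_1, fe vce(cluster ID)

    Comparing the dum_Grade3 coefficient across the four steps shows which adjustment (conditioning on the lagged budget, removing between-program differences, or both) is associated with the change of sign.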


    Question 3: Heteroskedasticity and Autocorrelation Correction

    In many Statalist discussions, cluster-robust standard errors (vce(cluster ID)) are commonly recommended to handle heteroskedasticity and autocorrelation. Would this approach be sufficient for my data (n = 77, t = 11)?
    Or would alternative methods such as xtgls, xtscc or pcse be more appropriate given the detected heteroskedasticity and autocorrelation?
    Some of my independent variables, such as dum_NationalProject2, change every five years (presidential term), while others, like dum_Congress2, vary annually. Do these characteristics affect the appropriateness of using cluster ID in my analysis?
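    For reference, a sketch of the syntax for the estimators named above, so that their standard errors can be placed side by side. It shows syntax only and is not a recommendation: whether any of them is preferable with n = 77 and t = 11 is the question being asked. xtscc is community-contributed and must be installed first; as written, the xtpcse and xtgls lines do not include program fixed effects.

    Code:
    * Regressor list, collected in a local macro for readability
    local x dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth

    * Fixed effects with standard errors clustered on program
    xtreg ln_GBUDGETt `x', fe vce(cluster ID)
    estimates store fe_cluster

    * Fixed effects with Driscoll-Kraay standard errors (run "ssc install xtscc" once)
    xtscc ln_GBUDGETt `x', fe
    estimates store fe_dk

    * Panel-corrected standard errors with a common AR(1) process (no fixed effects as written)
    xtpcse ln_GBUDGETt `x', correlation(ar1)
    estimates store pcse_ar1

    * FGLS with panel-level heteroskedasticity and a common AR(1) process (no fixed effects as written)
    xtgls ln_GBUDGETt `x', panels(heteroskedastic) corr(ar1)
    estimates store gls_ar1

    * Coefficients and standard errors side by side
    estimates table fe_cluster fe_dk pcse_ar1 gls_ar1, b(%9.4f) se(%9.4f)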

    Thank you in advance for your insights and guidance on these issues. I look forward to your advice.

    Best regards,
    Last edited by Hyunjin Cha; 13 Dec 2024, 23:19.

  • #2
    Hyunjin:
    1) I am not clear why you are not comparing -fe- vs. -re- directly (without considering POLS);
    2) your panel data regression shows nothing about causal inference (see -xtdidregress- about this topic);
    3) go with -vce(cluster panelid)- or -robust- standard errors and then check whether -re- (only) is the way to go via the community-contributed module -xtoverid-.
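    A minimal sketch of point 3), assuming the variable names from the original post and that -xtoverid- has been installed from SSC. After a random-effects fit with cluster-robust standard errors, -xtoverid- reports a test of the overidentifying restrictions; its null hypothesis is that the additional orthogonality conditions imposed by -re- are valid, so a rejection points towards -fe-.

    Code:
    * Run once: ssc install xtoverid   (community-contributed)
    xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 ///
        dum_NationalProject2 dum_Congress2 ln_GDPgrowth, re vce(cluster ID)

    * Robust counterpart of the Hausman test: fe vs. re
    xtoverid
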
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      Originally posted by Carlo Lazzaro
      Hyunjin:
      1) I am not clear why you are not comparing -fe- vs. -re- directly (without considering POLS);
      2) your panel data regression shows nothing about causal inference (see -xtdidregress- about this topic);
      3) go with -vce(cluster panelid)- or -robust- standard errors and then check whether -re- (only) is the way to go via the community-contributed module -xtoverid-.

      Dear Carlo Lazzaro, It is an honor to receive your response and guidance regarding my research. Thank you for taking the time to provide valuable insights.

      1) Regarding your question on why I compared -fe- vs. -re- indirectly (via POLS), my intention was to test whether the unobserved characteristics of each program are distinct enough to justify estimating individual fixed effects, as opposed to assuming that these characteristics are negligible and pooling the data into a single regression line. I wanted to ensure that my approach for model selection was methodologically sound. Please let me know if my reasoning is flawed or could be improved.


      2) Regarding your point on causal inference, I understand that -xtdidregress- is designed to estimate causal effects in a Difference-in-Differences (DiD) framework, which typically requires a clear intervention point, a treated group, and a control group. However, my study does not analyze a specific policy intervention or treatment event. Instead, it examines the continuous impact of annual evaluation results (Grade2 and Grade3) on budget decisions over time using a balanced panel dataset.

      Given this context, I believe my focus is more on identifying associations or relationships, controlling for program-level and time-varying factors. While I acknowledge that my current approach (xtreg, fe vce(cluster ID)) may not fully establish causality, do you think it is sufficient for analyzing this type of continuous panel data? Alternatively, are there causal inference techniques that can be adapted to my study design, which lacks a clear intervention point?


      3) I greatly appreciate your recommendation to use -xtoverid- to compare robust standard errors (-vce(cluster panelid)- or -robust-) and check whether the Random Effects model could be appropriate. I will make sure to explore this approach and apply it to my data.

      Thank you again for your guidance and expertise. I deeply value your input, and it has provided me with a clearer direction to refine my analysis.



      • #4
        Hyunjin:
        thanks for clarifying.
        As far as your point #2 is concerned, -xtreg,fe- is enough for your data, provided that you're not interested in a causal inference exercise.
        Kind regards,
        Carlo
        (StataNow 18.5)



        • #5
          Originally posted by Carlo Lazzaro
          Hyunjin:
          thanks for clarifying.
          As far as your point #2 is concerned, -xtreg,fe- is enough for your data, provided that you're not interested in a causal inference exercise.

          Dear Carlo Lazzaro, It seems I initially phrased my second question incorrectly. I would like to revise and restate my question.

          In the correlation analysis, Grade2 ("Excellent") shows a negative correlation with ln_GBUDGETt, while Grade3 ("Insufficient") shows a positive correlation. However, in the results of the -xtreg, fe- analysis, Grade3 exhibits a negative coefficient.

          Does this discrepancy between the correlation analysis and the -xtreg, fe- results indicate an issue with the analysis, or is it sufficient to address this discrepancy through interpretation?

          Code:
          . pwcorr ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, star(0.05)
          
                       | ln_GBU~t dum_Gr~2 dum_Gr~3 ln_Per~d ln_BUD~1 dum_Sc~2 dum_Na~2
          -------------+---------------------------------------------------------------
           ln_GBUDGETt |   1.0000 
            dum_Grade2 |  -0.0094   1.0000 
            dum_Grade3 |   0.0981* -0.2028*  1.0000 
             ln_Period |   0.1209*  0.1911* -0.0440   1.0000 
          ln_BUDGETt_1 |   0.9642* -0.0017   0.1148*  0.1381*  1.0000 
            dum_Scale2 |   0.7134* -0.0409   0.1433* -0.0606   0.7235*  1.0000 
          dum_Nation~2 |   0.2925* -0.0047   0.0585   0.0328   0.3036*  0.2688*  1.0000 
          dum_Congre~2 |   0.3300* -0.0826*  0.0800* -0.1461*  0.3269*  0.3392*  0.1408*
          ln_GDPgrowth |  -0.0242  -0.1343*  0.0239  -0.1083* -0.0115  -0.0000  -0.0371 
          
                       | dum_Co~2 ln_GDP~h
          -------------+------------------
          dum_Congre~2 |   1.0000 
          ln_GDPgrowth |   0.0714*  1.0000
          Code:
          . xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)
          
          Fixed-effects (within) regression               Number of obs     =        847
          Group variable: ID                              Number of groups  =         77
          
          R-squared:                                      Obs per group:
               Within  = 0.5262                                         min =         11
               Between = 0.9848                                         avg =       11.0
               Overall = 0.9257                                         max =         11
          
                                                          F(8, 76)          =      46.95
          corr(u_i, Xb) = 0.7558                          Prob > F          =     0.0000
          
                                                      (Std. err. adjusted for 77 clusters in ID)
          --------------------------------------------------------------------------------------
                               |               Robust
                   ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          ---------------------+----------------------------------------------------------------
                    dum_Grade2 |  -.0177626   .0323326    -0.55   0.584    -.0821585    .0466333
                    dum_Grade3 |  -.1330524   .0646188    -2.06   0.043    -.2617518   -.0043529
                     ln_Period |  -.1041668   .0853683    -1.22   0.226    -.2741925     .065859
                  ln_BUDGETt_1 |   .7838549   .1440891     5.44   0.000     .4968766    1.070833
                    dum_Scale2 |   .2111513    .122003     1.73   0.088    -.0318388    .4541414
          dum_NationalProject2 |   .0330602   .0639943     0.52   0.607    -.0943954    .1605158
                 dum_Congress2 |     .02627    .026613     0.99   0.327    -.0267344    .0792744
                  ln_GDPgrowth |  -.0368959    .019917    -1.85   0.068    -.0765641    .0027723
                         _cons |   2.532233   1.351934     1.87   0.065    -.1603775    5.224843
          ---------------------+----------------------------------------------------------------
                       sigma_u |  .25471816
                       sigma_e |  .33899387
                           rho |  .36085649   (fraction of variance due to u_i)
          --------------------------------------------------------------------------------------



          • #6
            Hyunjin:
            thanks for this further clarification.
            I got your previous post wrong.
            What does -estat vce, corr- after -xtreg,fe- give you back?
            Kind regards,
            Carlo
            (StataNow 18.5)



            • #7
              Originally posted by Carlo Lazzaro
              Hyunjin:
              thanks for this further clarification.
              I got your previous post wrong.
              What does -estat vce, corr- after -xtreg,fe- give you back?

              Dear Carlo Lazzaro, Thank you for your thoughtful follow-up question.
              I have run estat vce, corr as you suggested, and the correlation matrix of the coefficients is as follows:

              Code:
              . xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)
              
              Fixed-effects (within) regression               Number of obs     =        847
              Group variable: ID                              Number of groups  =         77
              
              R-squared:                                      Obs per group:
                   Within  = 0.5262                                         min =         11
                   Between = 0.9848                                         avg =       11.0
                   Overall = 0.9257                                         max =         11
              
                                                              F(8, 76)          =      46.95
              corr(u_i, Xb) = 0.7558                          Prob > F          =     0.0000
              
                                                          (Std. err. adjusted for 77 clusters in ID)
              --------------------------------------------------------------------------------------
                                   |               Robust
                       ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
              ---------------------+----------------------------------------------------------------
                        dum_Grade2 |  -.0177626   .0323326    -0.55   0.584    -.0821585    .0466333
                        dum_Grade3 |  -.1330524   .0646188    -2.06   0.043    -.2617518   -.0043529
                         ln_Period |  -.1041668   .0853683    -1.22   0.226    -.2741925     .065859
                      ln_BUDGETt_1 |   .7838549   .1440891     5.44   0.000     .4968766    1.070833
                        dum_Scale2 |   .2111513    .122003     1.73   0.088    -.0318388    .4541414
              dum_NationalProject2 |   .0330602   .0639943     0.52   0.607    -.0943954    .1605158
                     dum_Congress2 |     .02627    .026613     0.99   0.327    -.0267344    .0792744
                      ln_GDPgrowth |  -.0368959    .019917    -1.85   0.068    -.0765641    .0027723
                             _cons |   2.532233   1.351934     1.87   0.065    -.1603775    5.224843
              ---------------------+----------------------------------------------------------------
                           sigma_u |  .25471816
                           sigma_e |  .33899387
                               rho |  .36085649   (fraction of variance due to u_i)
              --------------------------------------------------------------------------------------

              Code:
              . estat vce, corr
              
              Correlation matrix of coefficients of xtreg model
              
                      e(V) | dum_Gr~2  dum_Gr~3  ln_Per~d  ln_BUD~1  dum_Sc~2  dum_Na~2  dum_Co~2  ln_GDP~h     _cons 
              -------------+-----------------------------------------------------------------------------------------
                dum_Grade2 |   1.0000                                                                                 
                dum_Grade3 |   0.0835    1.0000                                                                       
                 ln_Period |  -0.3234    0.3226    1.0000                                                             
              ln_BUDGETt_1 |  -0.0114   -0.1681   -0.6358    1.0000                                                   
                dum_Scale2 |  -0.2988   -0.0788    0.5396   -0.7225    1.0000                                         
              dum_Nation~2 |   0.3595    0.0575    0.0140   -0.3630   -0.1237    1.0000                               
              dum_Congre~2 |  -0.3819   -0.1735    0.1494   -0.0580    0.3258   -0.2617    1.0000                     
              ln_GDPgrowth |   0.0442    0.1701   -0.3150    0.7669   -0.7166   -0.1316   -0.1493    1.0000           
                     _cons |   0.0583    0.1248    0.5284   -0.9908    0.7086    0.3715    0.0394   -0.7997    1.0000 
              
              .



              • #8
                Hyunjin:
                the only concern here is the overall R-squared, which is really sky-high.
                As the overall R-squared is the squared correlation of y and yhat, and given that many coefficients do not reach statistical significance, you might be overfitting the model.
                What happens if you go with a more parsimonious model?
                Kind regards,
                Carlo
                (StataNow 18.5)
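As a rough check of the point above, the overall R-squared reported by xtreg, fe can be reproduced as the squared correlation between ln_GBUDGETt and the linear prediction xb. This is only a minimal sketch: it assumes the data are already xtset on ID, it is meant to be run from a do-file (because of the /// line continuation), and yhat_check is a hypothetical variable name introduced for illustration.

```stata
* Fit the fixed-effects model discussed in this thread (full set of controls)
xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 ///
    dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)

* Linear prediction x_it*b; the estimated fixed effects u_i are not included
* (yhat_check is a hypothetical name, not a variable from the original posts)
predict double yhat_check if e(sample), xb

* Squared correlation of y and yhat, shown next to the stored overall R-squared
quietly correlate ln_GBUDGETt yhat_check if e(sample)
display "corr(y, xb)^2 = " %6.4f r(rho)^2 "   e(r2_o) = " %6.4f e(r2_o)
```

If the two displayed numbers agree, a high overall R-squared simply says that xb tracks ln_GBUDGETt closely across the pooled observations, which a single highly correlated regressor can produce on its own.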


                • #9
                  Dear Carlo Lazzaro,

                  Thank you very much for your thoughtful review and suggestions. As you recommended, I tried removing control variables one by one to simplify the model. Removing most of the control variables had minimal impact on the overall R-squared. Only when I excluded ln_BUDGETt_1 did the overall R-squared drop substantially, by about 0.5.

                  The variable ln_BUDGETt_1 represents the budget confirmed by the legislature in year t-1 for each program. It aligns with the incremental budget decision-making theory proposed by A. Wildavsky, which suggests that "this year's budget is primarily influenced by last year's budget," as well as prior empirical studies supporting this theory.

                  In my country, the control variable ln_BUDGETt_1 reflects the budget determined by the legislature, while the dependent variable ln_GBUDGETt refers to the budget proposed by the government to the legislature.

                  When I exclude ln_BUDGETt_1, dum_Grade3, the variable of interest in my study, becomes statistically insignificant.

                  In this case, should I conclude that dum_Grade is not statistically significant and exclude ln_BUDGETt_1 from the model? Or, given that ln_BUDGETt_1 is theoretically justified by budget decision-making theories and prior studies, and considering its high correlation with the dependent variable and its significant contribution to the overall R-squared, is it still valid to retain ln_BUDGETt_1 in the model?

                  I would greatly appreciate your insights on this matter.



                  Code:
                  . pwcorr ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 ln_GDPgrowth, star(0.05)
                  
                               | ln_GBU~t dum_Gr~2 dum_Gr~3 ln_Per~d ln_BUD~1 dum_Sc~2 dum_Na~2
                  -------------+---------------------------------------------------------------
                   ln_GBUDGETt |   1.0000 
                    dum_Grade2 |  -0.0094   1.0000 
                    dum_Grade3 |   0.0981* -0.2028*  1.0000 
                     ln_Period |   0.1209*  0.1911* -0.0440   1.0000 
                  ln_BUDGETt_1 |   0.9642* -0.0017   0.1148*  0.1381*  1.0000 
                    dum_Scale2 |   0.7134* -0.0409   0.1433* -0.0606   0.7235*  1.0000 
                  dum_Nation~2 |   0.2925* -0.0047   0.0585   0.0328   0.3036*  0.2688*  1.0000 
                  ln_GDPgrowth |  -0.0242  -0.1343*  0.0239  -0.1083* -0.0115  -0.0000  -0.0371 
                  
                               | ln_GDP~h
                  -------------+---------
                  ln_GDPgrowth |   1.0000

                  Code:
                  . regress ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 ln_GDPgrowth
                  
                        Source |       SS           df       MS      Number of obs   =       847
                  -------------+----------------------------------   F(7, 839)       =   1612.35
                         Model |  1387.51797         7  198.216853   Prob > F        =    0.0000
                      Residual |  103.143809       839  .122936602   R-squared       =    0.9308
                  -------------+----------------------------------   Adj R-squared   =    0.9302
                         Total |  1490.66178       846  1.76201156   Root MSE        =    .35062
                  
                  --------------------------------------------------------------------------------------
                           ln_GBUDGETt | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                  ---------------------+----------------------------------------------------------------
                            dum_Grade2 |  -.0304609   .0279045    -1.09   0.275    -.0852317      .02431
                            dum_Grade3 |  -.0768521   .0427372    -1.80   0.072    -.1607364    .0070322
                             ln_Period |  -.0162763   .0198397    -0.82   0.412    -.0552176    .0226649
                          ln_BUDGETt_1 |   .9652985   .0141203    68.36   0.000     .9375833    .9930137
                            dum_Scale2 |   .0971869   .0411772     2.36   0.018     .0163645    .1780093
                  dum_NationalProject2 |  -.0072418   .0317548    -0.23   0.820    -.0695699    .0550863
                          ln_GDPgrowth |  -.0264102   .0160067    -1.65   0.099     -.057828    .0050076
                                 _cons |   .4325534    .137878     3.14   0.002      .161927    .7031797
                  --------------------------------------------------------------------------------------
                  
                  . estat vif
                  
                      Variable |       VIF       1/VIF  
                  -------------+----------------------
                  ln_BUDGETt_1 |      2.31    0.432079
                    dum_Scale2 |      2.25    0.445206
                     ln_Period |      1.13    0.888476
                  dum_Nation~2 |      1.11    0.901219
                    dum_Grade2 |      1.09    0.913367
                    dum_Grade3 |      1.06    0.939647
                  ln_GDPgrowth |      1.03    0.973390
                  -------------+----------------------
                      Mean VIF |      1.43

                  Code:
                  . xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 ln_GDPgrowth, fe vce(cluster ID)
                  
                  Fixed-effects (within) regression               Number of obs     =        847
                  Group variable: ID                              Number of groups  =         77
                  
                  R-squared:                                      Obs per group:
                       Within  = 0.5257                                         min =         11
                       Between = 0.9844                                         avg =       11.0
                       Overall = 0.9253                                         max =         11
                  
                                                                  F(7, 76)          =      54.55
                  corr(u_i, Xb) = 0.7536                          Prob > F          =     0.0000
                  
                                                              (Std. err. adjusted for 77 clusters in ID)
                  --------------------------------------------------------------------------------------
                                       |               Robust
                           ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  ---------------------+----------------------------------------------------------------
                            dum_Grade2 |  -.0187521   .0329575    -0.57   0.571    -.0843927    .0468885
                            dum_Grade3 |  -.1333652    .064617    -2.06   0.042    -.2620611   -.0046693
                             ln_Period |  -.1110027   .0848705    -1.31   0.195    -.2800371    .0580316
                          ln_BUDGETt_1 |   .7856016   .1438449     5.46   0.000     .4991097    1.072093
                            dum_Scale2 |   .2155172   .1230524     1.75   0.084    -.0295629    .4605973
                  dum_NationalProject2 |   .0324699   .0644762     0.50   0.616    -.0959456    .1608854
                          ln_GDPgrowth |  -.0363856   .0198447    -1.83   0.071    -.0759098    .0031386
                                 _cons |   2.541895   1.350217     1.88   0.064    -.1472946    5.231085
                  ---------------------+----------------------------------------------------------------
                               sigma_u |  .25650139
                               sigma_e |  .33895133
                                   rho |  .36413886   (fraction of variance due to u_i)
                  --------------------------------------------------------------------------------------
                  
                  . 
                  . predict fitted, xb     
                  
                  . gen sq_fitted = fitted^2 
                  
                  . 
                  . xtreg ln_GBUDGETt fitted sq_fitted, fe vce(cluster ID)
                  
                  Fixed-effects (within) regression               Number of obs     =        847
                  Group variable: ID                              Number of groups  =         77
                  
                  R-squared:                                      Obs per group:
                       Within  = 0.5460                                         min =         11
                       Between = 0.9712                                         avg =       11.0
                       Overall = 0.9171                                         max =         11
                  
                                                                  F(2, 76)          =      61.94
                  corr(u_i, Xb) = 0.5756                          Prob > F          =     0.0000
                  
                                                      (Std. err. adjusted for 77 clusters in ID)
                  ------------------------------------------------------------------------------
                               |               Robust
                   ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                        fitted |  -.6212747   1.250379    -0.50   0.621     -3.11162     1.86907
                     sq_fitted |   .0780771   .0570936     1.37   0.175    -.0356346    .1917889
                         _cons |   8.322973   6.807135     1.22   0.225    -5.234611    21.88056
                  -------------+----------------------------------------------------------------
                       sigma_u |  .26648214
                       sigma_e |  .33053229
                           rho |  .39393667   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------
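The two-step check in the output above (regressing the outcome on the linear prediction and its square) has to be repeated for every specification, and predict stops with an error if fitted already exists from an earlier run. The following is a sketch of a re-runnable version that assumes the same variable names as above and is meant for a do-file. The capture drop lines and the explicit test of the squared term are additions for illustration, not part of the original posts.

```stata
* Remove leftovers from a previous run so predict and generate do not fail
capture drop fitted
capture drop sq_fitted

* Specification from this post (dum_Congress2 omitted)
quietly xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 ///
    dum_Scale2 dum_NationalProject2 ln_GDPgrowth, fe vce(cluster ID)
predict double fitted, xb
generate double sq_fitted = fitted^2

* A significant squared term would hint at a functional-form problem
xtreg ln_GBUDGETt fitted sq_fitted, fe vce(cluster ID)
test sq_fitted
```

Read this way, an insignificant sq_fitted, as in the output above (p = 0.175), gives no evidence against the linear specification, although it does not prove that the specification is correct.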


                  • #10


                    The code in post #9 above drops the control variable dum_Congress2, while the code below drops ln_BUDGETt_1 instead.



                    Code:
                    . xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)
                    
                    Fixed-effects (within) regression               Number of obs     =        847
                    Group variable: ID                              Number of groups  =         77
                    
                    R-squared:                                      Obs per group:
                         Within  = 0.1814                                         min =         11
                         Between = 0.6548                                         avg =       11.0
                         Overall = 0.5429                                         max =         11
                    
                                                                    F(7, 76)          =       6.29
                    corr(u_i, Xb) = 0.5789                          Prob > F          =     0.0000
                    
                                                                (Std. err. adjusted for 77 clusters in ID)
                    --------------------------------------------------------------------------------------
                                         |               Robust
                             ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                    ---------------------+----------------------------------------------------------------
                              dum_Grade2 |  -.0467292   .0598614    -0.78   0.437    -.1659534    .0724951
                              dum_Grade3 |  -.1572548   .1106507    -1.42   0.159    -.3776348    .0631251
                               ln_Period |   .1889261   .1123134     1.68   0.097    -.0347655    .4126177
                              dum_Scale2 |   .8865106   .1913068     4.63   0.000       .50549    1.267531
                    dum_NationalProject2 |   .0643002   .1141213     0.56   0.575    -.1629921    .2915925
                           dum_Congress2 |   .0664125   .0507169     1.31   0.194     -.034599     .167424
                            ln_GDPgrowth |  -.0287897    .013549    -2.12   0.037    -.0557748   -.0018046
                                   _cons |   9.754854   .3145342    31.01   0.000     9.128405     10.3813
                    ---------------------+----------------------------------------------------------------
                                 sigma_u |  .97658389
                                 sigma_e |  .44528062
                                     rho |  .82788508   (fraction of variance due to u_i)
                    --------------------------------------------------------------------------------------
                    
                    .
                    . predict fitted, xb        
                    
                    . gen sq_fitted = fitted^2  
                    
                    .
                    . xtreg ln_GBUDGETt fitted sq_fitted, fe vce(cluster ID)
                    
                    Fixed-effects (within) regression               Number of obs     =        847
                    Group variable: ID                              Number of groups  =         77
                    
                    R-squared:                                      Obs per group:
                         Within  = 0.1932                                         min =         11
                         Between = 0.6370                                         avg =       11.0
                         Overall = 0.5346                                         max =         11
                    
                                                                    F(2, 76)          =      17.36
                    corr(u_i, Xb) = 0.5562                          Prob > F          =     0.0000
                    
                                                        (Std. err. adjusted for 77 clusters in ID)
                    ------------------------------------------------------------------------------
                                 |               Robust
                     ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                          fitted |  -13.22706   10.06177    -1.31   0.193    -33.26681    6.812693
                       sq_fitted |    .664144   .4741009     1.40   0.165    -.2801097    1.608398
                           _cons |   76.05559    53.3117     1.43   0.158    -30.12387    182.2351
                    -------------+----------------------------------------------------------------
                         sigma_u |  .97132238
                         sigma_e |  .44062131
                             rho |  .82933835   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    
                    .
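The with/without comparison for ln_BUDGETt_1 can also be placed side by side instead of being read across separate outputs. This is a minimal sketch that assumes the same data and variable names and is meant for a do-file. The stored-estimate names with_lag and no_lag are hypothetical labels chosen here.

```stata
* Specification that keeps the lagged confirmed budget
quietly xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 ///
    dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)
estimates store with_lag

* Same specification without ln_BUDGETt_1
quietly xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ///
    dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)
estimates store no_lag

* Coefficients, standard errors, and within/between/overall R-squared together
estimates table with_lag no_lag, b(%9.4f) se(%9.4f) stats(N r2_w r2_b r2_o)
```

Laying the two models out in one table makes it easier to see how the dum_Grade2 and dum_Grade3 coefficients and their standard errors shift when ln_BUDGETt_1 is dropped, alongside the change in each R-squared.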
                    Last edited by Hyunjin Cha; 15 Dec 2024, 02:14.


                    • #11
                      Hyunjin:
                      I would go:
                      Code:
                      xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)
                      Kind regards,
                      Carlo
                      (StataNow 18.5)


                      • #12
                        Dear Carlo Lazzaro,

                        I sincerely appreciate your detailed and thoughtful responses. Your advice has been invaluable, and I am grateful for the time and expertise you have shared.

                        Thank you once again for your kind support.
