Dear Members,
I am conducting a study on the impact of Government R&D program evaluation results on budget decisions in my country.
My independent variables are evaluation result dummies for year t-1, namely dum_Grade2 (indicating "Excellent") and dum_Grade3 (indicating "Insufficient").
My dependent variable is the government-proposed budget for year t, log-transformed as ln_GBUDGETt.
Control variables include:
- ln_Period: the log-transformed program duration,
- ln_BUDGETt_1: the log-transformed congressional confirmed budget for year t-1,
- dum_Scale2: a dummy variable for large-scale programs,
- dum_NationalProject2: a dummy variable for the president's key projects, which changes with the presidential term (5 years in my country; the study spans three terms),
- dum_Congress2: a dummy variable for programs of congressional interest, which varies annually, and
- ln_GDPgrowth: the log-transformed GDP growth rate for year t-1, transformed as log(GDP growth + 1) due to the presence of negative values.
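For readers who want to see how variables of this kind are typically built, here is a minimal sketch. The raw variable names (year, GBUDGET, BUDGET, Period, GDPgrowth, Grade) and the grade coding (2 = "Excellent", 3 = "Insufficient") are hypothetical stand-ins; only ID and the ln_/dum_ names appear in the output further down.

```stata
* Sketch only: raw variable names and grade coding are assumptions
xtset ID year                               // declare the panel so L. (lag) works

gen ln_GBUDGETt  = ln(GBUDGET)              // proposed budget, year t
gen ln_BUDGETt_1 = ln(L.BUDGET)             // confirmed budget, year t-1
gen ln_Period    = ln(Period)               // program duration
gen ln_GDPgrowth = ln(L.GDPgrowth + 1)      // shifted log; missing if growth <= -1

* Evaluation-result dummies for year t-1
gen byte dum_Grade2 = (L.Grade == 2) if !missing(L.Grade)
gen byte dum_Grade3 = (L.Grade == 3) if !missing(L.Grade)
```

The shift by 1 only keeps the log defined while GDP growth stays above -1 in whatever units it is recorded, which is worth checking against the raw series.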
Step 1. Model Selection
(1) I compared Pooled OLS with the Fixed Effect Model using an F-test and rejected the null hypothesis, thereby selecting the Fixed Effect Model.
(2) I compared Pooled OLS with the Random Effect Model using an LM test (xttest0) but failed to reject the null hypothesis, indicating that the Random Effect Model was not suitable.
(3) Since the RE model was deemed inappropriate based on the LM test, the Hausman test was omitted, and the FE model was chosen for further analysis.
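For reference, the Hausman comparison omitted in (3) could be sketched as follows. It was not run for this post, so no output is shown; the sigmamore option is an illustrative choice, not something taken from the results below.

```stata
* Sketch only: not part of the reported results
local xvars dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 ///
            dum_NationalProject2 dum_Congress2 ln_GDPgrowth

* hausman needs the default (non-robust) VCE for both models
quietly xtreg ln_GBUDGETt `xvars', fe
estimates store fe

quietly xtreg ln_GBUDGETt `xvars', re
estimates store re

* H0: the FE and RE coefficient vectors do not differ systematically
hausman fe re, sigmamore
```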
Step 2. Testing for Heteroskedasticity and Autocorrelation
To address issues of heteroskedasticity and autocorrelation:
- I ran xttest3 after the Fixed Effect Model to test for heteroskedasticity and confirmed its presence.
- I conducted the Wooldridge test for autocorrelation (xtserial) and found evidence of first-order autocorrelation.
Given the above results, I applied the Fixed Effect Model with cluster-robust standard errors (fe vce(cluster ID)) to account for heteroskedasticity and autocorrelation.
Code:
. xtsum ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth

Variable         |      Mean  Std. dev.        Min        Max |    Observations
-----------------+--------------------------------------------+----------------
ln_GBU~t overall |  10.53161   1.327408    5.32301   14.25384 |     N =     847
         between |             1.249814   7.779367    13.7978 |     n =      77
         within  |             .4673756   6.761568   11.85269 |     T =      11
                 |                                            |
dum_Gr~2 overall |  .2857143   .4520209          0          1 |     N =     847
         between |             .2616217          0   .9090909 |     n =      77
         within  |             .3697107  -.6233766   1.194805 |     T =      11
                 |                                            |
dum_Gr~3 overall |  .0932704   .2909828          0          1 |     N =     847
         between |             .1510972          0   .6363636 |     n =      77
         within  |             .2492197  -.5430933   1.002361 |     T =      11
                 |                                            |
ln_Per~d overall |  2.805034   .6446103          0   4.644391 |     N =     847
         between |             .5950932   1.591119   4.594609 |     n =      77
         within  |             .2560713   1.213915   3.611811 |     T =      11
                 |                                            |
ln_BUD~1 overall |  10.53581   1.298766    5.32301   14.25384 |     N =     847
         between |             1.241565   7.852146   13.70951 |     n =      77
         within  |             .4043875   6.935485   11.78642 |     T =      11
                 |                                            |
dum_Sc~2 overall |  .2597403   .4387511          0          1 |     N =     847
         between |             .3858959          0          1 |     n =      77
         within  |             .2129486  -.6493506   1.077922 |     T =      11
                 |                                            |
dum_Na~2 overall |  .8004723   .3998815          0          1 |     N =     847
         between |             .3173396          0          1 |     n =      77
         within  |             .2457461  -.0177096   1.618654 |     T =      11
                 |                                            |
dum_Co~2 overall |  .3730815   .4839092          0          1 |     N =     847
         between |             .2594807          0          1 |     n =      77
         within  |              .409431  -.5360094   1.282172 |     T =      11
                 |                                            |
ln_GDP~h overall |  1.100366   .7633282  -1.234432    1.66865 |     N =     847
         between |                    0   1.100366   1.100366 |     n =      77
         within  |             .7633282  -1.234432    1.66865 |     T =      11
Code:
. pwcorr ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, star(0.05)

             | ln_GBU~t dum_Gr~2 dum_Gr~3 ln_Per~d ln_BUD~1 dum_Sc~2 dum_Na~2
-------------+---------------------------------------------------------------
 ln_GBUDGETt |   1.0000
  dum_Grade2 |  -0.0094   1.0000
  dum_Grade3 |   0.0981* -0.2028*  1.0000
   ln_Period |   0.1209*  0.1911* -0.0440   1.0000
ln_BUDGETt_1 |   0.9642* -0.0017   0.1148*  0.1381*  1.0000
  dum_Scale2 |   0.7134* -0.0409   0.1433* -0.0606   0.7235*  1.0000
dum_Nation~2 |   0.2925* -0.0047   0.0585   0.0328   0.3036*  0.2688*  1.0000
dum_Congre~2 |   0.3300* -0.0826*  0.0800* -0.1461*  0.3269*  0.3392*  0.1408*
ln_GDPgrowth |  -0.0242  -0.1343*  0.0239  -0.1083* -0.0115  -0.0000  -0.0371

             | dum_Co~2 ln_GDP~h
-------------+------------------
dum_Congre~2 |   1.0000
ln_GDPgrowth |   0.0714*  1.0000
Code:
. regress ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth

      Source |       SS           df       MS      Number of obs   =       847
-------------+----------------------------------   F(8, 838)       =   1412.35
       Model |  1387.73691         8  173.467114   Prob > F        =    0.0000
    Residual |   102.92487       838   .12282204   R-squared       =    0.9310
-------------+----------------------------------   Adj R-squared   =    0.9303
       Total |  1490.66178       846  1.76201156   Root MSE        =    .35046

--------------------------------------------------------------------------------------
         ln_GBUDGETt | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
          dum_Grade2 |  -.0290667    .027911    -1.04   0.298    -.0838504    .0257171
          dum_Grade3 |  -.0777321   .0427223    -1.82   0.069    -.1615874    .0061233
           ln_Period |  -.0120277   .0200842    -0.60   0.549    -.0514489    .0273934
        ln_BUDGETt_1 |   .9622544   .0142967    67.31   0.000      .934193    .9903159
          dum_Scale2 |   .0910479   .0414141     2.20   0.028     .0097605    .1723354
dum_NationalProject2 |  -.0088824   .0317637    -0.28   0.780    -.0712282    .0534634
       dum_Congress2 |   .0363053   .0271923     1.34   0.182    -.0170677    .0896783
        ln_GDPgrowth |  -.0276368   .0160256    -1.72   0.085    -.0590917    .0038181
               _cons |   .4431037   .1380401     3.21   0.001     .1721588    .7140487
--------------------------------------------------------------------------------------

. estat vif

    Variable |       VIF       1/VIF
-------------+----------------------
ln_BUDGETt_1 |      2.37    0.421091
  dum_Scale2 |      2.27    0.439718
dum_Congre~2 |      1.19    0.838468
   ln_Period |      1.15    0.866172
dum_Nation~2 |      1.11    0.899870
  dum_Grade2 |      1.10    0.912088
  dum_Grade3 |      1.06    0.939424
ln_GDPgrowth |      1.03    0.970191
-------------+----------------------
    Mean VIF |      1.41
Code:
. // F Test
.
. xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe

Fixed-effects (within) regression               Number of obs     =        847
Group variable: ID                              Number of groups  =         77

R-squared:                                      Obs per group:
     Within  = 0.5262                                         min =         11
     Between = 0.9848                                         avg =       11.0
     Overall = 0.9257                                         max =         11

                                                F(8, 762)         =     105.77
corr(u_i, Xb) = 0.7558                          Prob > F          =     0.0000

--------------------------------------------------------------------------------------
         ln_GBUDGETt | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
          dum_Grade2 |  -.0177626    .033814    -0.53   0.600    -.0841423    .0486171
          dum_Grade3 |  -.1330524   .0480334    -2.77   0.006    -.2273459   -.0387588
           ln_Period |  -.1041668   .0521856    -2.00   0.046    -.2066115   -.0017221
        ln_BUDGETt_1 |   .7838549   .0332889    23.55   0.000      .718506    .8492038
          dum_Scale2 |   .2111513   .0630147     3.35   0.001     .0874483    .3348544
dum_NationalProject2 |   .0330602   .0481965     0.69   0.493    -.0615535    .1276739
       dum_Congress2 |     .02627   .0292156     0.90   0.369    -.0310825    .0836225
        ln_GDPgrowth |  -.0368959   .0159432    -2.31   0.021    -.0681936   -.0055982
               _cons |   2.532233   .3416635     7.41   0.000     1.861519    3.202946
---------------------+----------------------------------------------------------------
             sigma_u |  .25471816
             sigma_e |  .33899387
                 rho |  .36085649   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------
F test that all u_i=0: F(76, 762) = 1.76                     Prob > F = 0.0001

. // LM Test
.
. xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, re

Random-effects GLS regression                   Number of obs     =        847
Group variable: ID                              Number of groups  =         77

R-squared:                                      Obs per group:
     Within  = 0.5201                                         min =         11
     Between = 0.9919                                         avg =       11.0
     Overall = 0.9309                                         max =         11

                                                Wald chi2(8)      =    9521.32
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

--------------------------------------------------------------------------------------
         ln_GBUDGETt | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------------+----------------------------------------------------------------
          dum_Grade2 |  -.0296086   .0284956    -1.04   0.299    -.0854589    .0262417
          dum_Grade3 |  -.0832005   .0432639    -1.92   0.054    -.1679962    .0015953
           ln_Period |  -.0144255   .0216639    -0.67   0.505     -.056886    .0280349
        ln_BUDGETt_1 |   .9562089   .0152102    62.87   0.000     .9263974    .9860204
          dum_Scale2 |   .1009814   .0434738     2.32   0.020     .0157744    .1861884
dum_NationalProject2 |  -.0040338   .0332985    -0.12   0.904    -.0692977      .06123
       dum_Congress2 |   .0362844   .0273873     1.32   0.185    -.0173938    .0899625
        ln_GDPgrowth |   -.027873   .0158873    -1.75   0.079    -.0590116    .0032655
               _cons |   .5079956   .1475616     3.44   0.001     .2187801    .7972111
---------------------+----------------------------------------------------------------
             sigma_u |  .04893557
             sigma_e |  .33899387
                 rho |  .02041309   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------

.
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        ln_GBUDGETt[ID,t] = Xb + u[ID] + e[ID,t]

        Estimated results:
                         |       Var     SD = sqrt(Var)
                ---------+-----------------------------
               ln_GBUD~t |   1.762012       1.327408
                       e |   .1149168       .3389939
                       u |   .0023947       .0489356

        Test: Var(u) = 0
                             chibar2(01) =     0.98
                          Prob > chibar2 =   0.1610
Code:
. qui xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe
.
. xttest3 // heteroskedasticity

Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (77)  =  30407.79
Prob>chi2  =    0.0000

.
.
. // 8. autocorrelation
. xtserial ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth

Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation
    F(  1,      76) =      8.138
           Prob > F =     0.0056
Code:
. xtreg ln_GBUDGETt dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 dum_NationalProject2 dum_Congress2 ln_GDPgrowth, fe vce(cluster ID)

Fixed-effects (within) regression               Number of obs     =        847
Group variable: ID                              Number of groups  =         77

R-squared:                                      Obs per group:
     Within  = 0.5262                                         min =         11
     Between = 0.9848                                         avg =       11.0
     Overall = 0.9257                                         max =         11

                                                F(8, 76)          =      46.95
corr(u_i, Xb) = 0.7558                          Prob > F          =     0.0000

                                           (Std. err. adjusted for 77 clusters in ID)
--------------------------------------------------------------------------------------
                     |               Robust
         ln_GBUDGETt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
          dum_Grade2 |  -.0177626   .0323326    -0.55   0.584    -.0821585    .0466333
          dum_Grade3 |  -.1330524   .0646188    -2.06   0.043    -.2617518   -.0043529
           ln_Period |  -.1041668   .0853683    -1.22   0.226    -.2741925     .065859
        ln_BUDGETt_1 |   .7838549   .1440891     5.44   0.000     .4968766    1.070833
          dum_Scale2 |   .2111513    .122003     1.73   0.088    -.0318388    .4541414
dum_NationalProject2 |   .0330602   .0639943     0.52   0.607    -.0943954    .1605158
       dum_Congress2 |     .02627    .026613     0.99   0.327    -.0267344    .0792744
        ln_GDPgrowth |  -.0368959    .019917    -1.85   0.068    -.0765641    .0027723
               _cons |   2.532233   1.351934     1.87   0.065    -.1603775    5.224843
---------------------+----------------------------------------------------------------
             sigma_u |  .25471816
             sigma_e |  .33899387
                 rho |  .36085649   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------
I have the following questions regarding my approach and results:
Question 1: Model Selection Process
Are the steps I followed to determine the appropriate model (F-test for Fixed Effect, LM test for Random Effect, and heteroskedasticity and autocorrelation testing) correct for a balanced panel with n=77, t=11?
Question 2: Correlation vs. Causation
From my correlation analysis, Grade2 ("Excellent") shows a negative correlation with ln_GBUDGETt, while Grade3 ("Insufficient") shows a positive correlation. However, in the panel analysis results, Grade3 exhibits a negative coefficient.
Does this discrepancy between the pairwise correlation and the regression coefficient indicate a problem with the analysis, or is it enough to explain the inconsistency when interpreting the results?
Question 3: Heteroskedasticity and Autocorrelation Correction
In many Statalist discussions, cluster-robust standard errors (vce(cluster ID)) are commonly recommended to handle heteroskedasticity and autocorrelation. Would this approach be sufficient for my data (n = 77, t = 11)?
Or would alternative methods such as xtgls, xtscc or pcse be more appropriate given the detected heteroskedasticity and autocorrelation?
Some of my independent variables, such as dum_NationalProject2, change every five years (presidential term), while others, like dum_Congress2, vary annually. Do these characteristics affect the appropriateness of using cluster ID in my analysis?
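To make the alternatives in Question 3 concrete, they would be invoked roughly as below. This is a sketch only, and none of it has been run on the data in this post. It assumes the data are already xtset with a time variable; xtscc is a community-contributed command (ssc install xtscc); and the lag length and the error-structure options are illustrative assumptions, not recommendations.

```stata
* Sketch only: option choices are assumptions, no results are implied
local xvars dum_Grade2 dum_Grade3 ln_Period ln_BUDGETt_1 dum_Scale2 ///
            dum_NationalProject2 dum_Congress2 ln_GDPgrowth

* Baseline used above: FE with standard errors clustered on the panel ID
xtreg ln_GBUDGETt `xvars', fe vce(cluster ID)

* Driscoll-Kraay standard errors (community-contributed command)
xtscc ln_GBUDGETt `xvars', fe lag(1)

* FGLS with panel-level heteroskedasticity and a common AR(1);
* xtgls has no fe option, so unit effects enter as ID dummies
xtgls ln_GBUDGETt `xvars' i.ID, panels(heteroskedastic) corr(ar1)

* Prais-Winsten with heteroskedasticity-only panel-corrected SEs;
* unit effects again enter as ID dummies
xtpcse ln_GBUDGETt `xvars' i.ID, correlation(ar1) hetonly
```

The point estimates from the first two commands are the same FE estimates; only the standard errors differ. The last two change the estimator as well, so their coefficients are not directly comparable with the FE table above.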
Thank you in advance for your insights and guidance on these issues. I look forward to your advice.
Best regards,