Good morning all,
I am running a dynamic panel looking at the relationship between terms of trade growth and real GDP growth (including more controls and country and year fixed effects), using N=120 and T=40. The simple model I'm looking to run is:
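A reconstruction of that model, assuming it matches the first-differenced regression command used later in this post. The symbols $\rho$, $\beta$, $\gamma$, $\alpha_i$, $\lambda_t$, $\varepsilon_{it}$ and the control vector $X_{it}$ are illustrative notation, not taken from the original:

```latex
\Delta \ln GDP_{it} = \rho \, \Delta \ln GDP_{i,t-1} + \beta \, \Delta \ln TOT_{it}
                      + X_{it}'\gamma + \alpha_i + \lambda_t + \varepsilon_{it}
```

Here $\alpha_i$ stands for the country fixed effects and $\lambda_t$ for the year fixed effects.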

My main question is about whether I should run difference GMM on the already-differenced variables or if I need to run it on the levels (which gives strange results), and whether you have any other insights on the models.
The simple specification would involve OLS with fixed effects, as T is large. Running the specification in log levels gives a very high coefficient on the lagged dependent variable (0.98) and an R-squared of 0.9993, which suggests that the model is probably misspecified and that I should be using log differences instead, which is closer to what I am interested in anyway. See the output below (I have removed the fixed-effect coefficients from the output for ease of reading).
If I use the first differences of log GDP and log TOT, this becomes a regression of D.lncgdp on L.D.lncgdp and D.lntot with the same fixed effects.
However, with log differences I cannot rely on OLS with FE, because the lagged dependent variable will be correlated with the error term by construction. I report that regression below anyway.
I would like to use Anderson-Hsiao IV (with L2.lncgdp as an instrument) or difference GMM in this case, but I am not sure what the right way to implement difference GMM in Stata is. Should I run it with D.lncgdp and D.lntot, or with lncgdp and lntot, given that the GMM estimator takes first differences on its own? How would my coefficients be interpreted in each case? Running it with lncgdp and lntot still gives a very high coefficient on the lagged dependent variable, which worries me, and what I am actually looking for is the relationship between GDP growth and TOT growth. Using D.lncgdp and D.lntot gives more sensible-looking estimates.
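The two options can be written out as follows. This is only a sketch under assumed notation ($y$ = lncgdp, $x$ = lntot, $g = \Delta y$ = D.lncgdp, with $\alpha_i$ and $\lambda_t$ the country and year effects and other controls omitted). It is not taken from any Stata output:

```latex
\begin{align*}
% Option 1: the model is specified in levels
y_{it} &= \rho \, y_{i,t-1} + \beta \, x_{it} + \alpha_i + \lambda_t + \varepsilon_{it} \\
% ... and difference GMM estimates it after first-differencing
\Delta y_{it} &= \rho \, \Delta y_{i,t-1} + \beta \, \Delta x_{it} + \Delta\lambda_t + \Delta\varepsilon_{it} \\[6pt]
% Option 2: the model is specified in growth rates
g_{it} &= \rho_g \, g_{i,t-1} + \beta_g \, \Delta x_{it} + \alpha_i + \lambda_t + u_{it} \\
% ... and difference GMM differences it once more
\Delta g_{it} &= \rho_g \, \Delta g_{i,t-1} + \beta_g \, \Delta^2 x_{it} + \Delta\lambda_t + \Delta u_{it}
\end{align*}
```

Under this reading, the reported coefficients belong to the equation as it is typed (the first line of each pair). In option 1, $\rho$ measures persistence in the log level of GDP. In option 2, $\rho_g$ measures persistence in GDP growth and $\beta_g$ relates TOT growth to GDP growth.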
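This is what difference GMM on the levels (xtabond2 with lncgdp, L.lncgdp and lntotx, instrumented with lags of lncgdp and without the level equation) gives me: the coefficient on the lagged dependent variable is still about 0.95 (full output below).
Running GMM on the already-differenced variables, with lagged levels as the instruments, gives a coefficient on the lagged variable (about 0.22) that is much closer to the OLS one.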
Using Anderson-Hsiao-style IV gives an absurdly high coefficient on the lagged variable (it would imply that a 1 percentage point increase in GDP growth last year is associated with a 1.4 percentage point increase in GDP growth this year), and the centred R2 is -1.13.
Code:
reg lncgdp L.lncgdp lntot i.country i.year, r
Code:
reg lncgdp L.lncgdp lntotx i.country i.year, r

Linear regression                               Number of obs     =      2,074
                                                F(92, 1981)       =   56427.43
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9993
                                                Root MSE          =     .04408

-------------------------------------------------------------------------------------------
                          |               Robust
                   lncgdp | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
--------------------------+----------------------------------------------------------------
                   lncgdp |
                      L1. |   .9829976   .0056776   173.14   0.000      .971863    .9941322
                          |
                   lntotx |  -.0001803   .0045674    -0.04   0.969    -.0091378    .0087772
Code:
reg D.lncgdp L.D.lncgdp D.lntot i.country i.year, r
Code:
reg dcgdp L.dcgdp dtotx i.country i.year, r

Linear regression                               Number of obs     =      2,027
                                                F(91, 1935)       =       6.57
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1893
                                                Root MSE          =     .04268

-------------------------------------------------------------------------------------------
                          |               Robust
                    dcgdp | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
--------------------------+----------------------------------------------------------------
                    dcgdp |
                      L1. |   .2230587   .0397349     5.61   0.000     .1451309    .3009865
                          |
                    dtotx |   .0347277    .043905     0.79   0.429    -.0513784    .1208338
Code:
xtabond2 lncgdp L.lncgdp lntotx i.country i.year, gmm(L.lncgdp) noleveleq r
Code:
xtabond2 lncgdp L.lncgdp lntotx i.country i.year, gmm(L.lncgdp) noleveleq r
Warning: Number of instruments may be large relative to number of observations.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate robust weighting matrix for Hansen test.
  Difference-in-Sargan/Hansen statistics may be negative.

Dynamic panel-data estimation, one-step difference GMM
------------------------------------------------------------------------------
Group variable: country                         Number of obs      =      2027
Time variable : year                            Number of groups   =        47
Number of instruments = 990                     Obs per group: min =        32
Wald chi2(0)  =         .                                      avg =     43.13
Prob > chi2   =         .                                      max =        44
------------------------------------------------------------------------------
             |               Robust
      lncgdp | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      lncgdp |
         L1. |   .9499016   .0133067    71.39   0.000      .923821    .9759823
             |
      lntotx |  -.0150696   .0099702    -1.51   0.131    -.0346108    .0044716
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/45).L.lncgdp
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -4.39  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =  -1.30  Pr > z =  0.194
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(944)  =1370.11  Prob > chi2 =  0.000
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(944)  =   2.61  Prob > chi2 =  1.000
  (Robust, but weakened by many instruments.)
Code:
xtabond2 dcgdp L.dcgdp dtotx i.country i.year, gmm(L.lncgdp) noleveleq r

Dynamic panel-data estimation, one-step difference GMM
------------------------------------------------------------------------------
Group variable: country                         Number of obs      =      1980
Time variable : year                            Number of groups   =        47
Number of instruments = 989                     Obs per group: min =        31
Wald chi2(0)  =         .                                      avg =     42.13
Prob > chi2   =         .                                      max =        43
------------------------------------------------------------------------------
             |               Robust
       dcgdp | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       dcgdp |
         L1. |   .2182032   .0443686     4.92   0.000     .1312423    .3051641
             |
       dtotx |  -.0018409   .0144273    -0.13   0.898    -.0301179    .0264362
             |
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/45).L.lncgdp
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -4.98  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =   0.75  Pr > z =  0.456
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(944)  =1028.39  Prob > chi2 =  0.029
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(944)  =   2.90  Prob > chi2 =  1.000
  (Robust, but weakened by many instruments.)
Code:
ivreg2 dcgdp dtotx i.country i.year (L.dcgdp = L2.lncgdp), r

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =     2027
                                                      F( 91,  1935) =     2.82
                                                      Prob > F      =   0.0000
Total (centered) SS     =   4.34847292                Centered R2   =  -1.1282
Total (uncentered) SS   =  6.941141002                Uncentered R2 =  -0.3333
Residual SS             =   9.25437924                Root MSE      =   .06757

-------------------------------------------------------------------------------------------
                          |               Robust
                    dcgdp | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------------------+----------------------------------------------------------------
                    dcgdp |
                      L1. |   1.444518   .4410191     3.28   0.001     .5801359    2.308899
                          |
                    dtotx |   .0299543   .0171802     1.74   0.081    -.0037182    .0636268
                          |
Underidentification test (Kleibergen-Paap rk LM statistic):             10.688
                                                   Chi-sq(1) P-val =    0.0011
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               15.366
                         (Kleibergen-Paap rk Wald F statistic):         10.561
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.000
                                                 (equation exactly identified)
------------------------------------------------------------------------------
Instrumented:         L.dcgdp
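For reference, the negative R2 values appear to follow mechanically from the sums of squares printed in the ivreg2 header, assuming the usual definition $R^2 = 1 - RSS/TSS$. This arithmetic is added as a check and is not part of the Stata output:

```latex
\begin{align*}
R^2_{\text{centered}}   &= 1 - \frac{9.25437924}{4.34847292}  \approx 1 - 2.1282 = -1.1282 \\
R^2_{\text{uncentered}} &= 1 - \frac{9.25437924}{6.941141002} \approx 1 - 1.3333 = -0.3333
\end{align*}
```

Read this way, the negative R2 only says that the residual sum of squares exceeds the total sum of squares, which can happen with IV estimation because the residuals are computed with the actual (not the fitted) endogenous regressor.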
Any thoughts?
Thank you!