Dear Statalist,
I am using Stata 14 and have unbalanced panel data with T = 17 and N = 18. I ran pooled OLS, FE, and RE, and concluded from a robust Hausman test that FE with clustered standard errors is the best of the three models. I suspect there is some multicollinearity, but most of the papers I have read do not test for it, and I also read here that the problem is oversold. I tested for time fixed effects using testparm but found them jointly insignificant.
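For reference, the time fixed effects test was along these lines. This is only a minimal sketch: it assumes the panel was declared with xtset country time (the time variable name is taken from the xtabond2 output further down) and simply adds i.time to the FE command shown in this post.

```stata
* Sketch: joint test of the time dummies after the clustered FE model.
* Assumes: xtset country time
xtreg lny lnz1 x1 x2 x3 x4 x5 m1 m1sq z2 x6 x7 lm1_z2 i.time, fe cluster(country)

* H0: all time dummies are jointly zero
testparm i.time
```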
According to the literature, there is likely reverse causality affecting two of my control variables (z1 and z2) and my independent variable of interest, for which I use three proxies in separate runs (M1, M2, and M3). Most papers use the two-step GMM technique to account for this problem, and their results do not change much, if they do not improve. Below are the results of the clustered FE model. I used clustered standard errors because of serial correlation and heteroskedasticity.
Code:
xtreg lny lnz1 x1 x2 x3 x4 x5 m1 m1sq z2 x6 x7 lm1_z2 , fe cluster(country)

Fixed-effects (within) regression               Number of obs     =        190
Group variable: country                         Number of groups  =         18

R-sq:                                           Obs per group:
     within  = 0.7369                                         min =          1
     between = 0.0005                                         avg =       10.6
     overall = 0.1111                                         max =         17

                                                F(12,17)          =      99.23
corr(u_i, Xb)  = -0.6006                        Prob > F          =     0.0000

                               (Std. Err. adjusted for 18 clusters in country)
------------------------------------------------------------------------------
             |               Robust
         lny |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnz1 |    .030846   .0132167     2.33   0.032     .0029613    .0587307
          x1 |   .0006796   .0004414     1.54   0.142    -.0002518    .0016109
          x2 |  -.0026081   .0009207    -2.83   0.011    -.0045505   -.0006657
          x3 |  -.0156356   .0083621    -1.87   0.079     -.033278    .0020069
          x4 |   -.000387   .0001654    -2.34   0.032     -.000736   -.0000379
          x5 |  -.0000584   .0002823    -0.21   0.839    -.0006539    .0005371
          m1 |   .0722394   .0158132     4.57   0.000     .0388764    .1056023
        m1sq |  -.0151598   .0036502    -4.15   0.001    -.0228609   -.0074586
          z2 |  -.0636088   .0188611    -3.37   0.004    -.1034023   -.0238153
          x6 |   .0482388   .0224125     2.15   0.046     .0009526    .0955249
          x7 |   .0003355   .0002021     1.66   0.115    -.0000909    .0007619
       m1_z2 |   .0191029   .0070425     2.71   0.015     .0042446    .0339612
       _cons |   2.845722   .3811176     7.47   0.000     2.041634     3.64981
-------------+----------------------------------------------------------------
     sigma_u |  .18251934
     sigma_e |  .00809359
         rho |  .99803749   (fraction of variance due to u_i)
------------------------------------------------------------------------------
However, when I run two-step system GMM, the results change dramatically: most, if not all, of the variables lose significance, although the model appears valid according to the AR(2), Sargan, and Hansen tests, as follows:
Code:
xtabond2 lny l.lny lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7, ///
    gmm (l.lny , lag(1 3) collapse) iv ( lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7 ) twostep cluster(country) nodiffsargan

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: country                         Number of obs      =       177
Time variable : time                            Number of groups   =        17
Number of instruments = 17                      Obs per group: min =         1
Wald chi2(13) =  1.40e+10                                      avg =     10.41
Prob > chi2   =     0.000                                      max =        16
                                (Std. Err. adjusted for clustering on country)
------------------------------------------------------------------------------
             |              Corrected
         lny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lny |
         L1. |   1.041343   .0277953    37.46   0.000     .9868654    1.095821
             |
        lnz1 |   .0005234   .0033601     0.16   0.876    -.0060623     .007109
          x1 |   .0000164   .0000536     0.31   0.760    -.0000887    .0001214
          x2 |   .0000259   .0001718     0.15   0.880    -.0003108    .0003625
          x3 |   .0002729   .0005714     0.48   0.633    -.0008469    .0013927
          x4 |  -.0000124   .0000428    -0.29   0.772    -.0000962    .0000714
          x5 |  -.0001993   .0002734    -0.73   0.466    -.0007351    .0003365
          m1 |   .0052374   .0120197     0.44   0.663    -.0183208    .0287955
        m1sq |  -.0018499   .0028094    -0.66   0.510    -.0073563    .0036564
       m1_z2 |  -.0002276   .0034992    -0.07   0.948    -.0070859    .0066308
          z2 |  -.0019458   .0062422    -0.31   0.755    -.0141803    .0102886
          x6 |   -.002183   .0010165    -2.15   0.032    -.0041753   -.0001908
          x7 |   .0000406   .0000986     0.41   0.680    -.0001525    .0002338
       _cons |  -.1274088   .1413448    -0.90   0.367    -.4044396     .149622
------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/3).L.lny collapsed
Instruments for levels equation
  Standard
    lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    D.L.lngini collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -3.09  Pr > z =  0.002
Arellano-Bond test for AR(2) in first differences: z =  -0.89  Pr > z =  0.375
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(3)    =   0.53  Prob > chi2 =  0.913
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(3)    =   0.49  Prob > chi2 =  0.921
  (Robust, but weakened by many instruments.)
So what is wrong with my command, please? I tried putting the endogenous variables in the gmm-style part, but unfortunately that did not change the significance problem much. I also tried changing the number of lags, which did not help either. I also have other questions:
2- Is it necessary for the coefficient on L.lny to be less than 1, as I have read? Does a coefficient above 1 really mean that the model is unstable?
3- I read that it is better to add "collapse" to reduce the number of instruments. Without it, the number of instruments becomes very high and the p-values of the Sargan and Hansen tests equal 1, which indicates an instrument proliferation problem. So is it correct to add it?
4- Only about 60% of the possible observations are complete (190/(17*18)). Does this high share of missing values affect my estimation results under GMM? Should I use forward orthogonal deviations (FOD), which I read are better for unbalanced data? However, that does not yield better results either.
5- Would adding the small option make a difference? I noticed that without it xtabond2 uses the normal distribution instead of the t distribution when computing the coefficient statistics. Is it okay to use z instead of t?
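To make the first, fourth, and fifth questions concrete, the kind of variant I have in mind looks roughly like the sketch below. It is illustrative only: which regressors go into the gmm-style part and the lag ranges lag(2 3) are assumptions for the example, not the exact runs I tried. The orthogonal option requests forward orthogonal deviations and small requests t and F statistics instead of z and Wald chi2.

```stata
* Sketch only: the endogenous set and lag ranges are illustrative assumptions.
xtabond2 lny l.lny lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7, ///
    gmm(l.lny, lag(1 3) collapse)                 /// lagged dependent variable
    gmm(lnz1 z2 m1 m1sq m1_z2, lag(2 3) collapse) /// regressors treated as endogenous
    iv(x1 x2 x3 x4 x5 x6 x7)                      /// regressors treated as exogenous
    twostep cluster(country) orthogonal small
```

With only 17 groups, a specification like this produces more instruments than groups even with collapse, which is part of what I am unsure about.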
Any advice is much appreciated!