
  • GMM estimation

    Dear Statalist,

    I am using Stata 14. I have unbalanced panel data with T = 17 and N = 18. I ran OLS, FE, and RE models and, based on a robust Hausman test, concluded that FE with clustered standard errors is the best of the three. I suspect some multicollinearity, but I read that most papers do not test for it, and I have also read here that the problem is oversold. I tested for time fixed effects using testparm, but they were jointly insignificant.
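    For reference, the two specification tests mentioned above can be written out as follows. This is a sketch under assumptions: the post does not say which robust Hausman variant was used, so the Mundlak-style auxiliary regression shown here (one common cluster-robust version, which adds the country means \bar{x}_i of the time-varying regressors to a random-effects regression) is an illustration, not necessarily the exact test that was run. The symbols \gamma and \lambda_t are editorial labels.

```latex
\begin{aligned}
&\text{Robust Hausman (Mundlak form):} &&
\text{lny}_{it} = \alpha + x_{it}'\beta + \bar{x}_i'\gamma + u_i + \varepsilon_{it},
\qquad H_0:\ \gamma = 0 \\
&\text{Time effects (testparm):} &&
\text{lny}_{it} = x_{it}'\beta + \mu_i + \lambda_t + \varepsilon_{it},
\qquad H_0:\ \lambda_2 = \cdots = \lambda_T = 0
\end{aligned}
```

    Both hypotheses are tested with a Wald test using the cluster-robust covariance matrix. Rejecting H_0 in the first regression indicates that u_i is correlated with the regressors, which favours FE over RE; failing to reject in the second means the period dummies add nothing jointly.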

    The literature suggests that two of my control variables (z1 and z2) and my independent variable of interest, for which I use three proxies in separate runs (M1, M2, and M3), are mostly subject to reverse causality. Most papers use the two-step GMM estimator to deal with this problem, and their results do not change much (if anything, they improve). The following are the results of the FE model with clustered standard errors; I clustered because of serial correlation and heteroskedasticity.
    Code:
    xtreg lny lnz1 x1 x2 x3 x4 x5 m1 m1sq z2 x6 x7 m1_z2, fe cluster(country)
    
    Fixed-effects (within) regression               Number of obs     =        190
    Group variable: country                         Number of groups  =         18

    R-sq:                                           Obs per group:
         within  = 0.7369                                         min =          1
         between = 0.0005                                         avg =       10.6
         overall = 0.1111                                         max =         17

                                                    F(12,17)          =      99.23
    corr(u_i, Xb)  = -0.6006                        Prob > F          =     0.0000

                                   (Std. Err. adjusted for 18 clusters in country)
    ------------------------------------------------------------------------------
                 |               Robust
             lny |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            lnz1 |    .030846   .0132167     2.33   0.032     .0029613    .0587307
              x1 |   .0006796   .0004414     1.54   0.142    -.0002518    .0016109
              x2 |  -.0026081   .0009207    -2.83   0.011    -.0045505   -.0006657
              x3 |  -.0156356   .0083621    -1.87   0.079     -.033278    .0020069
              x4 |   -.000387   .0001654    -2.34   0.032     -.000736   -.0000379
              x5 |  -.0000584   .0002823    -0.21   0.839    -.0006539    .0005371
              m1 |   .0722394   .0158132     4.57   0.000     .0388764    .1056023
            m1sq |  -.0151598   .0036502    -4.15   0.001    -.0228609   -.0074586
              z2 |  -.0636088   .0188611    -3.37   0.004    -.1034023   -.0238153
              x6 |   .0482388   .0224125     2.15   0.046     .0009526    .0955249
              x7 |   .0003355   .0002021     1.66   0.115    -.0000909    .0007619
           m1_z2 |   .0191029   .0070425     2.71   0.015     .0042446    .0339612
           _cons |   2.845722   .3811176     7.47   0.000     2.041634     3.64981
    -------------+----------------------------------------------------------------
         sigma_u |  .18251934
         sigma_e |  .00809359
             rho |  .99803749   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    .

    However, when I run two-step system GMM, the results change dramatically: most, if not all, of the variables lose their significance, even though the model appears valid according to the AR(2), Sargan, and Hansen tests:
    Code:
     
    xtabond2 lny l.lny lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7, ///
        gmm(l.lny, lag(1 3) collapse) ///
        iv(lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7) ///
        twostep cluster(country) nodiffsargan

    Dynamic panel-data estimation, two-step system GMM
    ------------------------------------------------------------------------------
    Group variable: country                         Number of obs      =       177
    Time variable : time                            Number of groups   =        17
    Number of instruments = 17                      Obs per group: min =         1
    Wald chi2(13) =  1.40e+10                                      avg =     10.41
    Prob > chi2   =     0.000                                      max =        16
                                   (Std. Err. adjusted for clustering on country)
    ------------------------------------------------------------------------------
                 |              Corrected
             lny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             lny |
             L1. |   1.041343   .0277953    37.46   0.000     .9868654    1.095821
                 |
            lnz1 |   .0005234   .0033601     0.16   0.876    -.0060623     .007109
              x1 |   .0000164   .0000536     0.31   0.760    -.0000887    .0001214
              x2 |   .0000259   .0001718     0.15   0.880    -.0003108    .0003625
              x3 |   .0002729   .0005714     0.48   0.633    -.0008469    .0013927
              x4 |  -.0000124   .0000428    -0.29   0.772    -.0000962    .0000714
              x5 |  -.0001993   .0002734    -0.73   0.466    -.0007351    .0003365
              m1 |   .0052374   .0120197     0.44   0.663    -.0183208    .0287955
            m1sq |  -.0018499   .0028094    -0.66   0.510    -.0073563    .0036564
           m1_z2 |  -.0002276   .0034992    -0.07   0.948    -.0070859    .0066308
              z2 |  -.0019458   .0062422    -0.31   0.755    -.0141803    .0102886
              x6 |   -.002183   .0010165    -2.15   0.032    -.0041753   -.0001908
              x7 |   .0000406   .0000986     0.41   0.680    -.0001525    .0002338
           _cons |  -.1274088   .1413448    -0.90   0.367    -.4044396     .149622
    ------------------------------------------------------------------------------
    Instruments for first differences equation
      Standard
        D.(lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7)
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        L(1/3).L.lny collapsed
    Instruments for levels equation
      Standard
        lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7
        _cons
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        D.L.lngini collapsed
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z =  -3.09  Pr > z =  0.002
    Arellano-Bond test for AR(2) in first differences: z =  -0.89  Pr > z =  0.375
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(3)   =   0.53  Prob > chi2 =  0.913
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(3)   =   0.49  Prob > chi2 =  0.921
      (Robust, but weakened by many instruments.)
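    The instrument count and the degrees of freedom of the Sargan and Hansen tests can be reconstructed from the instrument list in the output. This accounting is an editorial cross-check, assuming that each iv() variable enters as a single column shared by both equations (the xtabond2 default) and that collapse turns each lag into one column; under those assumptions the totals reproduce the 17 instruments and the chi2(3) reported above.

```latex
\begin{aligned}
&\text{Differenced eq.:} && E\Big[\textstyle\sum_t \text{lny}_{i,t-1-\ell}\,\Delta\varepsilon_{it}\Big] = 0,\quad \ell = 1,2,3
&& \Rightarrow\ 3 \text{ columns (collapsed)} \\
&\text{Levels eq.:} && E\Big[\textstyle\sum_t \Delta\text{lny}_{i,t-1}\,(\mu_i + \varepsilon_{it})\Big] = 0
&& \Rightarrow\ 1 \text{ column} \\
&\text{iv() regressors and \_cons:} && 12 + 1
&& \Rightarrow\ 13 \text{ columns} \\[4pt]
&\text{Instruments} = 3 + 1 + 13 = 17, && \text{Parameters} = 1 + 12 + 1 = 14, && \text{Overid. restrictions} = 17 - 14 = 3
\end{aligned}
```

    With 17 instruments and 17 groups, the specification sits exactly at the common rule of thumb from Roodman's xtabond2 guidance that the number of instruments should not exceed the number of groups.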

    So what is wrong with my command? I tried putting the endogenous variables in the gmm-style part, but unfortunately this did not change the significance much. I also tried changing the number of lags, which was similarly unhelpful. I also have some other questions:

    2- Is it necessary for the coefficient on L.lny to be less than 1, as I have read? Does a coefficient above 1 really mean that the model is unstable?
    3- I read that it is better to add the collapse suboption to reduce the number of instruments. When I don't, the number of instruments becomes very large and the Sargan and Hansen tests report p-values of 1, which indicates instrument proliferation. So is it correct to add it?
    4- Only about 62% of the possible observations are available (190/(17*18) = 190/306). Does this high share of missing data affect my estimates when I use GMM? Should I use forward orthogonal deviations (FOD), which I read are better for unbalanced panels? So far it has not yielded better results either.
    5- Would adding the small option make a difference? I noticed that without it, the coefficient statistics are based on the normal distribution rather than the t distribution. Is it okay to use z instead of t?
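    The quantities behind questions 2 and 4 can be written out explicitly. In this sketch, \rho is the coefficient on L.lny, \beta_k the coefficient on a generic regressor x_k, and T_{it} the number of later observations available for country i after period t; these symbols are editorial, and the formulas are standard textbook results. The only xtabond2-specific link is that its orthogonal option applies forward orthogonal deviations instead of first differences.

```latex
\begin{aligned}
&\text{Dynamic model:} &&
\text{lny}_{it} = \rho\,\text{lny}_{i,t-1} + x_{it}'\beta + \mu_i + \varepsilon_{it} \\
&\text{Response } s \text{ periods after a permanent unit rise in } x_k: &&
R_s = \beta_k \sum_{j=0}^{s-1} \rho^{j} = \beta_k\,\frac{1-\rho^{s}}{1-\rho}
\ \xrightarrow[s\to\infty]{}\ \frac{\beta_k}{1-\rho}\quad \text{only if } |\rho| < 1 \\
&\text{Test of } \rho = 1 \text{ from the output:} &&
z = \frac{1.041343 - 1}{0.0277953} \approx 1.49,\qquad 95\%\ \text{CI} = [0.987,\ 1.096] \\
&\text{Forward orthogonal deviations:} &&
x^{*}_{it} = \sqrt{\frac{T_{it}}{T_{it}+1}}\left(x_{it} - \frac{1}{T_{it}}\sum_{s>t} x_{is}\right)
\end{aligned}
```

    With \hat\rho above 1 the geometric sum diverges, so \beta_k/(1-\rho) no longer describes a long-run effect; at the same time, the confidence interval contains 1, so the estimate is not statistically distinguishable from a unit root at the 5% level. Because FOD subtracts the mean of whatever future observations exist rather than the single previous observation, a gap in a country's series costs fewer observations than first differencing, which needs both period t and period t-1.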

    Any advice would be much appreciated!
    Last edited by Nariman Sayed; 15 Jan 2023, 12:53.

  • #2
    Any help, please?
