GMM estimation

Nariman Sayed

Join Date: Oct 2022
Posts: 29

15 Jan 2023, 11:27

Dear Statalist,

I am using Stata 14 and have an unbalanced panel with T = 17 and N = 18. I ran pooled OLS, fixed-effects (FE), and random-effects (RE) models, and a robust Hausman test favored FE with clustered standard errors among the three. I suspect there is some multicollinearity, but most of the papers I have read do not test for it, and I have also read here that it is an oversold problem. I tested for time fixed effects using testparm, but they were jointly insignificant.
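
For readers who want to reproduce the time-effects check, a minimal sketch might look like the following. It assumes the time variable is called time (the name shown in the xtabond2 output further down) and reuses the regressors from the FE model; it is not necessarily the exact command that was run.

```stata
* Sketch only: FE model with time dummies, then a joint test of those dummies.
* Assumes the panel has already been declared, e.g. with: xtset country time
xtreg lny lnz1 x1 x2 x3 x4 x5 m1 m1sq z2 x6 x7 m1_z2 i.time, fe cluster(country)
testparm i.time
```

A large p-value from testparm would correspond to the "insignificant" time effects described above.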

According to the literature, reverse causality is likely for two of my control variables (z1 and z2) and for my independent variable of interest, for which I use three proxies in separate runs (M1, M2, and M3). Most papers use two-step GMM to address this problem, and their results do not change much, if they do not improve. The results of the clustered FE model are below. I clustered the standard errors because of serial correlation and heteroskedasticity.

Code:

xtreg lny lnz1 x1 x2 x3 x4 x5 m1 m1sq z2 x6 x7 m1_z2, fe cluster(country)

Fixed-effects (within) regression               Number of obs     =        190
Group variable: country                         Number of groups  =         18

R-sq:                                           Obs per group:
     within  = 0.7369                                         min =          1
     between = 0.0005                                         avg =       10.6
     overall = 0.1111                                         max =         17

                                                F(12,17)          =      99.23
corr(u_i, Xb)  = -0.6006                        Prob > F          =     0.0000

                               (Std. Err. adjusted for 18 clusters in country)
------------------------------------------------------------------------------
             |               Robust
         lny |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnz1 |    .030846   .0132167     2.33   0.032     .0029613    .0587307
          x1 |   .0006796   .0004414     1.54   0.142    -.0002518    .0016109
          x2 |  -.0026081   .0009207    -2.83   0.011    -.0045505   -.0006657
          x3 |  -.0156356   .0083621    -1.87   0.079     -.033278    .0020069
          x4 |   -.000387   .0001654    -2.34   0.032     -.000736   -.0000379
          x5 |  -.0000584   .0002823    -0.21   0.839    -.0006539    .0005371
          m1 |   .0722394   .0158132     4.57   0.000     .0388764    .1056023
        m1sq |  -.0151598   .0036502    -4.15   0.001    -.0228609   -.0074586
          z2 |  -.0636088   .0188611    -3.37   0.004    -.1034023   -.0238153
          x6 |   .0482388   .0224125     2.15   0.046     .0009526    .0955249
          x7 |   .0003355   .0002021     1.66   0.115    -.0000909    .0007619
       m1_z2 |   .0191029   .0070425     2.71   0.015     .0042446    .0339612
       _cons |   2.845722   .3811176     7.47   0.000     2.041634     3.64981
-------------+----------------------------------------------------------------
     sigma_u |  .18251934
     sigma_e |  .00809359
         rho |  .99803749   (fraction of variance due to u_i)
------------------------------------------------------------------------------

However, when I run two-step system GMM, the results change dramatically: most, if not all, of the variables lose significance, even though the model appears valid according to the AR(2), Sargan, and Hansen tests, as follows:

Code:

xtabond2 lny l.lny lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7, ///
    gmm(l.lny, lag(1 3) collapse) ///
    iv(lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7) ///
    twostep cluster(country) nodiffsargan



Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: country                         Number of obs      =       177
Time variable : time                            Number of groups   =        17
Number of instruments = 17                      Obs per group: min =         1
Wald chi2(13) =  1.40e+10                                      avg =     10.41
Prob > chi2   =     0.000                                      max =        16
                                (Std. Err. adjusted for clustering on country)
------------------------------------------------------------------------------
             |              Corrected
         lny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lny |
         L1. |   1.041343   .0277953    37.46   0.000     .9868654    1.095821
             |
        lnz1 |   .0005234   .0033601     0.16   0.876    -.0060623     .007109
          x1 |   .0000164   .0000536     0.31   0.760    -.0000887    .0001214
          x2 |   .0000259   .0001718     0.15   0.880    -.0003108    .0003625
          x3 |   .0002729   .0005714     0.48   0.633    -.0008469    .0013927
          x4 |  -.0000124   .0000428    -0.29   0.772    -.0000962    .0000714
          x5 |  -.0001993   .0002734    -0.73   0.466    -.0007351    .0003365
          m1 |   .0052374   .0120197     0.44   0.663    -.0183208    .0287955
        m1sq |  -.0018499   .0028094    -0.66   0.510    -.0073563    .0036564
       m1_z2 |  -.0002276   .0034992    -0.07   0.948    -.0070859    .0066308
          z2 |  -.0019458   .0062422    -0.31   0.755    -.0141803    .0102886
          x6 |   -.002183   .0010165    -2.15   0.032    -.0041753   -.0001908
          x7 |   .0000406   .0000986     0.41   0.680    -.0001525    .0002338
       _cons |  -.1274088   .1413448    -0.90   0.367    -.4044396     .149622
------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/3).L.lny collapsed
Instruments for levels equation
  Standard
    lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    D.L.lngini collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -3.09  Pr > z =  0.002
Arellano-Bond test for AR(2) in first differences: z =  -0.89  Pr > z =  0.375
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(3)    =   0.53  Prob > chi2 =  0.913
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(3)    =   0.49  Prob > chi2 =  0.921
  (Robust, but weakened by many instruments.)

So what is wrong with my command, please? I tried putting the endogenous variables in the gmm-style part, but unfortunately that did not change the significance problem much. I also tried changing the number of lags, but that did not help either. I also have other questions:

2- Is it necessary for the coefficient on L.lny to be less than 1, as I have read? Does a value above 1 really mean that the model is unstable?
3- I read that it is better to add "collapse" to reduce the number of instruments. When I do not, the number of instruments becomes very high and the Sargan and Hansen test p-values equal 1, which indicates an instrument proliferation problem. So is it correct to add it?
4- I only have about 62% of the potential observations (190/(17*18)). Does this high share of missing observations affect my estimation results when I use GMM? Should I use forward orthogonal deviations (FOD), which I read are better for unbalanced data? However, they do not yield better results either.
5- Would adding the small option make a difference? I noticed that without it xtabond2 uses the normal distribution instead of the t distribution to compute the coefficient statistics. Is it okay to use z instead of t?
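
For concreteness, the variants mentioned in this post (endogenous regressors in the gmm-style list, forward orthogonal deviations, and the small option) could be sketched roughly as below. The lag limits for the endogenous variables, and the choice to treat m1, m1sq, and m1_z2 as endogenous together with lnz1 and z2, are illustrative assumptions rather than the exact specifications that were tried.

```stata
* Sketch only; the lag limits and the endogenous set are assumptions.
* (a) Endogenous regressors moved out of iv() into their own gmm-style list.
*     Each extra gmm() list adds instruments, so the reported instrument
*     count is worth comparing with the number of groups.
xtabond2 lny l.lny lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7, ///
    gmm(l.lny, lag(1 3) collapse) ///
    gmm(lnz1 z2 m1 m1sq m1_z2, lag(2 3) collapse) ///
    iv(x1 x2 x3 x4 x5 x6 x7) ///
    twostep cluster(country) nodiffsargan

* (b) Same model using forward orthogonal deviations (orthogonal) and
*     small-sample statistics (small: t and F instead of z and Wald chi2).
xtabond2 lny l.lny lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7, ///
    gmm(l.lny, lag(1 3) collapse) ///
    gmm(lnz1 z2 m1 m1sq m1_z2, lag(2 3) collapse) ///
    iv(x1 x2 x3 x4 x5 x6 x7) ///
    twostep cluster(country) nodiffsargan orthogonal small
```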

Any advice would be much appreciated!

Last edited by Nariman Sayed; 15 Jan 2023, 11:53.

Tags: endogeneity, gmm, panel, reversal causality, Time Series

Nariman Sayed

#2

18 Jan 2023, 15:34

Any help, please?