Hello everyone,
I am trying to work out how to run a DiD (difference-in-differences) analysis with difference GMM and system GMM on panel data. Because my T is small and my data have a cluster structure, I am using xtabond2 for the estimation.
My data set contains 8,232 students in panel format with T=5. For each student, I have the test scores (depvar) and a list of observed variables over the period (indepvar). Moreover, I have time, school and class fixed effects.
During the sample period (2003-2008), a policy change was implemented in state schools in 2007. Students from state schools are therefore my treatment group, and students from municipal schools are the control group. My DiD variable equals 1 if the student is enrolled in a state school (treated) in the post-treatment period (time).
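Written out, the indicator is simply the interaction of the two dummies. This is my own notation, with i for the student and t for the wave, and I am assuming that the policy stays in place from 2007 onward and that students do not switch between state and municipal schools:

```latex
\[
\mathit{DiD}_{it} = \mathit{treated}_{i} \times \mathit{time}_{t},
\qquad
\mathit{treated}_{i} = \mathbf{1}[\text{student } i \text{ attends a state school}],
\quad
\mathit{time}_{t} = \mathbf{1}[t \ge 2007].
\]
```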
In the model I treat L1.profic_mat as endogenous, the control variables as predetermined, and the fixed effects and DiD as exogenous. For the system GMM I then estimate the following model (PS: coefficients for the control variables and fixed effects are not shown to save space):
Code:
xi: xtabond2 L(0/1).profic_mat DiD time treated $controlvar i.wave i.IDescola i.IDturma, ///
    gmm(L1.profic_mat,lag(1 1)) ///
    gmmstyle($controlvar) ///
    iv(DiD time treated i.wave i.IDescola i.IDturma, equation(level)) ///
    cluster(IDescola) twostep small orthogonal

i.wave            _Iwave_1-5          (naturally coded; _Iwave_1 omitted)
i.IDescola        _IIDescola_35018348-35924957(naturally coded; _IIDescola_35018348 omitted)
i.IDturma         _IIDturma_269-3809  (naturally coded; _IIDturma_269 omitted)
Favoring speed over space. To switch, type or click on mata: mata set matafavor space, perm.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan/Hansen statistics may be negative.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: IDaluno                         Number of obs      =      4056
Time variable : wave                            Number of groups   =      1713
Number of instruments = 755                     Obs per group: min =         1
F(644, 31)    =  49235.39                                      avg =      2.37
Prob > F      =     0.000                                      max =         4
                                      (Std. Err. adjusted for clustering on IDescola)
-------------------------------------------------------------------------------------
                    |              Corrected
         profic_mat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
         profic_mat |
                L1. |   .3678468   .0810637     4.54   0.000     .2025164    .5331773
                    |
                DiD |   160.1348   57.99518     2.76   0.010     41.85286    278.4168
               time |          0  (omitted)
            treated |  -82.91613    87.8292    -0.94   0.352     -262.045     96.2127
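In equation form, I understand the command above to correspond roughly to the following dynamic model. This is only a sketch in my own notation: y is profic_mat, x stands for the variables in $controlvar, lambda, mu and kappa are the wave, school and class dummies, and alpha is the unobserved student effect.

```latex
\[
y_{it} = \rho\, y_{i,t-1} + \delta\, \mathit{DiD}_{it}
       + \theta_{1}\, \mathit{time}_{t} + \theta_{2}\, \mathit{treated}_{i}
       + x_{it}'\beta + \lambda_{t} + \mu_{s(i,t)} + \kappa_{c(i,t)}
       + \alpha_{i} + \varepsilon_{it}
\]
```

Here delta is the DiD coefficient I am interested in. The variable time is reported as omitted in the output, presumably because it is collinear with the wave dummies.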
For the difference GMM, I have:
Code:
xi: xtabond2 L(0/1).profic_mat DiD time treated $controlvar i.wave i.IDescola i.IDturma, ///
    gmm(L1.profic_mat,lag(1 1)) ///
    gmmstyle($controlvar) ///
    iv(DiD time treated i.wave i.IDescola i.IDturma, equation(level)) ///
    cluster(IDescola) twostep small orthogonal noleveleq
However, I am still unsure whether this specification is right, because the DiD coefficients from system GMM and difference GMM differ greatly from each other. When I estimate the model with FE (without the lagged dependent variable), the result is also very different.
Hence my question: is my specification of DiD in this GMM setup right?
I am not sure whether DiD works in exactly the same way here as in a linear model. I need help with the implementation of DiD in this GMM setup and with the interpretation of its coefficient.
I am thankful for any help and information.