Hello everyone,
I am estimating a difference-in-differences (DiD) model with a lagged dependent variable (t-1) using both regress and xtreg.
My data set contains 8,232 students in panel format with T=5 waves. For each student, I have the test scores (Zprofic_mat) and a list of observed control variables ($controlvar) for each wave. I then create the lagged dependent variable (t-1).
During the sample period (2003-2008), a policy change was implemented in state schools in 2007. Students from state schools are therefore my treatment group and students from municipal schools are the control group. My DiD variable equals 1 if a student is enrolled in a state school (treated) in the post-treatment period (time), and 0 otherwise.
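For concreteness, the three indicators described above could be built along the following lines. This is only a sketch: state_school and year are placeholder names for the school-type and calendar-year variables (they do not appear in this post), and treating 2007 onward as the post-treatment period is an assumption.

```stata
* Placeholders (assumed, not from the post):
*   state_school = 1 for state schools, 0 for municipal schools
*   year         = calendar year of the wave
generate byte treated = (state_school == 1) if !missing(state_school)
* Stata treats missing as larger than any number, so guard against missing year
generate byte time    = (year >= 2007)      if !missing(year)
* Interaction term: 1 only for state-school students in the post period
generate byte DiD     = treated * time
```

Running tabulate treated time afterwards is a quick way to confirm that all four cells of the design are populated.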
generate ZMat_L1 = L1.Zprofic_mat
When I estimate the model using regress, the results look reasonable. As expected, the coefficient on the lagged variable (ZMat_L1) is positive, indicating a strong correlation between test scores over time.
reg Zprofic_mat ZMat_L1 DiD time treated
      Source |       SS       df       MS              Number of obs =   12103
-------------+------------------------------           F(  4, 12098) = 4245.11
       Model |  6335.16585     4  1583.79146           Prob > F      =  0.0000
    Residual |  4513.59566 12098  .373086101           R-squared     =  0.5840
-------------+------------------------------           Adj R-squared =  0.5838
       Total |  10848.7615 12102  .896443687           Root MSE      =  .61081

------------------------------------------------------------------------------
 Zprofic_mat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ZMat_L1 |   .7728753   .0061332   126.01   0.000     .7608532    .7848974
         DiD |  -.0511167   .0255607    -2.00   0.046    -.1012198   -.0010137
        time |   .0949971   .0176783     5.37   0.000     .0603447    .1296494
     treated |   .1167293   .0131256     8.89   0.000     .0910011    .1424576
       _cons |  -.0654067   .0097313    -6.72   0.000    -.0844815   -.0463318
------------------------------------------------------------------------------
But when I estimate the same model using xtreg with fixed effects (fe), the coefficient on ZMat_L1 shrinks and becomes negative.
xtreg Zprofic_mat ZMat_L1 DiD time treated, fe
Fixed-effects (within) regression               Number of obs      =     12103
Group variable: IDaluno                         Number of groups   =      4881

R-sq:  within  = 0.0231                         Obs per group: min =         1
       between = 0.1807                                        avg =       2.5
       overall = 0.0823                                        max =         4

                                                F(4,7218)          =     42.71
corr(u_i, Xb)  = -0.3905                        Prob > F           =    0.0000

------------------------------------------------------------------------------
 Zprofic_mat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ZMat_L1 |  -.0408561   .0111915    -3.65   0.000    -.0627948   -.0189175
         DiD |   .0807754   .0234132     3.45   0.001     .0348787    .1266721
        time |   .1067221   .0160304     6.66   0.000     .0752977    .1381465
     treated |   .0472461   .1129775     0.42   0.676    -.1742229     .268715
       _cons |   -.184291    .058919    -3.13   0.002    -.2997895   -.0687925
-------------+----------------------------------------------------------------
     sigma_u |   .9238459
     sigma_e |  .47841538
         rho |  .78853743   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(4880, 7218) =     2.56          Prob > F = 0.0000
To be honest, I do not understand why the coefficient on the lagged dependent variable changes so much between regress and xtreg. Could anyone please help me with the interpretation?
PS: Please note that the model above is only a reduced form for illustration. In the "real" estimation, I will include the control variables and school and time fixed effects, and cluster the standard errors at the class level. For this reason I would prefer to use xtreg for the estimation.
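As a rough illustration of that fuller specification, something like the following could be used. It is a sketch only: school_id, class_id and year are placeholder names that do not appear in this post, and it is written with regress and factor-variable dummies simply because that makes the school and time fixed effects explicit; it is not the xtreg version.

```stata
* Placeholders (assumed, not from the post): school_id, class_id, year
* $controlvar is the global macro holding the control variables
* i.school_id and i.year need non-negative integer codes;
* regress omits any regressor that is collinear with the dummies
regress Zprofic_mat ZMat_L1 DiD treated $controlvar ///
    i.school_id i.year, vce(cluster class_id)
```

The time indicator is left out here on the assumption that the year dummies already absorb it.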
Any advice would be highly appreciated!
Thanks in advance.