DiD with a lagged variable using xtreg and regress.

Tharcisio Leone

Join Date: Sep 2019

Posts: 37
#1


29 Dec 2021, 04:17

Hello everyone,

I am estimating a difference-in-differences (DiD) model with a lagged dependent variable (t-1) using xtreg and regress.
My data set contains 8,232 students in panel format with T = 5 waves. For each student, I have the test score (Zprofic_mat) and a list of observed covariates ($controlvar) over the period. I then create the lagged dependent variable (t-1):

generate ZMat_L1 = L1.Zprofic_mat
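Note that `L1.` only works after the data have been `xtset`, and it returns missing whenever the panel has no observation in the immediately preceding period (the first wave, or a gap). The Python sketch below mimics that logic on hypothetical `(student_id, period, score)` tuples; `lag1` is an illustrative helper, not Stata's implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (student_id, period, score)
Record = Tuple[str, int, float]
LaggedRecord = Tuple[str, int, float, Optional[float]]


def lag1(records: List[Record]) -> List[LaggedRecord]:
    """Attach the t-1 score within each student, like Stata's L1. after xtset.

    The lag is None (Stata: missing) when the student has no observation at
    exactly period t-1, e.g. in the first wave or after a gap.
    """
    by_key: Dict[Tuple[str, int], float] = {(i, t): y for i, t, y in records}
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    return [(i, t, y, by_key.get((i, t - 1))) for i, t, y in ordered]


if __name__ == "__main__":
    data = [("A", 1, 0.5), ("A", 2, 0.7), ("A", 4, 0.9), ("B", 2, -0.3), ("B", 3, -0.1)]
    for row in lag1(data):
        print(row)
```

Rows whose lag is missing drop out of any regression that includes ZMat_L1, which is why the lagged specifications use fewer observations than the unlagged ones.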

The sample covers 2003-2008, and a policy change was implemented in state schools in 2007. Students in state schools are therefore my treatment group and students in municipal schools are the control group. My DiD variable equals 1 if the student is enrolled in a state school (treated) in the post-treatment period (time).
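A minimal sketch of how the three DiD regressors relate, assuming (as stated later in the thread) that 2008 is the only post-treatment year; `did_terms` and its arguments are hypothetical names. In Stata the interaction itself would typically be built with `generate DiD = treated*time`.

```python
from typing import Tuple


def did_terms(treated: int, year: int, first_post_year: int) -> Tuple[int, int, int]:
    """Return (treated, post, DiD) for one student-year (hypothetical helper).

    treated: 1 for state-school students, 0 for municipal-school students
    post:    1 from first_post_year onward (the 'time' dummy in the regressions)
    DiD:     the interaction treated * post
    """
    post = int(year >= first_post_year)
    return treated, post, treated * post


if __name__ == "__main__":
    for school, treated in (("state", 1), ("municipal", 0)):
        for year in (2003, 2008):
            print(school, year, did_terms(treated, year, first_post_year=2008))
```

Only treated students observed in the post period get DiD = 1; `treated` and `time` stay in the model as the group and period main effects.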

When I estimate the model with regress, the results look good. As expected, the coefficient on the lagged variable (ZMat_L1) is positive, indicating a strong correlation between test scores over time.

Code:

. reg Zprofic_mat ZMat_L1 DiD time treated

      Source |       SS       df       MS              Number of obs =   12103
-------------+------------------------------           F(  4, 12098) = 4245.11
       Model |  6335.16585     4  1583.79146           Prob > F      =  0.0000
    Residual |  4513.59566 12098  .373086101           R-squared     =  0.5840
-------------+------------------------------           Adj R-squared =  0.5838
       Total |  10848.7615 12102  .896443687           Root MSE      =  .61081

------------------------------------------------------------------------------
 Zprofic_mat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ZMat_L1 |   .7728753   .0061332   126.01   0.000     .7608532    .7848974
         DiD |  -.0511167   .0255607    -2.00   0.046    -.1012198   -.0010137
        time |   .0949971   .0176783     5.37   0.000     .0603447    .1296494
     treated |   .1167293   .0131256     8.89   0.000     .0910011    .1424576
       _cons |  -.0654067   .0097313    -6.72   0.000    -.0844815   -.0463318
------------------------------------------------------------------------------

But when I estimate the same model with xtreg, the coefficient on ZMat_L1 falls and becomes negative.

Code:

. xtreg Zprofic_mat ZMat_L1 DiD time treated, fe

Fixed-effects (within) regression               Number of obs      =     12103
Group variable: IDaluno                         Number of groups   =      4881

R-sq:  within  = 0.0231                         Obs per group: min =         1
       between = 0.1807                                        avg =       2.5
       overall = 0.0823                                        max =         4

                                                F(4,7218)          =     42.71
corr(u_i, Xb)  = -0.3905                        Prob > F           =    0.0000

------------------------------------------------------------------------------
 Zprofic_mat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ZMat_L1 |  -.0408561   .0111915    -3.65   0.000    -.0627948   -.0189175
         DiD |   .0807754   .0234132     3.45   0.001     .0348787    .1266721
        time |   .1067221   .0160304     6.66   0.000     .0752977    .1381465
     treated |   .0472461   .1129775     0.42   0.676    -.1742229     .268715
       _cons |   -.184291    .058919    -3.13   0.002    -.2997895   -.0687925
-------------+----------------------------------------------------------------
     sigma_u |  .9238459
     sigma_e |  .47841538
         rho |  .78853743   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(4880, 7218) = 2.56           Prob > F = 0.0000

To be honest, I do not understand why the coefficient on the lagged dependent variable changes so much between xtreg and regress. Can anyone help me with the interpretation?

PS: The model above is only a reduced form for display purposes. In the "real" estimation, I will include the control variables and school and time fixed effects, and cluster the standard errors at the class level. For this reason I would prefer to use xtreg for the estimation.

Any advice would be highly appreciated!
Thanks in advance.

Last edited by Tharcisio Leone; 29 Dec 2021, 04:34.
Andrew Musau

Join Date: Oct 2014

Posts: 10191
#2

29 Dec 2021, 08:09

Your bigger problem is the potential bias from the lagged dependent variable (LDV) given that you have only 5 time periods. You cannot easily distinguish between fixed effects and lagged dependence in this model - an individual can have a high value of the outcome either because she has a high fixed effect or because her values of the outcome were high in the past. With only a few time series observations, it is difficult to isolate these two effects. The usual approach in the literature is to treat the LDV as jointly determined with the outcome and proceed with an instrumental variables (IV)-type estimation. For this, see

Code:

help xtabond

as a starting point.
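A toy simulation can make this point concrete. The pure-Python sketch below generates y_it = rho*y_i,t-1 + alpha_i + e_it and compares pooled OLS with the within (fixed-effects) estimator of rho. Everything about the data is an assumption (rho = 0.5, N(0,1) individual effects and shocks, 2,000 panels with five waves, so at most four usable lag pairs per panel); nothing here comes from the poster's data.

```python
import random
from typing import List, Tuple


def simulate_panel(n: int, t_obs: int, rho: float, seed: int) -> List[List[float]]:
    """Simulate y_it = rho*y_i,t-1 + alpha_i + e_it with alpha_i, e_it ~ N(0, 1).

    Each series starts from its stationary distribution, so no burn-in is needed.
    """
    rng = random.Random(seed)
    panels = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        y = alpha / (1 - rho) + rng.gauss(0.0, (1 / (1 - rho**2)) ** 0.5)
        series = [y]
        for _ in range(t_obs - 1):
            y = rho * y + alpha + rng.gauss(0.0, 1.0)
            series.append(y)
        panels.append(series)
    return panels


def slope(pairs: List[Tuple[float, float]]) -> float:
    """OLS slope of y on x (with intercept) for (x, y) pairs."""
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx


def pooled_and_within(panels: List[List[float]]) -> Tuple[float, float]:
    """Return (pooled OLS, within/FE) estimates of rho from y_t on y_{t-1}."""
    pooled, within = [], []
    for s in panels:
        pairs = list(zip(s[:-1], s[1:]))  # (y_{t-1}, y_t); the first wave is lost to lagging
        pooled.extend(pairs)
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        within.extend((x - mx, y - my) for x, y in pairs)  # demean within each panel
    return slope(pooled), slope(within)


if __name__ == "__main__":
    ols, fe = pooled_and_within(simulate_panel(n=2000, t_obs=5, rho=0.5, seed=42))
    print(f"true rho = 0.5, pooled OLS = {ols:.3f}, within (FE) = {fe:.3f}")
```

In this setup pooled OLS lands well above the true rho, because the lag partly proxies for alpha_i, while the within estimator lands well below it (the Nickell bias). That is the same qualitative gap as between -regress- and -xtreg, fe- in #1, although a toy model cannot say why the poster's FE estimate turns negative.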

Last edited by Andrew Musau; 29 Dec 2021, 08:13.
Tharcisio Leone

Join Date: Sep 2019

Posts: 37
#3

29 Dec 2021, 10:22

Dear Andrew,

first of all, thank you very much for your message.
But I think there has been a misunderstanding. To the best of my knowledge, my model does not suffer from Nickell bias because I do not have individual fixed effects (only school and time FE). For this reason, I do not need a first-difference transformation (GMM) to remove the constant terms and the individual fixed effects from the model.

xtreg should work fine here. My only problem is the interpretation of the coefficient on ZMat_L1.
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#4

29 Dec 2021, 10:31

There is no surprise or paradox here. -xtreg, fe- estimates within panel effects only. -regress- instead implicitly constrains the within- and between-panel effects to be the same and estimates this common effect. The results you are getting constitute evidence that the implicit assumption that within- and between- effects are identical is incorrect. So, you need to decide whether you are interested in the between school or within-school effects of these variables and choose your model accordingly. Or, you can determine that you need to estimate both and run a hybrid model. (-xthybrid- from SSC will do this).
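Clyde's point can be checked on a small constructed example (the numbers are made up for illustration): panels with higher x have higher y, but within a panel, periods with higher x have lower y. Pooled OLS then returns a variance-weighted blend of the between-panel and within-panel slopes, while the within estimator recovers only the within-panel slope, which here has the opposite sign.

```python
from typing import Dict, List, Sequence, Tuple

Obs = Tuple[int, float, float]  # (panel_id, x, y)


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """OLS slope of ys on xs with an intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


def panel_means(data: List[Obs]) -> Dict[int, Tuple[float, float]]:
    """Mean of x and y within each panel."""
    groups: Dict[int, List[Tuple[float, float]]] = {}
    for g, x, y in data:
        groups.setdefault(g, []).append((x, y))
    return {g: (sum(x for x, _ in v) / len(v), sum(y for _, y in v) / len(v)) for g, v in groups.items()}


def three_slopes(data: List[Obs]) -> Tuple[float, float, float]:
    """Return (pooled, within, between) slopes of y on x."""
    means = panel_means(data)
    pooled = ols_slope([x for _, x, _ in data], [y for _, _, y in data])
    within = ols_slope([x - means[g][0] for g, x, _ in data],
                       [y - means[g][1] for g, _, y in data])
    between = ols_slope([mx for mx, _ in means.values()], [my for _, my in means.values()])
    return pooled, within, between


if __name__ == "__main__":
    # Made-up data: between-panel slope 2.0, within-panel slope -0.5
    data = [(g, g + d, 2.0 * g - 0.5 * d) for g in range(10) for d in (-1.0, 0.0, 1.0)]
    pooled, within, between = three_slopes(data)
    print(f"pooled = {pooled:.3f}, within = {within:.3f}, between = {between:.3f}")
```

Because most of the variation in x here is between panels, the pooled slope sits close to the between slope, just as -regress- in #1 sits far from -xtreg, fe-.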
Andrew Musau

Join Date: Oct 2014

Posts: 10191
#5

29 Dec 2021, 11:01

Originally posted by Tharcisio Leone

Dear Andrew,

first of all, thank you very much for your message.
But I think there has been a misunderstanding. To the best of my knowledge, my model does not suffer from Nickell bias because I do not have individual fixed effects (only school and time FE). For this reason, I do not need a first-difference transformation (GMM) to remove the constant terms and the individual fixed effects from the model.

xtreg should work fine here. My only problem is the interpretation of the coefficient on ZMat_L1.

In #1, you state that you have a panel of students.

My data set contains 8,232 students in a panel data format with T=5 (waves). For each student, I have the test scores (Zprofic_mat) and a list of observed variables ($controlvar) over the time period. Then, I create the lagged dependent variable for t-1.

If this is the case, and "IDaluno" identifies a student, i.e., in your xtset command you had

Code:

xtset IDaluno year

then your xtreg command includes individual (student) fixed effects. Or does "IDaluno" identify a school? Even if you do not want to exploit the panel structure of your data, you need to consider the consequences of not controlling for unobserved time-invariant student effects. If your analysis is descriptive rather than causal, that's fine.

Last edited by Andrew Musau; 29 Dec 2021, 11:05.
Tharcisio Leone

Join Date: Sep 2019

Posts: 37
#6

30 Dec 2021, 03:45

Many thanks for your support!

1. Individual fixed effects
@Andrew: Yes, you are right. "IDaluno" identifies the students, and student fixed effects should indeed be applied. (Sorry, I overlooked this.)
Some papers published in top-tier journals estimate FE and an LDV together without IVs (see e.g. Britton and Propper 2016, especially equation 1 and table 2). In contrast, I found no study on teacher bonuses that uses GMM for the estimation. How do you interpret this? Maybe this bias is not a big issue for the journals?

2. Surprise or paradox with the results
For the purpose of clarification, I estimated the results using OLS, FE and GMM.

Code:

reg Zprofic_mat DiD time treated
eststo reg
reg Zprofic_mat ZMat_L1 DiD time treated
eststo regLag
xtreg Zprofic_mat DiD time treated, fe
eststo xtreg
xtreg Zprofic_mat ZMat_L1 DiD time treated, fe
eststo xtregLag
xtabond L(0/1).Zprofic_mat DiD time treated
eststo xtabond

. esttab reg regLag xtreg xtregLag xtabond, keep(DiD time treated ZMat_L1) stats(N r2) cells(b(star fmt(3)) se(par fmt(3)))

--------------------------------------------------------------------------------------------
                    (reg)        (regLag)         (xtreg)      (xtregLag)       (xtabond)
              Zprofic_mat     Zprofic_mat     Zprofic_mat     Zprofic_mat     Zprofic_mat
                     b/se            b/se            b/se            b/se            b/se
--------------------------------------------------------------------------------------------
DiD             -0.255***       -0.051*          0.087***        0.081***        0.030
                  (0.033)         (0.026)         (0.021)         (0.023)         (0.024)
time             0.148***        0.095***        0.112***        0.107***        0.119***
                  (0.023)         (0.018)         (0.014)         (0.016)         (0.016)
treated          0.381***        0.117***        0.115           0.047          -0.080
                  (0.015)         (0.013)         (0.083)         (0.113)         (0.121)
ZMat_L1                          0.773***                       -0.041***       -0.356***
                                 (0.006)                         (0.011)         (0.011)
--------------------------------------------------------------------------------------------
N               19520.000       12103.000       19520.000       12103.000        7040.000
r2                  0.034           0.584           0.018           0.023
--------------------------------------------------------------------------------------------

Note that with GMM the coefficient on ZMat_L1 remains negative. So, as @Clyde pointed out, I need to decide which model to use.
My main goal is to present unbiased results, but I am not sure which model achieves that. I am really surprised by these negative coefficients on the LDV, especially because the correlation between test scores over time is high and positive (see below).

Code:

. correlate Zprofic_mat ZMat_L1
(obs=14702)

             | Zprofi~t  ZMat_L1
-------------+------------------
 Zprofic_mat |   1.0000
     ZMat_L1 |   0.7949   1.0000

I feel like I am missing something. Can anyone help me with this issue?
Andrew Musau

Join Date: Oct 2014

Posts: 10191
#7

30 Dec 2021, 09:17

Some papers published in top-tier journals estimate FE and an LDV together without IVs (see e.g. Britton and Propper 2016, especially equation 1 and table 2). In contrast, I found no study on teacher bonuses that uses GMM for the estimation. How do you interpret this? Maybe this bias is not a big issue for the journals?

What are the sample sizes? The LDV bias is of order $\frac{1}{T}$. If $T$ is sufficiently large, it can be ignored and the fixed effects model can be used.
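For reference, Nickell's (1981) large-T approximation for the fixed-effects estimator in the simplest dynamic panel model (no other regressors, i.i.d. errors, $|\rho| < 1$, $N \to \infty$ with $T$ fixed), given here as a rough guide rather than an exact result for this thread's specification:

$$
y_{it} = \rho\, y_{i,t-1} + \alpha_i + \varepsilon_{it}, \qquad
\operatorname{plim}_{N\to\infty}\left(\hat{\rho}_{FE} - \rho\right) \approx -\frac{1+\rho}{T-1}
$$

With at most $T = 4$ usable periods per student (the maximum observations per group in #1) and, say, $\rho = 0.5$, the approximation gives a bias of roughly $-0.5$. It is crude at such small $T$, but it shows that the downward bias can be as large as the coefficient itself.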

My main goal is to present unbiased results, but I am not sure which model achieves that.

I do not know the underlying theory, but inclusion of a lagged dependent variable on the right-hand side implies that you believe that there exists a dynamic relationship where past values of your outcome influence current values. If the theory suggests such a relationship, you should focus on the GMM results as the other specifications will result in biased estimates. Then, you can perform diagnostics on this.

I am really surprised by these negative coefficients on the LDV, especially because the correlation between test scores over time is high and positive (see below).

. correlate Zprofic_mat ZMat_L1
(obs=14702)

| Zprofi~t ZMat_L1
-------------+------------------
Zprofic_mat | 1.0000
ZMat_L1 | 0.7949 1.0000

You cannot simply look at the bivariate correlation and expect that the effect will be the same once you control for a whole host of things. There are omitted variables which once you include can change the magnitude and direction of the effect.

Last edited by Andrew Musau; 30 Dec 2021, 09:23.
Tharcisio Leone

Join Date: Sep 2019

Posts: 37
#8

30 Dec 2021, 12:38

1. Sample size and T
In Britton and Propper (2016), both the sample size and T are smaller than in my study (N = 6,000 and T = 2).

2. Bivariate correlation
I did not include the control variables in the output above. Once they are included, the magnitude of the LDV coefficient changes, but not its sign.

3. Theory
The underlying theory says that a value-added approach is necessary. An important feature of the educational production function is that it is cumulative over time: student achievement at time t depends not only on the educational inputs applied during t, but also on all inputs already integrated into the student's learning process, plus initial ability. Therefore, the student's achievement at t-1 strongly influences the outcome at t.
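One standard way to formalize this cumulative argument is the textbook value-added derivation. The geometric decay rate $\lambda$ of past inputs is an assumption not stated in the post; $X_{it}$ are inputs, $\mu_i$ is time-invariant ability, $A_{i0}$ is initial achievement, and $\nu_{it}$ is an error:

$$
A_{it} = \sum_{s=0}^{t-1} \lambda^{s}\left(\beta X_{i,t-s} + \mu_i\right) + \lambda^{t} A_{i0} + \nu_{it}
$$

Subtracting $\lambda A_{i,t-1}$ cancels all past inputs and the initial condition:

$$
A_{it} = \lambda A_{i,t-1} + \beta X_{it} + \mu_i + \left(\nu_{it} - \lambda\,\nu_{i,t-1}\right)
$$

Under these assumptions the lagged score $A_{i,t-1}$ is correlated both with $\mu_i$ and with the composite error (through $\nu_{i,t-1}$), which is exactly the endogeneity problem discussed in this thread.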

My trade-off is between a "biased" FE model that has been published in top-tier journals and a GMM model that, in theory, solves the Nickell bias but has not been applied in impact-evaluation research.
In addition, it is still a puzzle to me that the LDV coefficient becomes negative under FE and GMM, especially because the correlation between test scores over time is high and positive.
Andrew Musau

Join Date: Oct 2014

Posts: 10191
#9

30 Dec 2021, 16:12

Originally posted by Tharcisio Leone

1. Sample size and T
In Britton and Propper (2016), both the sample size and T are smaller than in my study (N = 6,000 and T = 2).

Now that is impossible. You lose a cross-section through lagging, so if $T=2$, you are left with observations from only one period, and you need a minimum of two to run a fixed-effects model. I am sure that you have this wrong. You can experiment yourself with the Grunfeld dataset as below:

Code:

webuse grunfeld, clear
keep if time <= 2
xtset company year
xtreg invest L.invest mvalue, fe

Output:

Code:

. xtreg invest L.invest mvalue, fe
note: L.invest omitted because of collinearity
note: mvalue omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =         10
Group variable: company                         Number of groups  =         10

R-sq:                                           Obs per group:
     within  =      .                                         min =          1
     between =      .                                         avg =        1.0
     overall =      .                                         max =          1

                                                F(0,0)            =       0.00
corr(u_i, Xb)  = .                              Prob > F          =          .

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      invest |
         L1. |          0  (omitted)
             |
      mvalue |          0  (omitted)
       _cons |    101.607          .        .       .            .           .
-------------+----------------------------------------------------------------
     sigma_u |  144.84996
     sigma_e |          .
         rho |          .   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(9, 0) = .                           Prob > F =      .

As a general comment, do not take whatever you see published in academic journals as gospel. The Journal of Public Economics (JPE) is a top field journal in public economics, but its review process is not as thorough as what you will see in the top five economics journals. Sometimes the referees are experts in the topic an article addresses but are not very statistically minded, so there is large variance in the quality of the methods employed. We can agree that $T=2$ is not the case in this instance, but if, say, $T=3$, then this study is a good candidate for replication if you can get access to the data. I do a few of these myself, and here is one based on a paper published in, er, JPE.

2. Bivariate correlation
I did not include the control variables in the output above. Once they are included, the magnitude of the LDV coefficient changes, but not its sign.

This is to be expected. You just need to find a story to explain it.

3. Theory
The underlying theory says that a value-added approach is necessary. An important feature of the educational production function is that it is cumulative over time: student achievement at time t depends not only on the educational inputs applied during t, but also on all inputs already integrated into the student's learning process, plus initial ability. Therefore, the student's achievement at t-1 strongly influences the outcome at t.

My trade-off is between a "biased" FE model that has been published in top-tier journals and a GMM model that, in theory, solves the Nickell bias but has not been applied in impact-evaluation research.

Estimate both the FE model and the dynamic model (using GMM) and present the results side by side in a table. Then comment on the FE results and argue that the dynamic-model results should be preferred because of the LDV bias, which will be apparent in the differences between the coefficient estimates. At the end of the day, you want to be able to defend whatever you do, and no one will criticize you for using a better model rather than arguing in favor of an inferior one.

In addition, it is still a puzzle to me that the LDV coefficient becomes negative under FE and GMM, especially because the correlation between test scores over time is high and positive.

Back to the second comment, this is what you will need to explain once you are comfortable with the diagnostics of the dynamic model. Plainly, this will result from the inclusion of the fixed effects and your other control variables.

Last edited by Andrew Musau; 30 Dec 2021, 16:21.
Tharcisio Leone

Join Date: Sep 2019

Posts: 37
#10

02 Jan 2022, 12:14

I sincerely appreciate all your valuable comments. They were a great help. Thanks again!

Now that is impossible. You lose a cross-section through lagging, so if T=2, you are left with only 1 period observation and you need a minimum of 2 to run a fixed effects model. I am sure that you have this wrong. You can experiment yourself with the Grunfeld dataset as below:

Please let me clarify the empirical model in Britton and Propper (2016).
In this study, the dependent variable is the test score at school-leaving age (Key Stage 4), and the LDV is the exam score at entry into the school at age 11 (Key Stage 2).
That is why I said T=2. But the model, of course, has a minimum of two time periods.

This is an empirical model that I could replicate in my study as well. Instead of using T=5, I could include only the first test score (at entry into the school) as an explanatory variable.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2160
#11

03 Jan 2022, 07:59

It's not really DID if you have a lagged dependent variable. With T = 2, DID with panel data is the same as regressing D.Y on a constant and DiD. Your first regression is the same as regressing D.Y on L.Y and DiD. So one controls for lagged Y, the other doesn't. I discuss in my MIT Press book how these are based on different assumptions about the policy assignment.
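A sketch of this point on simulated two-period data (the data-generating process, sample size, and seed are arbitrary assumptions). The classic DiD coefficient from regressing D.Y on a constant and DiD equals the difference in mean changes between groups. Adding L.Y gives the same DiD coefficient whether the outcome is Y or D.Y, and only the coefficient on L.Y shifts by exactly one, so the LDV regression is a different estimator from DiD rather than DiD plus a harmless control.

```python
import random
from typing import List, Sequence


def ols(y: Sequence[float], *regressors: Sequence[float]) -> List[float]:
    """OLS with an intercept via the normal equations; returns [const, b1, b2, ...]."""
    cols = [[1.0] * len(y)] + [list(r) for r in regressors]
    k = len(cols)
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination with partial pivoting
    m = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(cols[i], y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


def mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)


# Simulated two-period panel (all parameters are arbitrary assumptions)
rng = random.Random(7)
n = 500
treated = [float(rng.random() < 0.5) for _ in range(n)]
alpha = [rng.gauss(0.0, 1.0) for _ in range(n)]
y1 = [a + 0.3 * d + rng.gauss(0.0, 1.0) for a, d in zip(alpha, treated)]                  # pre-period
y2 = [a + 0.2 + 0.3 * d + 0.1 * d + rng.gauss(0.0, 1.0) for a, d in zip(alpha, treated)]  # post; effect 0.1
dy = [b - a for a, b in zip(y1, y2)]

# (1) Two-period DiD: D.Y on a constant and DiD (= treated, since only period 2 is post)
did = ols(dy, treated)[1]
diff_in_mean_changes = (mean([c for c, d in zip(dy, treated) if d])
                        - mean([c for c, d in zip(dy, treated) if not d]))

# (2) LDV specification: Y on L.Y and DiD, versus D.Y on L.Y and DiD
levels = ols(y2, y1, treated)
changes = ols(dy, y1, treated)

print(f"DiD: {did:.4f} (difference in mean changes: {diff_in_mean_changes:.4f})")
print(f"LDV, levels:  L.Y = {levels[1]:.4f}, DiD = {levels[2]:.4f}")
print(f"LDV, changes: L.Y = {changes[1]:.4f}, DiD = {changes[2]:.4f}")
```

The identity in (2) holds because OLS is linear in the outcome: subtracting y1 from the outcome subtracts exactly (0, 1, 0) from the coefficient vector.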

Like Andrew, I'm puzzled how you can implement GMM when you are using T = 2. Your output suggests T as high as four for some students. In any case, with such a small average T you should not use the usual fixed-effects estimator for the model with lagged Y. Also, if you really have up to four time periods, I would recommend putting in i.time rather than just time itself.

A traditional DiD does not have lagged Y in the equation, and you have to reinterpret the treatment effect that you're identifying.

How many years after the intervention do you have?
Tharcisio Leone

Join Date: Sep 2019

Posts: 37
#12

03 Jan 2022, 13:14

Many thanks for your support.

How many years after the intervention do you have?

Only 1 year (2008) for the post-treatment period.

Like Andrew, I'm puzzled how you can implement GMM when you are using T = 2.

Just to clarify.
In #1, I used a GMM with T=5.
In #10, I would use OLS and FE with T=2 only (as in Britton and Propper 2016).

I would recommend, if you really have up to four time periods, putting in i.time rather than just time itself.

What do you mean exactly?

Code:

regress Zprofic_mat ZMat_L1 DiD i.time treated ???

In this case, I am not able to control for individual and school fixed effects, which are particularly important in an education production function.
Tharcisio Leone

Join Date: Sep 2019

Posts: 37
#13

07 Mar 2022, 04:03

Dear Clyde Schechter,
could you please point me to some literature where I can find more information about within- and between-panel effects in a model with a lagged dependent variable (see your comment in #4)?
My own search for this theoretical background was unsuccessful.

Originally posted by Clyde Schechter

There is no surprise or paradox here. -xtreg, fe- estimates within panel effects only. -regress- instead implicitly constrains the within- and between-panel effects to be the same and estimates this common effect. The results you are getting constitute evidence that the implicit assumption that within- and between- effects are identical is incorrect. So, you need to decide whether you are interested in the between school or within-school effects of these variables and choose your model accordingly. Or, you can determine that you need to estimate both and run a hybrid model. (-xthybrid- from SSC will do this).
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#14

07 Mar 2022, 05:23

This paper might help.

EDIT: Clyde mentioned the hybrid estimator. You might also look up "Mundlak" models, which have become a little more popular in recent years.
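Both the hybrid model and Mundlak's device work by adding the panel means of the regressors. The sketch below uses simulated balanced data (the data-generating process and seed are assumptions) and illustrates the idea, not the internals of -xthybrid-. With x split into its within-panel deviation and its panel mean (hybrid), the two coefficients are the within and between slopes. With x and its panel mean entered directly (Mundlak), the coefficient on x is the within slope and the one on the mean is between minus within.

```python
import random
from typing import Dict, List, Sequence


def ols(y: Sequence[float], *regressors: Sequence[float]) -> List[float]:
    """OLS with an intercept via the normal equations; returns [const, b1, b2, ...]."""
    cols = [[1.0] * len(y)] + [list(r) for r in regressors]
    k = len(cols)
    m = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(cols[i], y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


def slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """OLS slope of ys on xs with an intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)


def panel_mean(values: Sequence[float], ids: Sequence[int]) -> List[float]:
    """Panel mean of `values`, repeated on every observation of that panel."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for v, i in zip(values, ids):
        sums[i] = sums.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [sums[i] / counts[i] for i in ids]


# Assumed DGP: y rises with the panel-level part of x (1.5) but falls with within-panel deviations (-0.5)
rng = random.Random(3)
n_panels, n_periods = 50, 4
ids, x, y = [], [], []
for g in range(n_panels):
    c = rng.gauss(0.0, 2.0)        # panel-level part of x
    for _ in range(n_periods):
        d = rng.gauss(0.0, 1.0)    # within-panel deviation of x
        ids.append(g)
        x.append(c + d)
        y.append(1.5 * c - 0.5 * d + rng.gauss(0.0, 0.5))

xbar, ybar = panel_mean(x, ids), panel_mean(y, ids)
x_dev = [a - b for a, b in zip(x, xbar)]

within = slope(x_dev, [a - b for a, b in zip(y, ybar)])  # what xtreg, fe estimates
between = slope(xbar[::n_periods], ybar[::n_periods])    # one row per panel (balanced, sorted by id)
hybrid = ols(y, x_dev, xbar)   # [const, within, between]
mundlak = ols(y, x, xbar)      # [const, within, between - within]

print(f"within {within:.3f} | between {between:.3f}")
print(f"hybrid:  {hybrid[1]:.3f}, {hybrid[2]:.3f}")
print(f"Mundlak: {mundlak[1]:.3f}, {mundlak[2]:.3f}")
```

The exact match with the between slope relies on the panel being balanced; with unbalanced panels the coefficient on the mean weights panels by their number of observations.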
Wei LIIU

Join Date: Mar 2022

Posts: 19
#15

17 Mar 2022, 06:54

Originally posted by Jeff Wooldridge

It's not really DID if you have a lagged dependent variable. With T = 2, DID with panel data is the same as regressing D.Y on a constant and DiD. Your first regression is the same as D.Y on L.Y DiD. So one controls for lagged Y, the other doesn't. I discuss in my MIT Press book how these are based on different assumptions about the policy assignment.

Like Andrew, I'm puzzled how you can implement GMM when you are using T = 2. Your output suggests T as high as four for some students. In any case, with such a small average T you should not use the usual fixed-effects estimator for the model with lagged Y. Also, if you really have up to four time periods, I would recommend putting in i.time rather than just time itself.

A traditional DiD does not have lagged Y in the equation, and you have to reinterpret the treatment effect that you're identifying.

How many years after the intervention do you have?

Hello. I'm working on a similar topic right now: estimating a difference-in-differences model with a lagged dependent variable as a control variable. After the policy went into effect, I have two years of data. The problem is how to interpret the coefficient on the DID term. Will the LDV contaminate it? And how do you deal with the LDV's endogeneity problem? Perhaps the second lag of the dependent variable can be used as an instrument. However, this will result in a significant loss of observations, because I only have seven years in total and the policy goes into effect at the end of 2012. By the way, how can he obtain a coefficient on a time-invariant variable such as treated when using xtreg with fe? The within estimator cannot identify the coefficient of a time-invariant variable.
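The idea of instrumenting with the second lag is essentially the Anderson-Hsiao estimator, a simpler, just-identified precursor of the Arellano-Bond GMM estimator behind -xtabond- that Andrew suggested in #2. The sketch below uses simulated data only (rho = 0.5, N(0,1) effects and shocks, seven periods, no covariates or treatment; all assumptions) and assumes the idiosyncratic errors are serially uncorrelated, which is what makes y_i,t-2 a valid instrument. It first-differences out alpha_i and instruments D.y_i,t-1 with the level y_i,t-2.

```python
import random
from typing import List


def simulate_panel(n: int, t_obs: int, rho: float, seed: int) -> List[List[float]]:
    """Simulate y_it = rho*y_i,t-1 + alpha_i + e_it, starting from the stationary distribution."""
    rng = random.Random(seed)
    panels = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        y = alpha / (1 - rho) + rng.gauss(0.0, (1 / (1 - rho**2)) ** 0.5)
        series = [y]
        for _ in range(t_obs - 1):
            y = rho * y + alpha + rng.gauss(0.0, 1.0)
            series.append(y)
        panels.append(series)
    return panels


def fe_within(panels: List[List[float]]) -> float:
    """Within (fixed-effects) estimate of rho from y_t on y_{t-1}."""
    num = den = 0.0
    for s in panels:
        x, y = s[:-1], s[1:]
        mx, my = sum(x) / len(x), sum(y) / len(y)
        num += sum((a - mx) * (b - my) for a, b in zip(x, y))
        den += sum((a - mx) ** 2 for a in x)
    return num / den


def anderson_hsiao(panels: List[List[float]]) -> float:
    """Just-identified IV on the first-differenced model dy_t = rho*dy_{t-1} + de_t.

    The level y_{t-2} instruments dy_{t-1}; it is uncorrelated with de_t
    only if e_it is serially uncorrelated (an assumption of this sketch).
    """
    num = den = 0.0
    for s in panels:
        for t in range(2, len(s)):  # two periods per panel are used up
            z = s[t - 2]
            num += z * (s[t] - s[t - 1])
            den += z * (s[t - 1] - s[t - 2])
    return num / den


if __name__ == "__main__":
    panels = simulate_panel(n=2000, t_obs=7, rho=0.5, seed=2022)
    print(f"FE: {fe_within(panels):.3f}  Anderson-Hsiao IV: {anderson_hsiao(panels):.3f}  (true rho = 0.5)")
```

Each panel loses two periods (one to differencing, one to the extra lag), which is the observation loss mentioned above. -xtabond- instead uses all available deeper lags as GMM instruments, so its estimates will generally differ from this simple IV.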