Testing for parallel trends with conditional means

amira elshal

Join Date: Dec 2015
Posts: 75


20 May 2021, 11:43

Dear colleagues,
I ran a difference-in-differences model and I would like to visually validate the parallel-trends assumption using conditional means rather than ordinary means. If I am using ordinary means, I would run the commands below. But how can I test for parallel trends if I would like to condition my (mean) outcome variable on the two variables "control1" and "control2"? I attach a sample of the data I am using. Thanks in advance.

Code:

    xtset id year
    collapse (mean) outcome, by(treated year)
    xtset treated year
    xtline outcome, overlay title(outcome_variable)

id	year	outcome	control1	control2	treated
1	2015	30	14.47	5.94	0
1	2016	45	14.47	5.94	0
1	2017	20	14.47	5.94	0
1	2018	15	14.47	5.94	0
1	2019	40	14.47	5.94	0
1	2020	65	14.47	5.94	0
2	2015	40	42.89	3.64	1
2	2016	55	42.89	3.64	1
2	2017	30	42.89	3.64	1
2	2018	25	42.89	3.64	1
2	2019	50	42.89	3.64	1
2	2020	100	42.89	3.64	1
3	2015	20	48.52	7.93	0
3	2016	35	48.52	7.93	0
3	2017	10	48.52	7.93	0
3	2018	5	48.52	7.93	0
3	2019	30	48.52	7.93	0
3	2020	55	48.52	7.93	0


Oscar Ozfidan

Join Date: Sep 2018

Posts: 257
#2

20 May 2021, 19:16

I am assuming you are talking about parallel time trends. Without using xtset, run
reg outcome year control1 control2 i.id i.treated

The coefficient on year is the slope of the time trend.

To test whether the slope of the time trend differs between the treated and untreated categories, run
reg outcome year control1 control2 i.id i.treated i.treated#c.year
If the coefficient on i.treated#c.year is significant, the slope of the time trend for the treated group differs from the slope for the untreated group.
amira elshal

Join Date: Dec 2015

Posts: 75
#3

21 May 2021, 05:28

Oscar Ozfidan Many thanks for your prompt response. This is one way to test for parallel trends. However, I am trying to plot the conditional mean (see Figure below). That is, I am testing for parallel trends but based on observables.

Last edited by amira elshal; 21 May 2021, 05:31.
Oscar Ozfidan

Join Date: Sep 2018

Posts: 257
#4

21 May 2021, 05:55

You need to explain what you are trying to do in a little more detail. Are you saying that what I have shown addresses the test but now you want to plot the trends? Or are you saying that my assumption, that you want to test trends for treated vs. untreated, is not what you were looking for? If the latter, you need to clarify which parallel trends you want to compare, if the desired comparison is not between the categories of treated. If what remains is strictly a plotting issue, I don't think I can be of help, since I have very poor knowledge of Stata graphs.
Andrew Musau

Join Date: Oct 2014

Posts: 10088
#5

21 May 2021, 05:58

Perhaps it would be less confusing if the authors used the terms "Predicted values" or "Fitted values". In short, linear regression estimates the conditional mean of the outcome variable. So run a regression and use the fitted values.
amira elshal

Join Date: Dec 2015

Posts: 75
#6

21 May 2021, 06:15

Oscar Ozfidan Thanks for your message. I am sorry if I have been unclear. I am trying to test for parallel trends between treated and untreated units. But, instead of using the mean outcomes, I would like to use the conditional mean outcomes (i.e., the mean outcomes conditional on my two control variables). I include these two control variables, or observables, in the original difference-in-differences regression.
amira elshal

Join Date: Dec 2015

Posts: 75
#7

21 May 2021, 06:18

Andrew Musau Yes, I think that is what I am trying to do. Could you please advise on how I can use the predicted/fitted outcome values to test for parallel trends between treated and untreated units? Could you please provide the Stata code? I am a little confused here, I am afraid.
Oscar Ozfidan

Join Date: Sep 2018

Posts: 257
#8

21 May 2021, 06:32

@Andrew Musau In the plot she shared, the dotted and undotted lines are not predicted vs. actual values, despite the general convention of showing them like that. They are actually the data for two groups, i.e., <100 miles and >100 miles. I think she is interested in testing the trends that go through the dotted line and the undotted line. If that is the case, she needs to drop year from the regression and keep the year-treated interaction:
reg outcome control1 control2 i.id i.treated i.treated#c.year

If she wants to choose a particular treated group as the base trend, say the 0 group, she can use

reg outcome control1 control2 i.id i.treated ib0.treated#c.year

After running that, the coefficient on treated==1#c.year would indicate whether the trend for treated==1 is significantly different from the trend for treated==0.
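A caveat on reading that output, as far as I understand Stata's factor-variable notation: when year is dropped and only i.treated#c.year is kept, Stata reports a separate year slope for each level of treated rather than a difference, so comparing the groups needs an explicit test that the two slopes are equal. The sketch below uses the sample data from #1 and shows both parameterizations. It is an illustration under those assumptions, not a definitive pre-trend test, and it leaves out control1 and control2 because they are constant within id in this sample and are therefore absorbed by i.id.

```stata
* Sample data from post #1
clear
input id year outcome control1 control2 treated
1 2015 30 14.47 5.94 0
1 2016 45 14.47 5.94 0
1 2017 20 14.47 5.94 0
1 2018 15 14.47 5.94 0
1 2019 40 14.47 5.94 0
1 2020 65 14.47 5.94 0
2 2015 40 42.89 3.64 1
2 2016 55 42.89 3.64 1
2 2017 30 42.89 3.64 1
2 2018 25 42.89 3.64 1
2 2019 50 42.89 3.64 1
2 2020 100 42.89 3.64 1
3 2015 20 48.52 7.93 0
3 2016 35 48.52 7.93 0
3 2017 10 48.52 7.93 0
3 2018 5 48.52 7.93 0
3 2019 30 48.52 7.93 0
3 2020 55 48.52 7.93 0
end

* Parameterization 1: no main effect of year, one slope per group
regress outcome i.id i.treated#c.year
* H0: the untreated and treated year slopes are equal
test 0.treated#c.year = 1.treated#c.year

* Parameterization 2: main effect of year plus interaction;
* the interaction coefficient is the difference in slopes
* (1.treated itself is collinear with i.id and is omitted)
regress outcome i.id c.year##i.treated
test 1.treated#c.year
```

Both tests address the same hypothesis; the second form is convenient because the difference in slopes and its standard error appear directly in the regression table.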
Andrew Musau

Join Date: Oct 2014

Posts: 10088
#9

21 May 2021, 06:36

It appears that you have panel data. Are your controls time varying?

Code:

    xtset id year
    xtreg outcome controls, fe
    predict outcomehat, xbu

Here, the regression controls for your specified controls and individual fixed effects. It requires that your controls are time-varying. Otherwise, with time-invariant controls, just run simple OLS:

Code:

    regress outcome controls
    predict outcomehat, xb

where variable "outcomehat" holds your predicted outcome.

Last edited by Andrew Musau; 21 May 2021, 06:41.
Andrew Musau

Join Date: Oct 2014

Posts: 10088
#10

21 May 2021, 06:39

Originally posted by Oscar Ozfidan

@Andrew Musau In the plot she shared, the dotted and undotted lines are not predicted vs. actual values, despite the general convention of showing them like that. They are actually the data for two groups, i.e., <100 miles and >100 miles. I think she is interested in testing the trends that go through the dotted line and the undotted line. If that is the case, she needs to drop year from the regression and keep the year-treated interaction:
reg outcome control1 control2 i.id i.treated i.treated#c.year

If she wants to choose a particular treated group as the base trend, say the 0 group, she can use

reg outcome control1 control2 i.id i.treated ib0.treated#c.year

After running that, the coefficient on treated==1#c.year would indicate whether the trend for treated==1 is significantly different from the trend for treated==0.

Yes, I did not focus on the specific details. My point was only that by "conditional mean of real income" the authors mean "predicted values of real income".
amira elshal

Join Date: Dec 2015

Posts: 75
#11

21 May 2021, 06:49

Andrew Musau Many thanks, Andrew. Yes, that is what I meant. But, I think, instead of "xb," it is "res" as follows:

Code:

predict outcomehat, res

I think those residuals are the conditional mean, as if we are obtaining the mean after taking away the variation explained by the controls. The rationale is that we account for these controls in the difference-in-differences specification. I hope that I am not mistaken.
Andrew Musau

Join Date: Oct 2014

Posts: 10088
#12

21 May 2021, 07:13

I think those residuals are the conditional mean

I disagree. Residuals are not the conditional means of the outcome. As you state, they are defined as \(e_{i} = y_{i} - \widehat{y}_{i}\) for \(i = 1, \ldots, N\), where \(y_{i}\) is the \(i\)th observed outcome and \(\widehat{y}_{i}\) is the corresponding fitted value, i.e., the estimated conditional mean.
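To make the link between the two quantities explicit, it may help to average the definition within a group and year. The indices \(g\) (treated or untreated group) and \(t\) (year) and the count \(N_{gt}\) are notation introduced here for illustration; they do not appear elsewhere in the thread.

\[
\bar{e}_{gt} \;=\; \frac{1}{N_{gt}} \sum_{i \in (g,t)} \left( y_{i} - \widehat{y}_{i} \right) \;=\; \bar{y}_{gt} - \bar{\widehat{y}}_{gt}
\]

So a plot of mean residuals by group and year shows the part of each group's mean outcome that the controls do not account for, while a plot of mean fitted values shows the part that they do account for.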
Eric de Souza

Join Date: Mar 2014

Posts: 587
#13

21 May 2021, 08:08

amira elshal: you are providing information in bits and pieces, which is not simplifying the life of the people wanting to help you. On the basis of the header of the figure you posted, I found the paper.
The paper describes in detail what the authors do, and moreover if you go to the website of the paper: https://www.aeaweb.org/articles?id=10.1257/pol.5.2.1 , there is a link to the dataset and to the do files.

Andrew Musau: I agree with what you say in post #12, but if you look at the do file for Figure 3, it reads:

Code:

    * Figure 3 - Conditional mean of real income
    xi3: reg y_rel schooling age isfemale electricity water if ocu500==1 & codpers==1 [pweight=factor]
    capture drop detrend
    predict detrend, resid
    table year d2 [pw=factor], c(mean detrend)

I don't really know what the authors want, because in a footnote to the paper they say "The mean is conditional on schooling, age and gender of the household head, and access to piped water and electricity." I won't be coming back here soon.
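For the variables in this thread, the same residual-based steps might look like the sketch below. It uses the sample data from #1 and reuses the do file's variable name detrend; the graph title is made up for illustration. With only three units and time-invariant controls, the regression simply fits each unit's mean outcome, so this shows the mechanics only and says nothing about the real data.

```stata
* Sample data from post #1
clear
input id year outcome control1 control2 treated
1 2015 30 14.47 5.94 0
1 2016 45 14.47 5.94 0
1 2017 20 14.47 5.94 0
1 2018 15 14.47 5.94 0
1 2019 40 14.47 5.94 0
1 2020 65 14.47 5.94 0
2 2015 40 42.89 3.64 1
2 2016 55 42.89 3.64 1
2 2017 30 42.89 3.64 1
2 2018 25 42.89 3.64 1
2 2019 50 42.89 3.64 1
2 2020 100 42.89 3.64 1
3 2015 20 48.52 7.93 0
3 2016 35 48.52 7.93 0
3 2017 10 48.52 7.93 0
3 2018 5 48.52 7.93 0
3 2019 30 48.52 7.93 0
3 2020 55 48.52 7.93 0
end

* Step 1: remove the variation explained by the controls
regress outcome control1 control2
predict double detrend, resid

* Step 2: average the residuals by group and year
collapse (mean) detrend, by(treated year)

* Step 3: plot the two group series on one graph, as in post #1
xtset treated year
xtline detrend, overlay title("Mean residual outcome by group")
```

The last two commands mirror the collapse and xtline steps from the first post, with the residual in place of the raw outcome.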

Last edited by Eric de Souza; 21 May 2021, 08:12.
amira elshal

Join Date: Dec 2015

Posts: 75
#14

21 May 2021, 08:36

Eric de Souza Thanks for your help, much appreciated. I have downloaded the data and do files and will go thoroughly through them.
Andrew Musau

Join Date: Oct 2014

Posts: 10088
#15

21 May 2021, 08:40

Thank you, Eric de Souza, for the additional information. So what the authors are doing here is detrending the variable "y_rel" and referring to the result as the conditional mean of real income. You are correct, amira elshal, if you are following exactly what the authors do.