Dif in Dif method and assumptions.

Jesse Luimes

Join Date: Apr 2024

Posts: 4
#1

14 May 2024, 05:04

Hello all,
I have looked around on this forum a bit and could not find anything like this question, so here it is:

For my thesis I am estimating a dif in dif model of the effect of train stations on housing prices. I have repeated cross-sectional (transaction) data from 7 municipalities (2006-2018), of which 2 got a brand new train station in mid-December 2012 (this is the treatment, of course). The effect should then be visible from 2013 onward, hence salesyear>2012 in the code below.
My thesis is only about one of the municipalities (Dronten), and I want to see the effect of the station on that town. The first question is what I should do with the other treated town. Any suggestions? Right now it is in the control group, which does not really make sense (it puts a treated unit in the control group, since it got a second station). But this is a bit of a side question.

Then this is my current code:
generate after2012 = (salesyear>2012)
generate dummydronten = (GEM_ID == 303)

gen drontenafter2012 = after2012*dummydronten

didregress (lnprijs lnage age2 lnm2 lnperceel NKAMERS Distance i.SOORTWONING i.salesyear i.ONBI i.ONBU i.GARAGE i.MONUMENTAAL i.ZOLDER i.ZWEMBAD) (drontenafter2012), group(GEM_ID) time(salesyear)
I checked the parallel trends assumption with -estat trendplots- and I think it is just about met.

But because I was not sure whether the assumption was met, I tried to get a numerical output on this with the following command:
reg lnprijs i.dummydronten##c.salesyear
margins dummydronten, dydx(salesyear)
However, the slope for the treatment group could not be estimated, so that did not really work (I could not compare). Should I just assume that the dif in dif assumption is met?

Then another question, which is actually my main one, is about the other assumptions: what model underlies a dif in dif? If it is OLS, which I assume, then I also assume the OLS assumptions must be met. How do I check that? For example, how do I detect and deal with heteroskedasticity (-hettest- for Breusch-Pagan, -estat imtest, white-, and -rvfplot, yline(0)- do not work after didregress)? I have the same questions for endogeneity (error term correlated with the independent variables) and multicollinearity. Do I need to check these for a simple dif in dif?

Quite a few questions for now, but I hope someone can help me here.

Best,
Jesse

Last edited by Jesse Luimes; 14 May 2024, 05:07.
George Ford

Join Date: Aug 2014

Posts: 3120
#2

14 May 2024, 16:47

reg lnprijs c.dummydronten#c.salesyear salesyear dummydronten if salesyear<=2012

The first coefficient is a test of differences in trends. If it won't estimate, you've got a bigger problem.

If it's a panel, I'd estimate

areg lnprijs c.dummydronten#c.salesyear salesyear if salesyear<=2012 , absorb(panelid)

As for the standard errors, how many cross sections do you have and how many are treated? 7 municipalities and 2 treated?

If so, then I'd cluster the standard errors by municipality and use boottest (wild cluster bootstrap) for hypothesis testing. With only 7 clusters, and only 2 of them treated, the conventional standard errors are invalid (all of them).
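
A minimal sketch of that suggestion, reusing the variable names from #1. It assumes boottest is the community-contributed command installed from SSC, and it uses a plain two-way fixed-effects regression in place of the didregress call. The Webb weights, the seed, and the 9999 replications are illustrative assumptions, not something stated in this thread.

```stata
* Assumption: boottest is the community-contributed command from SSC
* ssc install boottest

* Two-way fixed-effects DiD with standard errors clustered by municipality
* (the housing covariates from #1 can be added after drontenafter2012)
regress lnprijs drontenafter2012 i.GEM_ID i.salesyear, vce(cluster GEM_ID)

* Wild cluster bootstrap test of H0: coefficient on drontenafter2012 = 0
* Webb weights are an assumption here, chosen because there are only 7 clusters
set seed 12345
boottest drontenafter2012, cluster(GEM_ID) weighttype(webb) reps(9999)
```

The bootstrap p-value and confidence interval reported by boottest would then replace the ones printed by regress when judging the treatment effect.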
Jesse Luimes

Join Date: Apr 2024

Posts: 4
#3

16 May 2024, 01:57

Dear Mr. Ford,
Thank you very much for your reply, I'll definitely apply this.

Originally posted by George Ford

reg lnprijs c.dummydronten#c.salesyear salesyear dummydronten if salesyear<=2012

The first coefficient is a test of differences in trends. If it won't estimate, you've got a bigger problem.

Jesse: Thank you, I'll definitely give it a shot. That's really helpful.

If it's a panel, I'd estimate

areg lnprijs c.dummydronten#c.salesyear salesyear if salesyear<=2012 , absorb(panelid)

As for the standard errors, how many cross sections do you have and how many are treated? 7 municipalities and 2 treated?

Jesse: I have 7 municipalities in total, of which 2 are treated. However, I am only interested in the treatment effect for one of those two municipalities, namely Dronten. The total number of observations (housing transactions) is 80,000+, spread over those 7 municipalities.

If so, then I'd cluster the standard errors and use boottest for hypothesis testing. It's too few clusters, and a few treated clusters, so the standard SE are invalid (all of them).

Jesse: Okay, that's actually interesting. I have been using vce(cluster) so far, but I guess I'll switch now, because this makes sense.

Thank you very much, explains a lot!
Best,
Jesse

Jesse Luimes

Join Date: Apr 2024

Posts: 4
#4

16 May 2024, 03:49

Dear Mr. Ford,

A quick follow-up, I found that the equal trends assumption does not hold with the model in my first question. Both estat ptrends and estat granger reject the H0. However, the line of code you suggested does estimate. This is what I got.
Linear regression

lnprijs         Coef.     St.Err.   t-value   p-value   [95% Conf   Interval]   Sig
c               -.005     .004      -1.45     .148      -.012       .002
salesyear       -.019     .001      -18.47    0         -.021       -.017       ***
dummydronten    10.555    7.26      1.45      .146      -3.674      24.784
Constant        49.734    2.032     24.48     0         45.752      53.716      ***

Mean dependent var    12.221       SD dependent var        0.315
R-squared             0.017        Number of obs           25552
F-test                143.710      Prob > F                0.000
Akaike crit. (AIC)    13127.801    Bayesian crit. (BIC)    13160.395
*** p<.01, ** p<.05, * p<.1

I just wanted to share this extra information, as I thought it might be good to know.

Best,
Jesse

George Ford

Join Date: Aug 2014

Posts: 3120
#5

16 May 2024, 07:50

#4 does not include the interaction of dummydronten and salesyear. That's the coefficient you are interested in, as it quantifies the difference in the trends. I'd also center the cross sections using fixed effects and you should restrict to the pre-treatment period.

Or is that the row labelled c? If so, its p-value of .148 means you have not rejected equal trends in this specification.

if you have x's, then include those, as you're interested in the conditional trends.
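
One way to write the conditional pre-trend test described here, as a sketch only. It assumes GEM_ID is the cross-section identifier to absorb, and it uses a subset of the covariates listed in #1; which controls belong in the model is a judgment call for the thesis.

```stata
* Pre-treatment years only (the station opened in mid-December 2012)
* absorb(GEM_ID) adds municipality fixed effects, so dummydronten itself drops out
areg lnprijs c.dummydronten#c.salesyear c.salesyear ///
    lnage age2 lnm2 lnperceel NKAMERS Distance i.SOORTWONING ///
    if salesyear <= 2012, absorb(GEM_ID) vce(cluster GEM_ID)

* The interaction coefficient is the difference in linear pre-trends
* between Dronten and the other municipalities, conditional on the x's
test c.dummydronten#c.salesyear
```

With only 7 clusters, the p-value from test is subject to the same few-clusters caveat raised in #2, so it is better read as indicative than as a formal result.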

Last edited by George Ford; 16 May 2024, 08:14.