Hello all,
I have looked around on this forum and could not find anything like this question, so here it is:
For my thesis I am estimating a difference-in-differences (DiD) model of the effect of train stations on housing prices. I have repeated cross-sectional (transaction) data from 7 municipalities (2006-2018), of which 2 got a brand-new train station in mid-December 2012 (this is the treatment). The effect should then be visible from 2013 onward, hence salesyear>2012 in the code below.
My thesis is only on one of the municipalities (Dronten), and I want to see the effect of the station on that town. The first question is what I should do with the other treated town. Any suggestions? Right now it is in the control group, which does not really make sense (it is a treated unit sitting in the control group; it got a second station). This is a bit of a side question, though.
Then this is my current code:
generate after2012 = (salesyear>2012)
generate dummydronten = (GEM_ID == 303)
gen drontenafter2012 = after2012*dummydronten
didregress (lnprijs lnage age2 lnm2 lnperceel NKAMERS Distance i.SOORTWONING i.salesyear i.ONBI i.ONBU i.GARAGE i.MONUMENTAAL i.ZOLDER i.ZWEMBAD) (drontenafter2012), group(GEM_ID) time(salesyear)
I checked the parallel trends assumption with -estat trendplots-, and I think the result is just about acceptable, so I would consider the assumption met.
But because I was not sure whether the assumption was met, I tried to get numerical output on this with the following commands:
reg lnprijs i.dummydronten##c.salesyear
margins dummydronten, dydx(salesyear)
However, the result for the treatment group could not be estimated, so that did not really work (I could not compare the two groups). Should I then just assume that the difference-in-differences assumptions are met?
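To make the question concrete, here is a minimal sketch of a numerical parallel-trends check, assuming the -estat ptrends- postestimation command of -didregress- is the appropriate tool here. The data are simulated stand-ins, not my transaction data: the municipality codes other than 303, the 100 sales per municipality-year, and every coefficient in the data-generating line are made up for illustration.

```stata
* Simulated stand-in for the transaction data (all numbers are hypothetical)
clear
set seed 2013
set obs 7
generate GEM_ID = 300 + _n                      // 7 made-up codes; 303 plays the role of Dronten
expand 13
bysort GEM_ID: generate salesyear = 2005 + _n   // sales years 2006-2018
expand 100                                      // 100 made-up sales per municipality-year

* Treatment indicator: Dronten from 2013 onward
generate drontenafter2012 = (GEM_ID == 303) & (salesyear > 2012)

* Log price with a common trend, municipality shifts, and a made-up treatment effect
generate lnprijs = 12 + 0.03*(salesyear - 2006) + 0.05*(GEM_ID - 300) ///
    + 0.04*drontenafter2012 + rnormal(0, 0.25)

* DiD without covariates, then a test of linear pre-treatment trends
didregress (lnprijs) (drontenafter2012), group(GEM_ID) time(salesyear)
estat ptrends
```

In this sketch the null hypothesis of -estat ptrends- is that the pre-treatment linear trends are parallel, so a large p-value would be consistent with the assumption. With only 7 municipalities the cluster-robust inference behind the test rests on very few clusters, so I would read the result cautiously.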
Then another question, which is actually my main question, is about the other assumptions: what model underlies a difference-in-differences estimate? If it is OLS, which I assume, then I also assume the OLS assumptions must be met. How do I check them? For example, how do I detect and deal with heteroskedasticity (-hettest- for Breusch-Pagan, -estat imtest, white-, and -rvfplot, yline(0)- do not work after -didregress-)? I have the same questions for endogeneity (error term correlated with the independent variables) and multicollinearity. Do I need to check all of this for a simple difference-in-differences model?
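As far as I understand it, -didregress- fits a linear model with group and time fixed effects, so the same specification can be written with -regress- and indicator variables. The sketch below rests on that assumption and uses simulated stand-in data (made-up municipality codes apart from 303, made-up sample sizes and coefficients). Since -estat hettest-, -estat vif- and -rvfplot- are postestimation tools for -regress-, refitting the model this way may be one route to the usual diagnostics.

```stata
* Simulated stand-in for the transaction data (all numbers are hypothetical)
clear
set seed 2013
set obs 7
generate GEM_ID = 300 + _n                      // 303 plays the role of Dronten
expand 13
bysort GEM_ID: generate salesyear = 2005 + _n   // sales years 2006-2018
expand 100                                      // 100 made-up sales per municipality-year
generate drontenafter2012 = (GEM_ID == 303) & (salesyear > 2012)
generate lnprijs = 12 + 0.03*(salesyear - 2006) + 0.05*(GEM_ID - 300) ///
    + 0.04*drontenafter2012 + rnormal(0, 0.25)

* Two-way fixed-effects OLS with standard errors clustered by municipality;
* the coefficient on 1.drontenafter2012 is the DiD estimate
regress lnprijs i.drontenafter2012 i.GEM_ID i.salesyear, vce(cluster GEM_ID)

* Same specification with default standard errors, so the diagnostics can run
regress lnprijs i.drontenafter2012 i.GEM_ID i.salesyear
estat hettest          // Breusch-Pagan test for heteroskedasticity
estat vif              // variance inflation factors
rvfplot, yline(0)      // residuals against fitted values
```

Because the clustered standard errors in the first regression already allow for heteroskedasticity and within-municipality correlation, the diagnostics after the second regression are descriptive rather than something the inference depends on.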
Quite a few questions for now, but I hope someone can help me here.
Best,
Jesse