Fixed effects in Difference-in-Difference Non-Panel Data

Kim Pijl

#1

12 Jun 2019, 08:26

Dear all,

I am investigating the effect of accessibility to a newly opened metro station on housing values. I am using the "natural experiment" approach, i.e. the difference-in-difference estimator. I do not have panel data, but cross-sectional data on the treatment and non-treatment groups (defined by access to the station location or not) before and after the opening date of the metro station. If I am correct, this is no problem and a difference-in-difference estimator can be applied to such data.

Because I need to control for housing market dynamics, I want to include at least year fixed effects (I might refine this to quarters). I am fairly new to Stata and was wondering whether I am correct that -xtset-, -xtreg- and the fe option are exclusively for panel data, and that the only way for me to include time fixed effects is through dummies (i.year)?

Thank you in advance!

Kim

*For comparable research see: Diao, M., Leonard, D., & Sing, T. F. (2017). Spatial-difference-in-differences models for impact of new mass rapid transit line on private housing values. Regional Science and Urban Economics, 67, 64–77. https://doi.org/10.1016/j.regsciurbeco.2017.08.006

Last edited by Kim Pijl; 12 Jun 2019, 08:35. Reason: fixed effects
Carlo Lazzaro

#2

12 Jun 2019, 08:56

Kim:
welcome to this forum.
As you surmised, you can investigate time effects (in both -regress- and -xtreg-) via:

Code:

i.year

See -help fvvarlist- for further details.

Kind regards,
Carlo
(Stata 19.0)
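
To make the point concrete, here is a minimal sketch on simulated data, not Kim's actual data. All variable names (price, treat, post, did), the opening year 2002 and the numbers are assumptions chosen for illustration. It shows that year indicators can go straight into -regress- on pooled cross-sections without any -xtset-. The Post main effect is left out here because the year indicators already absorb it.

```stata
* Simulated pooled cross-sections; every name and number here is hypothetical
clear
set seed 2019
set obs 800
* Eight survey years (1999-2006), 100 observations each
generate int year = 1999 + mod(_n, 8)
* Alternate treatment status across blocks of eight observations
generate byte treat = mod(floor((_n - 1)/8), 2)
* Assumed opening year: 2002
generate byte post = year >= 2002
generate byte did = treat*post
generate double price = 200 + 15*treat + 3*(year - 1999) + 10*did + rnormal(0, 5)

* No -xtset- needed: year indicators enter -regress- as factor variables
regress price treat did i.year, vce(robust)
```

The coefficient on did is the difference-in-difference estimate; the i.year terms are the year fixed effects, with the first year as the base category.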
Kim Pijl

#3

12 Jun 2019, 10:20

Thank you for the quick response Carlo! Another question I have related to this, and I hope I am allowed to ask here, is on multicollinearity. The "original" difference-in-difference estimator looks as follows:

P_i = a + b₁*Treatment + b₂*Post + b₃*(Post*Treatment) + e (1)

However, when controlling for time fixed effects:

P_i = a + b₁*Treatment + b₂*Post + b₃*(Post*Treatment) + time fixed effects + e (2)

In formula (2), collinearity exists between the time fixed effects and the Post variable. (My interpretation is that when an observation falls in, e.g., Quarter _ of the year 20__, it is automatically known whether this date is before or after the opening of the metro station, which is exactly the Post variable.)

In previous threads I've read that Stata in most cases automatically drops the b₂*Post part of the equation. However, the article that I cited in post #1 reports estimates of both b₂ and the time fixed effects. I have read somewhere that in this case Stata drops an additional year dummy. I find it difficult to see the logic behind dropping multiple year dummies while allowing b₂ to be estimated. Does someone have a very elementary example so I can visualize this?

Many thanks in advance again!
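
One way to visualize this is to write the dependency out. The notation below is an assumption added for illustration and is not in the original post: D_{it} is the dummy for year (or quarter) t, \gamma_t is its coefficient relative to a pre-period base year, \mathcal{P} is the set of post-opening periods, and s is one period in \mathcal{P}. Post is simply the sum of the post-period dummies:

\[
\text{Post}_i = \sum_{t \in \mathcal{P}} D_{it}
\]

This is one exact linear dependency, so exactly one of these columns has to go: either Post, or one post-period dummy D_{is}. If D_{is} is dropped and Post is kept, the time effects are only rewritten, not changed:

\[
\sum_{t \in \mathcal{P}} \gamma_t D_{it}
= \gamma_s \,\text{Post}_i + \sum_{t \in \mathcal{P},\, t \neq s} (\gamma_t - \gamma_s)\, D_{it}
\]

Under this sketch, b₂ equals \gamma_s, the effect of the single dropped period s relative to the base period, and the remaining post-period dummies are measured relative to period s. The coefficients b₁ and b₃ are untouched by the rewrite.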
Carlo Lazzaro

#4

12 Jun 2019, 10:31

Kim:
unfortunately the article you mentioned is behind a paywall.
Probably, the best approach is to report what you typed and what Stata gave you back, and/or share an excerpt/example of your data via -dataex-. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Kim Pijl

#5

12 Jun 2019, 10:46

Dear Carlo,

Unfortunately I am still waiting on data approval from the real estate agencies, so I am currently trying to get acquainted with the code. However, I did find a similar post on here: https://www.statalist.org/forums/for...ferences-model

Of particular interest is #4, where the author notes that of the 4 periods, Stata dropped 2 and kept the Post estimate. My question remains: how is multicollinearity resolved in this case? I don't directly see it.

Kind regards,
Kim
Carlo Lazzaro

#6

12 Jun 2019, 10:54

Kim:
see Clyde Schechter's reply #11 in the same thread you quoted.

Kind regards,
Carlo
(Stata 19.0)
Kim Pijl

#7

12 Jun 2019, 11:21

Carlo, unfortunately reply #11 goes over my head. I created a little data set myself to run a few regressions.
The data set looks as follows (see the attached files):

I arbitrarily decided to let 2002 be the year in which the treatment is received.
I then ran two regressions:

One in which I include the Post variable. Indeed, here I can see that an additional year is omitted due to multicollinearity (1999 and 2010 are both omitted).

In the second regression I manually excluded the Post variable; here, as is usually the case with dummies, one year (1999) is omitted.

However, when I think about it: when the Post variable is included and I have an observation from the year 2007, for example, I immediately know that Post = 1. This still is multicollinearity, right? Am I getting something wrong in the definition of multicollinearity, or overlooking something?

Thank you again!
Carlo Lazzaro

#8

12 Jun 2019, 11:44

Kim:
as -post-, -Treat- and -DID- are perfectly correlated for the years 2003, 2007, 2009 and 2010, Stata decided to omit -2010-.

Kind regards,
Carlo
(Stata 19.0)
Kim Pijl

#9

12 Jun 2019, 12:06

Ah, I think I finally see it. I hope I can bother you with one more question. I can see that the estimated coefficients are the same in both regressions. For the variable of interest (DiD), the standard errors, t-statistics and p-values are the same (for Treat as well). However, I see a difference in these statistics for some time fixed effects. Is this something that needs to be considered when one chooses which model to estimate/report?

THANK YOU!
Carlo Lazzaro

#10

13 Jun 2019, 00:04

Kim:
in your first regression model, the -2003- coefficient is very small and has a p-value of 1. It would seem that -2003- is affected by quasi-extreme multicollinearity (see -estat vif- and -estat vce, corr-).

Kind regards,
Carlo
(Stata 19.0)
Kim Pijl

#11

13 Jun 2019, 16:11

Thank you for all the help Carlo, it is much appreciated. In another thread I found Clyde Schechter's reply:

"If you list the i.year before i.treatment##i.post, Stata will drop post. You will still have treatment#post, which is your most important variable, but you will no longer be able to estimate what happened in the non-treatment country pairs following the onset of the treatment. If, however, you list i.treatment##i.post before i.year, you will lose a second year indicator to resolve the colinearity. Assuming that year is just a nuisance variable whose effects you want to adjust for, this is a better approach."

and

"The "main effect" of POST is the expected difference in Y between pre- and post-treatment epochs among the firms in the TREAT = 0 group."

My question: when interpreting the POST coefficient in combination with i.year, can the POST coefficient still be seen as the change in housing prices from the pre to the post period for the treatment = 0 group? Or should we give a different meaning to the coefficient? When I do my calculations by hand (average of pre and post prices for the non-treatment group, then the difference) and run the regression as -regress price post treatment DiD-, I get a Post coefficient of -1000. When year dummies are included, it seems that the Post variable takes the place of the additionally omitted dummy variable, and thus I can no longer interpret it as before?

Thank you!
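
The by-hand check described above can be sketched as follows. The data are simulated and every name and number (including the -1000 built into the data-generating line) is hypothetical, chosen only to mirror the variable names in the post. Without year dummies, the three-regressor model is saturated over the four Treatment-by-Post cells, so the coefficient on post should reproduce the difference in mean price, post minus pre, within the untreated group.

```stata
* Simulated data; all variable names and numbers are hypothetical
clear
set seed 42
set obs 400
* Four equally sized cells: treatment (0/1) by post (0/1)
generate byte treatment = mod(_n, 2)
generate byte post = mod(floor((_n - 1)/2), 2)
generate byte DiD = treatment*post
generate double price = 5000 + 300*treatment - 1000*post + 500*DiD + rnormal(0, 200)

* By hand: mean price after minus mean price before, untreated group only
summarize price if treatment == 0 & post == 0, meanonly
scalar mean_pre = r(mean)
summarize price if treatment == 0 & post == 1, meanonly
scalar mean_post = r(mean)
display "by hand: " mean_post - mean_pre

* Basic DID regression without year dummies
regress price post treatment DiD
display "regress: " _b[post]
```

The two displayed numbers should agree up to rounding. Once i.year is added, this equality no longer has to hold, which is the interpretation problem raised in the question.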

Last edited by Kim Pijl; 13 Jun 2019, 16:33.
Clyde Schechter

#12

13 Jun 2019, 17:22

I think the important point to understand is that if you include both i.POST and year indicators in the model, there will be colinearity. The laws of linear algebra cannot be defied: the colinearity must be broken in some way. For the purposes of the DID estimator itself, it does not matter which way you break the colinearity: you will get the same coefficient for that interaction term. But the estimates for the "main effect" of POST and the year variables will change depending on how the colinearity is broken. Otherwise put, no matter which way you go about it, none of those POST and year coefficients has any simple interpretation, and none of them, in any case, estimate what they "appear" to estimate. At most they estimate what they appear to estimate only subject to some fairly strange conditions. For nearly any purpose this makes them meaningless and useless.

I think you can waste an endless amount of time trying to find the "right" way to break the colinearity. But no matter how you do it, there will always be some unsatisfactory aspect of it. It is a simple matter of fact that it is mathematically impossible to simultaneously estimate the "main effect" of POST and year effects in the same model, and you will just make yourself crazy if you keep trying. So spend your time thinking about the real issue: do I omit POST or do I omit i.year. Make a choice based on what is important to your research goals, and then stick with it and don't look back.
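
The claim that the DID coefficient does not depend on how the colinearity is broken can be checked with a small simulation. This is only a sketch under assumptions: the data are simulated, the variable names, the opening year 2002 and all numbers are hypothetical, and exactly which term Stata flags as omitted in each run may depend on variable order and version.

```stata
* Simulated pooled cross-sections; every name and number here is hypothetical
clear
set seed 2019
set obs 800
generate int year = 1999 + mod(_n, 8)
generate byte treat = mod(floor((_n - 1)/8), 2)
generate byte post = year >= 2002
generate double price = 200 + 15*treat + 3*(year - 1999) + 10*treat*post + rnormal(0, 5)

* Year indicators listed last
regress price i.treat##i.post i.year
scalar did_a = _b[1.treat#1.post]

* Year indicators listed first
regress price i.year i.treat##i.post
scalar did_b = _b[1.treat#1.post]

* The interaction coefficient should be the same in both runs
display "DID, years last:  " did_a
display "DID, years first: " did_b
```

Comparing the two output tables side by side, the post and year rows differ between the runs while the treat and interaction rows do not, which is the pattern described in #9.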
Kim Pijl

#13

15 Jun 2019, 04:33

Thank you for the elaborate reply Clyde, much appreciated as well. The article I linked included the POST indicator as well as quarter fixed effects. In the null model, only Post, Treat and DiD enter the regression. The Post coefficient turned out to be negative, and they then argued that “The negative “Post” coefficient indicates that there was a general declining trend in housing prices after the opening of the metro”. I believe this is the correct interpretation. However, after including quarter fixed effects as well, the Post coefficient remains negative and they state that "The “Post” coefficient indicates the same declining trend in non-landed housing prices in the study area after the opening of CCL."

In the second case, is this a “false” statement then? In the specifics of my example above, equivalent to the article, when I run a null model the Post coefficient is negative, and I would conclude that prices after treatment implementation are falling on average (in the Treatment = 0 group, but since we expect a parallel trend assumption I think the statement is sufficient without this addition). When I correct for time fixed effects, equivalent to the article, I would state that there is a general increasing trend. I feel like making such comments after the model that includes i.year is quite wrong?

Lastly, the collinearity issue is clear now and I understand how an additionally dropped time indicator can solve it! Thank you.
Clyde Schechter

#14

17 Jun 2019, 12:41

"since we expect a parallel trend assumption I think the statement is sufficient without this addition"

Well, you expect a parallel trend, but you need to see if you actually have it!

"When I correct for time fixed effects, equivalent to the article, I would state that there is a general increasing trend. I feel like making such comments after the model that includes i.year is quite wrong?"

It is quite wrong. Since the year and post variables are colinear, none of them alone has any meaning when they are in the model together. Each can only be interpreted in the context of all the others. Once the year indicators are in the model, the post variable no longer reflects the trend, even in just the untreated group--that trend is diffused over post and all the year indicators in a way that is pretty much impossible to identify. So in this model, no statement about it can be made.