Difference in Differences model

joseph dover

Join Date: Jan 2016

Posts: 23
#1

19 Feb 2016, 11:40

Hi Everyone!

I am trying to use a difference-in-differences model, but I am struggling at the moment and would really appreciate your help!

I have read around and I think I have understood how to do a difference-in-differences regression analysis in Stata by looking at your posts on the Statalist forum.

I am looking at the effect of the euro on trade in Europe. I have a control group and a treatment group ("pairEURO" equals 1 if the pair adopted the euro in 1999).

Code:

gen TREATMENT = (pairEURO == 1)
gen POST = (year > 1999)
gen INTERACTION = TREATMENT*POST
xtset CountryPair year
xtreg lnTradeFlow CommonLanguage lnPopulationSizej lnPopulationSizei lnGDPj lnGDPi lnDistance ///
    TREATMENT POST INTERACTION i.year, fe robust

1) My TREATMENT variable gets omitted as it is constant over time, but my POST variable does not. Does this mean I made a mistake and did not code my data correctly?
I believe that POST does not get omitted because it is not equal to 1 until the "Policy" is implemented, hence Stata did not omit it. Am I right?

2) It is better to use fixed effects than OLS because it accounts for effects such as "Distance" and "CommonLanguage", which are constant over time. Am I correct? (The treatment dummy will also get omitted as it is fixed over time.)

3) Is this the correct way of proceeding? I am interested in the coefficient of the "INTERACTION" variable, to which I then apply e^(Coefficient - 1) * 100 to get the percentage effect?

I am using Stata 14.

I hope I am clear! Thank you very much in advance!

Best Regards,

Joseph
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

19 Feb 2016, 12:26

1) My TREATMENT variable gets omitted as it is constant over time, but my POST variable does not. Does this mean I made a mistake and did not code my data correctly?
I believe that POST does not get omitted because it is not equal to 1 until the "Policy" is implemented, hence Stata did not omit it. Am I right?

It's not a mistake. In a fixed-effects regression, any variable that is constant within panel (CountryPair) will be colinear with the fixed effect and will be dropped automatically. It's not a quirk of Stata either. Any within-panel effects estimator necessarily does this: it is in principle impossible to estimate effects that are constant within panel. This is one of the uncommon situations in which it is sensible to have an interaction term but omit one of its component main effects.

Remember, too, that if you were able to retain the TREATMENT variable (which you could do in a random effects model or with OLS), its interpretation in the context of an interaction model is not what it seems. It would not be an effect of the treatment at all. It would be an estimate of the difference in the expected value of lnTradeFlow between the treated and untreated groups before treatment began. The effects of the treatment are all embodied in the contributions of the POST and INTERACTION variables.
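
To make that interpretation concrete, here is a minimal sketch on simulated data. Every variable name and effect size in it is made up for illustration and has nothing to do with the trade data. It uses plain OLS so that the treatment main effect is retained, and compares each coefficient with the matching difference in cell means.

```stata
// Illustrative only: simulated data, all names and values are assumptions
clear
set seed 2016
set obs 400
gen byte treatment = (_n <= 200)      // first 200 observations are "treated"
gen byte post = mod(_n, 2)            // alternate pre/post observations
gen y = 1 + 0.2*treatment + 0.3*post + 0.5*treatment*post + rnormal(0, 0.1)

// Plain OLS keeps the treatment main effect in the model
regress y i.treatment##i.post

// Cell means for comparison
summarize y if treatment == 0 & post == 0, meanonly
scalar m00 = r(mean)
summarize y if treatment == 1 & post == 0, meanonly
scalar m10 = r(mean)
summarize y if treatment == 0 & post == 1, meanonly
scalar m01 = r(mean)
summarize y if treatment == 1 & post == 1, meanonly
scalar m11 = r(mean)

scalar gap_pre   = m10 - m00                   // treated vs control BEFORE treatment
scalar chg_ctrl  = m01 - m00                   // pre-to-post change in the control group
scalar diff_diff = (m11 - m10) - (m01 - m00)   // difference in differences

display "1.treatment:        " _b[1.treatment]        "   pre-period gap:     " gap_pre
display "1.post:             " _b[1.post]             "   control change:     " chg_ctrl
display "1.treatment#1.post: " _b[1.treatment#1.post] "   diff-in-diff:       " diff_diff
```

In this saturated model each coefficient should match the corresponding cell-mean contrast, which is why the treatment main effect is only the pre-period gap between the groups.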

It is better to use fixed effects than OLS because it accounts for effects such as "Distance" and "CommonLanguage", which are constant over time. Am I correct? (The treatment dummy will also get omitted as it is fixed over time.)

If it is important to you to obtain estimates of the effects of Distance and CommonLanguage, then you have to abandon the fixed-effects estimator. But if you are not interested in estimating those effects and only want to adjust for them, the fixed-effects estimator is an excellent way to do that, as it also adjusts for unobserved (and even un-thought-of) time-invariant differences among the panels. If you do want to estimate those effects, the simplest modification, I think, would be to use the between-effects estimator (-be- option to -xtreg-).

The use of OLS on panel data can be problematic. However, if you look at the very last line of your -xtreg- output, you will see an F-test of the hypothesis that all u_i = 0. If you do not reject that hypothesis (and here I mean not just p > 0.05, but p comfortably greater than 0.05 or a very large N of panel vars) then you can safely use OLS. Another way to retain time-invariant variables is to use the random-effects model. In finance and economics it is generally considered de rigueur, if not absolutely mandatory, to do a Hausman test to choose between the fixed-effects and random-effects models.

3) Is this the correct way of proceeding? I am interested in the coefficient of the "INTERACTION" variable, to which I then apply e^(Coefficient - 1) * 100 to get the percentage effect?

This approach looks generally reasonable. I can't comment about the content of your model as I have no expertise in finance or economics, but the overall structure as an interaction model to estimate differences in differences is right. As for what you are interested in, the usual focus of interest in these is the coefficient of the interaction term. That estimates the expected difference between the change (post vs pre) in outcome in the treatment group and the change in outcome in the control group. That is the actual difference that the treatment made, over and above whatever difference between the groups may have existed prior to treatment. Since your outcome variable is the log transform of TradeFlow, if you are interested in the percentage change in TradeFlow associated with the treatment effect, the correct formula is 100*(exp(coefficient)-1). That is, first exponentiate the coefficient and then subtract one, not the reverse.
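
As a quick check of the order of operations, here is a small sketch with a hypothetical coefficient of 0.10 (a made-up value, not an estimate from this thread). The scalar names are also just for illustration.

```stata
// Hypothetical coefficient on the interaction term (an assumption for illustration)
scalar b = 0.10

// Correct: exponentiate first, then subtract one
scalar pct_right = 100*(exp(b) - 1)

// Incorrect: subtracting one inside the exponent
scalar pct_wrong = 100*exp(b - 1)

display "100*(exp(b) - 1) = " pct_right
display "100*exp(b - 1)   = " pct_wrong

// After an actual estimation, something like the following would give the
// percentage effect with a delta-method standard error:
//     nlcom 100*(exp(_b[INTERACTION]) - 1)
```

With b = 0.10 the correct formula gives about 10.5 percent, while the reversed order gives about 40.7, so the two are not interchangeable even for small coefficients.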
joseph dover

Join Date: Jan 2016

Posts: 23
#3

25 Feb 2016, 12:05

Thanks a lot for your answer, Clyde!

I also wanted to ask you whether it is a good idea to add year effects (i.year). Will the interaction variable then only capture the year after the introduction of the euro, since it is a dummy variable (treatment*post)?

I am only interested in the coefficient of the interaction term, hence I believe FE is the right regression. I add year effects to take out unobserved heterogeneity such as the financial crisis, but I am now worried that my interaction term will only capture the year after.

Thanks a lot!

Best Regards,

Joseph
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#4

25 Feb 2016, 12:38

You can add i.year to your model if there is good reason to do so. But be careful how you code it: POST will be colinear with the full set of i.year indicators, so it cannot coexist with all of them in the model. If you list i.year before i.treatment##i.post, Stata will drop post. You will still have treatment#post, which is your most important variable, but you will no longer be able to estimate what happened in the non-treatment country pairs following the onset of the treatment. If, however, you list i.treatment##i.post before i.year, you will lose a second year indicator to resolve the colinearity. Assuming that year is just a nuisance variable whose effects you want to adjust for, this is a better approach. Try running this code to see this for yourself:

Code:

clear*

// CREATE SOME ARTIFICIAL DATA
set obs 10
set seed 1234
gen country_pair = _n
gen u = rnormal(0, 0.25)
expand 10
by country_pair, sort: gen year = 1994+_n
gen byte treatment = (country_pair <= 5)
gen byte post = (year > 1999)
gen xb = 0.2*treatment + 0.2*post + treatment*post
gen outcome = xb + u + rnormal(0, 0.25)
xtset country_pair year

// DIFFERENCE IN DIFFERENCES WITHOUT YEAR EFFECTS
xtreg outcome i.treatment##i.post, fe

// WITH YEAR EFFECTS ADDED AT END
xtreg outcome i.treatment##i.post i.year, fe

// WITH YEAR EFFECTS ADDED AT BEGINNING
xtreg outcome i.year i.treatment##i.post, fe

That said, if you are mainly concerned about adjusting for the effects of the financial crisis, rather than using a set of year indicators, why not just use a dichotomous variable indicating the years during the financial crisis? Or perhaps better still some economic variables that quantify the effects of the financial crisis continuously (perhaps overall GDP growth rates in the countries in each pair). That strikes me as a better way of getting at that (though I am no economist--so consult your colleagues about this.)

The use of i.year indicators is a broader adjustment that is appropriate if there are large "shocks" to the outcome from year to year that must be adjusted for. But if you can pinpoint more specific influences such as the financial crisis, that seems to yield a more explanatory model. A model with indicators for specific years is inherently incapable of generalizing to other time periods, whereas a model with adjustments for a financial crisis might be generalizable to future periods that also contain a financial crisis.
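
A minimal sketch of the single-indicator alternative, on simulated data. The crisis window (2008-2009), the variable name crisis, and all effect sizes are assumptions made purely for illustration.

```stata
// Illustrative only: simulated data; crisis years and effect sizes are assumptions
clear
set seed 1234
set obs 10
gen country_pair = _n
gen u = rnormal(0, 0.25)
expand 16
by country_pair, sort: gen year = 1994 + _n       // 1995 through 2010
gen byte treatment = (country_pair <= 5)
gen byte post = (year > 1999)
gen byte crisis = inrange(year, 2008, 2009)        // hypothetical crisis window
gen outcome = 0.2*post + treatment*post - 0.5*crisis + u + rnormal(0, 0.25)
xtset country_pair year

// One crisis indicator in place of the full set of i.year indicators.
// post stays in the model because it is not colinear with crisis.
xtreg outcome i.treatment##i.post crisis, fe
```

Because crisis is not colinear with post, both the post coefficient and the interaction remain estimable here, which is not the case once the full set of year indicators is listed first.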
joseph dover

Join Date: Jan 2016

Posts: 23
#5

01 Mar 2016, 07:40

Hi again Clyde!

I am a bit confused: when I include my time-invariant variables in my FE regression, I get a different result for the coefficient I am interested in than if I take them out. Is this normal? If so, which one should I use in this case?

Code:

xtset CountryPair year
xtreg lnTradeFlow i.year CommonLanguage lnPopulationSizej lnPopulationSizei lnGDPj lnGDPi lnDistance ///
    TREATMENT POST INTERACTION, fe robust
xtreg lnTradeFlow i.year lnPopulationSizej lnPopulationSizei lnGDPj lnGDPi INTERACTION, fe robust

The coefficient of the interaction variable differs between these 2 regressions, although I only removed the "CommonLanguage", "lnDistance", "TREATMENT" and "POST" variables, which would have been omitted anyway...

Also, when I keep these variables, I realized that some country-pairs get omitted, so I have more observations in the FE regression without these variables... I am not really sure what to do in this case.

Is it correct to include these omitted variables?

Thank you so much again for your help!!

Best Regards,

Joseph

Last edited by joseph dover; 01 Mar 2016, 08:30.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#6

01 Mar 2016, 09:41

and when I keep these variables, I realized that some country-pairs get omitted

This presumably happens because some observations have missing values for those variables, so including them causes those observations to be excluded from the estimation sample. So, of course, when you change the estimation sample, the results change.
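
A minimal sketch of that mechanism on simulated data. The choice to make lnDistance missing for two country pairs, and every value below, are assumptions for illustration only.

```stata
// Illustrative only: simulated data; the pattern of missing values is an assumption
clear
set seed 1234
set obs 10
gen country_pair = _n
gen lnDistance = rnormal() if country_pair <= 8    // missing for pairs 9 and 10
expand 10
by country_pair, sort: gen year = 1994 + _n
gen byte did = (country_pair <= 5) * (year > 1999)
gen outcome = did + rnormal(0, 0.25)
xtset country_pair year

// Without the time-invariant variable: all observations are used
xtreg outcome did i.year, fe
scalar n_without = e(N)

// With it: lnDistance itself is omitted (constant within pair), but the
// observations where it is missing are still excluded from the sample
xtreg outcome did lnDistance i.year, fe
scalar n_with = e(N)

display "N without lnDistance: " n_without "    N with lnDistance: " n_with
count if missing(lnDistance)
```

Comparing e(N) across the two runs, or counting missing values directly, is a quick way to confirm whether a shrinking estimation sample is what changes the coefficient.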

I don't understand why you want to include the variables CommonLanguage and lnDistance in the model: they are constant within country pair and will automatically be dropped due to colinearity with the country-pair fixed effect. So at best, they will add nothing to your analysis. In this case, due to missing values, they are also eroding your estimation sample. Unless there is something about country pairs for which missingness of these variables signals that they are actually "not in universe" for your research question, omitting these observations is most unhelpful: at the least it decreases statistical power and it may very well introduce bias as well. So I think I would not include those variables.
Nathan E. Fosse

Join Date: Jul 2014

Posts: 66
#7

01 Mar 2016, 10:54

On those country-pair constant variables: I echo Clyde on the problem of including variables that are constant within country pair: in a fixed-effects model they will be dropped.

You should see country-pair constant variables dropped in your Stata output from an FE panel model; otherwise you've got problems with your data. I've altered some of Clyde's code to add two variables that are country-pair constants, just so you can see the behavior and compare it with your own analyses. (It's possible that your variables are not being dropped because they are not constant within country pair, if there are slight changes from year to year; nevertheless, if your FE regression does estimate coefficients for supposed country-pair constants, then something is amiss with your constants.)

You'll see the output with the constants dropped if you run Clyde's syntax with two constants added.

On panel models that permit your country-pair variables: There is an option if you want to model those constants: multilevel/mixed models. In the artificial example below, I added a regression that permits you to model those country-pair constants at the same time. The question is primarily theoretical: are the years a nuisance to be done away with, or are they part of the data structure you are interested in modeling?

A great resource: Finally, I'd strongly recommend Chapter 8 of Cameron and Trivedi, which covers these models (and others) and provides example Stata syntax for them. Start at p. 255 and first try the population-averaged (PA) model with unstructured error (xtreg, pa), then go from there to other models. Run a Hausman test to help select FE vs. RE (e.g., p. 267: hausman FE_model RE_model, sigmamore). There's a lot of material, but it's worth reading.
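
For readers without the book at hand, here is a minimal sketch of that Hausman workflow on simulated data. The data-generating process, the variable names id, x and y, and the stored-estimate names are all assumptions for illustration; the regressor is deliberately built to be correlated with the panel effect.

```stata
// Illustrative only: simulated panel in which x is correlated with the panel effect
clear
set seed 1234
set obs 50
gen id = _n
gen u = rnormal()                      // panel-level effect
expand 10
by id, sort: gen year = 1994 + _n
gen x = rnormal() + 0.5*u              // regressor correlated with u
gen y = 1 + 0.5*x + u + rnormal()
xtset id year

// Fit both models with default (non-robust) standard errors and store them
xtreg y x, fe
estimates store FE_model
xtreg y x, re
estimates store RE_model

// Compare the coefficients the two models have in common
hausman FE_model RE_model, sigmamore
```

A small p-value from the test is usually read as evidence against the random-effects assumption and in favor of FE. Note that both models are fit here without robust or clustered standard errors before being passed to -hausman-.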

Clyde's example, with (1) dropped constants (cons1 cons2) and (2) included constants in a mixed model

Code:

clear*

// CREATE SOME ARTIFICIAL DATA
set obs 10
set seed 1234
gen country_pair = _n
gen u = rnormal(0, 0.25)

* NB: add your country-level constants
by country_pair, sort: gen cons1 = runiformint(1,5)
by country_pair, sort: gen cons2 = runiformint(1,5)

expand 10
by country_pair, sort: gen year = 1994+_n
gen byte treatment = (country_pair <= 5)
gen byte post = (year > 1999)
gen xb = 0.2*treatment + 0.2*post + treatment*post
gen outcome = xb + u + rnormal(0, 0.25)
xtset country_pair year

// WITH YEAR EFFECTS ADDED AT BEGINNING
// NB: SEE DROPPED CONSTANTS, CONS1 AND CONS2
xtreg outcome i.year i.treatment##i.post cons1 cons2, fe

// MIXED MODEL TO PERMIT INCLUSION OF CONSTANTS
xtmixed outcome i.treatment##i.post cons1 cons2 || year:, reml

Nathan E. Fosse, PhD

Vats Prith

Join Date: Jul 2017
Posts: 30

31 Jul 2017, 09:44

Code:

 
           |                                                total_asset_cat
      year |         0          1          2          3          4          5          6          7          8          9 |     Total
-----------+--------------------------------------------------------------------------------------------------------------+----------
      2010 |        70         41          9         31          2          4          0          0          1          2 |       160 
      2011 |       108         32         13          2          5          5          3          2          2          2 |       174 
      2012 |       100         26         24          8          5         12          5         13          1          1 |       195 
      2013 |       120         47         26         10          3         14          0          6          5          2 |       233 
      2014 |       122         48         22         10         22         14          2          5          8          4 |       257 
      2015 |        96         53         11         17          5          5          1         12          6          2 |       208 
-----------+--------------------------------------------------------------------------------------------------------------+----------
     Total |       616        247        105         78         42         54         11         38         23         13 |     1,227


Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#9

31 Jul 2017, 10:49

A difference-in-differences model without year dummies is biased if you have more than 2 periods. See Wooldridge's NBER Summer Institute lecture. The most robust way to estimate a diff-in-diff is a two-way fixed-effects model with a binary policy indicator (treatment*post). There is no reason to interpret any coefficient in a diff-in-diff model but that one, because it is the only one that is identified given the assumptions of the model.
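
A minimal sketch of such a two-way fixed-effects specification, reusing the artificial data design from earlier in the thread. The variable name did and the choice of clustered standard errors are assumptions for illustration.

```stata
// Illustrative only: artificial data in the style of the example earlier in the thread
clear
set seed 1234
set obs 10
gen country_pair = _n
gen u = rnormal(0, 0.25)
expand 10
by country_pair, sort: gen year = 1994 + _n
gen byte treatment = (country_pair <= 5)
gen byte post = (year > 1999)
gen byte did = treatment*post          // binary policy indicator
gen outcome = 0.2*treatment + 0.2*post + did + u + rnormal(0, 0.25)
xtset country_pair year

// Two-way fixed effects: country-pair effects via -fe-, year effects via i.year.
// Only the coefficient on did is of interest.
// Clustering on the panel is an assumption of this sketch; with only 10
// clusters the clustered standard errors are not reliable.
xtreg outcome did i.year, fe vce(cluster country_pair)
display "DiD estimate: " _b[did]
```

The treatment and post main effects do not appear at all: the first is absorbed by the country-pair fixed effects and the second by the year indicators.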
