How to use difference in difference with many yearly observations and dummy-variables?

Susanne Daae

Join Date: Mar 2015

Posts: 22
#16

11 May 2015, 06:35

Thank you. I did not fully understand your code and what it does. The reason I need to use a DID analysis is that I am trying to replicate the analysis of Giroud (2012), Proximity and investment: evidence from plant-level data, only with a different country. He used DID like this:

To examine the effects on plant-level investment and productivity, I use a difference-in-differences approach. I estimate:

Yijlt = ai + at + B x treatmentit + c Xijt + e

Where i indexes plants, j indexes firms, l indexes plant location, t indexes years, Yijlt is the dependent variable of interest (plant investment or productivity), ai and at are plant and year fixed effects, treatment is a dummy variable that equals 1 if a new airline route that reduces the travel time between plant i and its headquarters has been introduced by time t, X is a vector of control variables, and e is the error term. The main coefficient of interest is B, which measures the effect of the introduction of new airline routes.

I appreciate any help I have received and can get =)
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2081
#17

11 May 2015, 07:28

I admit now I'm a little confused. The equation you have above, without Xijt, is exactly what I said to do in the first place. The ai are the fixed effects, the at are the year effects, and treatmentit is the treatment variable. This is not a true "DID" because you cannot write the estimator as a difference in differenced means, but it is still called that. Well before DID it was a common way of using panel data for policy analysis. ai controls for differences across firms, at controls for secular changes across time, Xit controls for observed differences that might also be related to treatment assignment. Your mistake was looking at a very specific, two-period DID analysis, which is a special case of the fixed effects approach.
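A minimal sketch of this fixed effects approach with many yearly observations, run on simulated data. Everything here is an assumption made for illustration: the variable names (`id`, `year`, `y`, `treatment`, `first_treated`), the 50 plants observed over 10 years, the staggered treatment dates, and the true effect of 2.

```stata
* Simulated panel: 50 hypothetical plants observed 1998-2007
clear
set seed 12345
set obs 50
generate id = _n
generate a_i = rnormal()                       // plant fixed effect
* about half the plants get treated, starting in a random year 2001-2005
generate first_treated = cond(runiform() < 0.5, 2000 + ceil(5*runiform()), .)
expand 10
bysort id: generate year = 1997 + _n

* treatment_it = 1 once treatment has started for plant i, 0 otherwise
generate byte treatment = (year >= first_treated) & !missing(first_treated)

* outcome with plant effect, linear time trend, and a true effect of 2
generate y = a_i + 0.1*(year - 1998) + 2*treatment + rnormal()

* plant fixed effects via -fe-, year effects via i.year
xtset id year
xtreg y treatment i.year, fe vce(cluster id)
display "Estimated treatment effect: " _b[treatment]
```

The coefficient on `treatment` plays the role of B in the equation above; control variables would simply be added to the variable list.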
Susanne Daae

Join Date: Mar 2015

Posts: 22
#18

11 May 2015, 07:40

Hehe I appreciate getting the right answer right away. My problem is converting the regression I want into Stata code, and understanding what each part of the code does. What I first didn't understand was the B x treatment part, which I now understand and have run regressions on (without getting statistically significant results).

The code you gave me, does that do all that to find Yijlt? Or do I need to do different analyses to get the firm and year fixed effects? I do not fully understand the different elements of your code... =)

Andrew Musau

Join Date: Oct 2014
Posts: 9958

#19

11 May 2015, 09:30

It would have been so much easier if you had started off by showing us this equation. What you have is referred to as a two-way fixed effects model.

Code:

Yijlt = ai + at + B x treatmentit + c Xijt + e

As Jeff points out

When T = 2 and there is a pre-treatment period for all units, the simple DID is the same as fixed effects estimation with a time dummy and the so-called interaction (the treatment dummy)

I will show this with an example and the corresponding Stata code. I strongly recommend that you read about fixed effects and understand what you are doing.

Some notes: Estimating the model above amounts to including N-1 firm dummies and T-1 time dummies to estimate the firm effects (time-invariant) and the year effects (firm-invariant).

Code:

clear
input ROA id str8 route year
1.775622   1   "A TO C"   1998
3.83331    2   "A TO C"   1998
-5.210526  3   "A TO C"   1998
1.478725   4   "A TO C"   1998
-13.73461  5   "A TO C"   1998  
-8.754751  6   "A TO C"   1998
-2.822808  7   "A TO C"   1998
-.3456052  8   "A TO C"   1998
4.453937   9   "A TO C"   1998
-6.672886  10  "A TO C"   1998    
9.433187   1   "A TO B"   2000
8.438064   2   "A TO B"   2000
9.211845   3   "A TO B"   2000
8.478987   4   "A TO B"   2000
17.824034   5   "A TO B"   2000
1.470532   6   "A TO C"   2000
-8.817384  7   "A TO C"   2000
2.556952   8   "A TO C"   2000
-4.566959  9   "A TO C"   2000
3.711421   10  "A TO C"   2000
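end

* treatment dummy: equals 1 once the new route ("A TO B") has been introduced
generate byte interaction = (route == "A TO B")

. reg  ROA i.id i.year  interaction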

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F( 11,     8) =    1.29
       Model |  714.428766    11  64.9480696           Prob > F      =  0.3669
    Residual |  402.570519     8  50.3213148           R-squared     =  0.6396
-------------+------------------------------           Adj R-squared =  0.1440
       Total |  1116.99928    19   58.789436           Root MSE      =  7.0938

------------------------------------------------------------------------------
         ROA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          id |
          2  |   .5312825   7.093752     0.07   0.942    -15.82694     16.8895
          3  |  -3.603745   7.093752    -0.51   0.625    -19.96197    12.75448
          4  |  -.6255484   7.093752    -0.09   0.932    -16.98377    15.73267
          5  |  -3.559692   7.093752    -0.50   0.629    -19.91791    12.79853
          6  |  -3.571822   7.770816    -0.46   0.658    -21.49136    14.34771
          7  |  -5.749808   7.770816    -0.74   0.480    -23.66934    12.16973
          8  |   1.175961   7.770816     0.15   0.883    -16.74357    19.09549
          9  |   .0137767   7.770816     0.00   0.999    -17.90576    17.93331
         10  |  -1.410445   7.770816    -0.18   0.860    -19.32998    16.50909
             |
        year |
       2000  |   1.699335   4.486483     0.38   0.715    -8.646512    12.04518
             |
 interaction |   11.34938   6.344845     1.79   0.111    -3.281854    25.98062
       _cons |  -.9199552   5.494797    -0.17   0.871    -13.59098    11.75107
------------------------------------------------------------------------------

Using a fixed effects estimator, the firm effects are wiped out by the within transformation (you do not have to include the firm dummies yourself). Unfortunately, Stata does not have a true two-way fixed effects estimator, so you have to manually add the time dummies:

Code:

. xtset id year
       panel variable:  id (strongly balanced)
        time variable:  year, 1998 to 2000, but with gaps
                delta:  1 unit

. xtreg ROA i.year interaction, fe

Fixed-effects (within) regression               Number of obs      =        20
Group variable: id                              Number of groups   =        10

R-sq:  within  = 0.5181                         Obs per group: min =         2
       between = 0.6677                                        avg =       2.0
       overall = 0.5552                                        max =         2

                                                F(2,8)             =      4.30
corr(u_i, Xb)  = 0.0547                         Prob > F           =    0.0539

------------------------------------------------------------------------------
         ROA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |
       2000  |   1.699335   4.486483     0.38   0.715    -8.646512    12.04518
             |
 interaction |   11.34938   6.344845     1.79   0.111    -3.281854    25.98062
       _cons |  -2.599959   2.243241    -1.16   0.280    -7.772883    2.572964
-------------+----------------------------------------------------------------
     sigma_u |  2.2924625
     sigma_e |  7.0937518
         rho |  .09456093   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(9, 8) =     0.21                Prob > F = 0.9848

Bear in mind that 11.34938 is the coefficient we obtained with the two-period DID estimator. Note that at the moment you do not have any control variables specified. If you add them later on, they must vary both over time and across firms; otherwise they are collinear with the fixed effects, since two-way fixed effects cannot estimate the impact of time-invariant or firm-invariant variables. For example, if you have the following 2 control variables, population and income, you add them to the regression as follows:

Code:

xtreg ROA i.year interaction population income, fe cluster(id)

*and you have estimated the following equation:  Yit = ai + at + B x treatmentit + c Xit + e
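The claim that 11.34938 is the two-period DID estimate can be checked by hand. The sketch below re-enters the ROA values from the example above and computes the difference in differences of the four cell means. The names `treated`, `post`, and the scalars are introduced here for illustration, and firms 1 to 5 are taken as the treated group because they are the ones on route "A TO B" in 2000.

```stata
clear
input double ROA id year
 1.775622   1 1998
 3.83331    2 1998
-5.210526   3 1998
 1.478725   4 1998
-13.73461   5 1998
-8.754751   6 1998
-2.822808   7 1998
-.3456052   8 1998
 4.453937   9 1998
-6.672886  10 1998
 9.433187   1 2000
 8.438064   2 2000
 9.211845   3 2000
 8.478987   4 2000
17.824034   5 2000
 1.470532   6 2000
-8.817384   7 2000
 2.556952   8 2000
-4.566959   9 2000
 3.711421  10 2000
end

generate byte treated = (id <= 5)      // firms that get the new route
generate byte post    = (year == 2000) // period after the route opens

* the four cell means
summarize ROA if  treated &  post, meanonly
scalar m_t1 = r(mean)
summarize ROA if  treated & !post, meanonly
scalar m_t0 = r(mean)
summarize ROA if !treated &  post, meanonly
scalar m_c1 = r(mean)
summarize ROA if !treated & !post, meanonly
scalar m_c0 = r(mean)

* (change for treated) minus (change for controls)
scalar did = (m_t1 - m_t0) - (m_c1 - m_c0)
display "DID from cell means: " did

* same number from the textbook two-period DID regression
regress ROA i.treated##i.post
display "DID from regression: " _b[1.treated#1.post]
```

With T = 2 and a balanced panel, the interaction coefficient from `regress` and the fixed effects estimate shown earlier coincide; with more periods or staggered treatment, only the fixed effects formulation carries over.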

Last edited by Andrew Musau; 11 May 2015, 09:38.


Susanne Daae

Join Date: Mar 2015

Posts: 22
#20

12 May 2015, 04:02

Thank you so much, the both of you! It is true that I do not yet grasp all the finer nuances of statistical analysis (and I am not a native English speaker), but your help has been invaluable in trying to fit all the pieces together to understand it. Thank you!
Tristan Gillard

Join Date: Dec 2016

Posts: 3
#21

05 Dec 2016, 03:26

Hello everybody. My research is pretty similar to Susanne's. I am trying to quantify the effect of the announcement and the effective construction of a new transit line (called RER) on real estate prices. I have quarterly panel data for each year from 1990 to 2014, and I'm using a difference-in-differences methodology. The new transit line will go through 30 counties (treatment group) and I have 62 other counties (control group). I thus have 25 years of quarterly data on (1) house prices (my dependent variable) and (2) a lot of control variables about the counties, such as the density of foreigners, the density of inhabitants per kilometer, the average wages of a county, ... (I have all this data for all years). The announcement took place in 1999 and the effective construction in 2004. Thus I created a dummy variable RER which is equal to one (for all years) for counties where the new transit line will go through, two other dummy variables, Announcement (=1 from 1999 to 2014) and Construction (=1 from 2004 to 2014), and 2 interaction variables, RER*Announcement and RER*Construction. I want to take into account time fixed effects and county fixed effects.

What model/estimator should I use?

option 1: A Fixed effect Model

In Stata 14.0 I have to write:

gen Announcerer = Announcement*RER
gen Constrer = Construction*RER
encode time, gen(time2)
xtset ID time2, quarterly

xtreg lprices RER Announcement Construction Announcerer Constrer wages densityperkm foreignersdensity i.time2, fe vce(cluster ID)

option 2: A Random effect Model

xtreg lprices RER Announcement Construction Announcerer Constrer wages densityperkm foreignersdensity i.time2, vce(cluster ID)

option 3: A simple DID Model

reg lprices RER Announcement Construction Announcerer Constrer wages densityperkm foreignersdensity i.time2, r
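Written out, option 1 corresponds roughly to the two-way fixed effects equation below. The notation is an assumption chosen to mirror the equation quoted in #16 (i indexes counties, t indexes quarters, X collects the county controls); it does not come from the post itself.

```latex
\[
\text{lprices}_{it} = a_i + a_t
  + \beta_1 \,(\text{RER}_i \times \text{Announcement}_t)
  + \beta_2 \,(\text{RER}_i \times \text{Construction}_t)
  + c\, X_{it} + e_{it}
\]
```

In this form the main effects do not appear separately: RER varies only across counties and is absorbed by the county effects, while Announcement and Construction vary only over time and are absorbed by the time effects. The two interaction coefficients are the quantities of interest.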

Questions:

-In Wooldridge's book (Introduction to Econometrics, 2014) it says on p. 398: "Fixed effects allows arbitrary correlation between a_i and x_itj while random effects does not". The Hausman test tells me to choose option 2, the random effects model. However, I have reasons to believe that a_i and x_itj are correlated. (For example, the distance to Brussels, the capital and biggest city in the country, is time invariant and thus in a_i, and it might be correlated with my explanatory variable density of population.) Should I believe the Hausman test or my intuition?
-What about option 3? In Wooldridge's book, there is a similar example with the construction of an incinerator on p. 366 and he used a difference-in-differences estimator. However, there are only 2 years of observations there, and I have 25...
-I have some problems with econometric terminology. If I am using option 1, should I say that I am using a difference-in-differences model, a fixed effects model, or a fixed effects model with a difference-in-differences methodology?

Thank you
Flora Yin

Join Date: Aug 2021

Posts: 68
#22

26 Mar 2022, 23:51

Hi Andrew,

I happened to read this post and I'm doing something similar. I read the "Proximity and investment: evidence from plant-level data" paper, and I think their model

Yijlt = ai + at + B x treatmentit + c Xijt + e

is different from the one in #19

xtreg ROA i.year interaction population income, fe cluster(id)
*and you have estimated the following equation: Yit = ai + at + B x treatmentit + c Xit + e

The interaction variable should be treatment*post (as you wrote in #12), so it seems different from the B x treatmentit term the paper uses. I wonder which is correct?

Thanks!

Andrew Musau

Join Date: Oct 2014
Posts: 9958

#23

27 Mar 2022, 07:06

The treatment indicator is always defined as equal to one if unit \(i\) at time \(t\) was subject to the treatment and zero otherwise. Therefore, for treated units, it is zero for pre-treatment years and turns on (changes to one) once the treatment is initiated. So in TWFE, the treatment indicator is not a time-invariant dummy for being a treated unit. If it were, you would not be able to identify its coefficient, as it would be collinear with the unit fixed effects. This simple example illustrates: Suppose that firms 1, 3 and 5 were treated in the Grunfeld dataset and we define the treatment indicator as a dummy for the treated firms. We would not be able to get a coefficient on the treatment indicator, as its identification relies on the existence of a pre-treatment period for the treated firms.

Code:

webuse grunfeld, clear
gen treatment= inlist(company, 1, 3, 5)
xtreg invest mvalue kstock treatment i.time, fe

Res.:

Code:

. xtreg invest mvalue kstock treatment i.time, fe
note: treatment omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =        200
Group variable: company                         Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.7985                                         min =         20
     between = 0.8143                                         avg =       20.0
     overall = 0.8068                                         max =         20

                                                F(21,169)         =      31.90
corr(u_i, Xb)  = -0.3250                        Prob > F          =     0.0000

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1177158   .0137513     8.56   0.000     .0905694    .1448623
      kstock |   .3579163    .022719    15.75   0.000     .3130667    .4027659
   treatment |          0  (omitted)
             |
        time |
          2  |  -19.19741   23.67586    -0.81   0.419    -65.93593    27.54112
          3  |  -40.69001   24.69541    -1.65   0.101    -89.44122    8.061213
          4  |   -39.2264   23.23594    -1.69   0.093    -85.09647    6.643667
          5  |  -69.47029   23.65607    -2.94   0.004    -116.1698   -22.77083
          6  |  -44.23507   23.80979    -1.86   0.065      -91.238     2.76785
          7  |  -18.80446     23.694    -0.79   0.429     -65.5788    27.96987
          8  |  -21.13979   23.38163    -0.90   0.367    -67.29748    25.01789
          9  |  -42.97762   23.55287    -1.82   0.070    -89.47334    3.518104
         10  |  -43.09876    23.6102    -1.83   0.070    -89.70766    3.510134
         11  |  -55.68303   23.89561    -2.33   0.021    -102.8554   -8.510689
         12  |  -31.16928   24.11598    -1.29   0.198    -78.77665    16.43809
         13  |  -39.39223   23.78368    -1.66   0.100    -86.34361    7.559141
         14  |  -43.71651   23.96965    -1.82   0.070    -91.03501    3.601991
         15  |   -73.4951   24.18292    -3.04   0.003    -121.2346   -25.75559
         16  |  -75.89611   24.34553    -3.12   0.002    -123.9566    -27.8356
         17  |   -62.4809   24.86425    -2.51   0.013    -111.5654   -13.39637
         18  |  -64.63233    25.3495    -2.55   0.012    -114.6748   -14.58987
         19  |  -67.71796   26.61108    -2.54   0.012    -120.2509   -15.18501
         20  |  -93.52622   27.10786    -3.45   0.001    -147.0399   -40.01257
             |
       _cons |  -32.83631   18.87533    -1.74   0.084     -70.0981    4.425483
-------------+----------------------------------------------------------------
     sigma_u |  91.798268
     sigma_e |  51.724523
         rho |  .75902159   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(9, 169) = 52.36                     Prob > F = 0.0000

.
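For contrast, a sketch of the same regression with a treatment indicator that switches on over time. The treatment start year of 1945 is purely hypothetical (the Grunfeld data contain no actual treatment), and the names `treated` and `post` are introduced here for illustration.

```stata
webuse grunfeld, clear

* hypothetical: firms 1, 3 and 5 are treated from 1945 onward
generate byte treated   = inlist(company, 1, 3, 5)
generate byte post      = (year >= 1945)
generate byte treatment = treated*post   // 0 before 1945, 1 afterwards for treated firms

* treatment now varies within firm, so it is not collinear with the firm effects
xtreg invest mvalue kstock treatment i.time, fe vce(cluster company)
display "Coefficient on treatment: " _b[treatment]
```

Because `treatment` changes within the treated firms and is not a function of time alone, it survives both the firm fixed effects and the time dummies, which is the sense in which treatmentit in the paper and treatment*post describe the same variable.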

Last edited by Andrew Musau; 27 Mar 2022, 07:55.
