Still struggling with parallel trends using -reghdfe-

Parul Gupta

Join Date: Jun 2020
Posts: 147


28 Jul 2024, 02:55

I am using -reghdfe- (installed with ssc install) to run a DID regression in Stata 17. Sample data are given below:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(y1 x1) float(treat treatpost year) long _id
0 10 1 1 2022 1
0  8 1 0 2012 1
0  6 1 0 2011 1
1 16 1 0 2012 1
0  6 1 0 2018 1
0  9 1 0 2012 1
0  7 1 0 2013 1
0  5 1 0 2010 1
0  6 1 0 2018 1
1 13 1 0 2012 1
0  7 1 1 2022 1
0  9 1 0 2012 1
. 14 1 0 2018 1
0  8 1 0 2018 1
0  6 1 0 2013 1
1 14 1 0 2013 1
0  6 1 0 2018 1
0  7 1 1 2022 1
1 16 1 0 2013 1
.  . 1 0 2013 1
.  . 1 0 2013 1
.  . 1 0 2016 1
0  8 1 1 2022 1
1 14 1 0 2010 1
1 16 1 0 2010 1
0  6 1 0 2012 1
1 12 1 1 2022 1
0  9 1 0 2010 1
.  4 1 0 2013 1
.  3 1 0 2013 1
0 10 1 1 2022 1
1 14 1 0 2018 1
0  8 1 0 2011 1
1 16 1 0 2011 1
0 10 1 1 2022 1
0  7 1 0 2011 1
0 10 1 0 2013 1
0  7 1 0 2013 1
0  8 1 1 2022 1
0 10 1 0 2013 1
1 15 1 0 2011 1
0 13 1 1 2022 1
1 15 1 0 2018 1
0 10 1 0 2011 1
0  5 1 0 2016 1
0  7 1 0 2011 1
0 13 1 0 2010 1
.  . 1 0 2018 1
1 16 1 0 2012 1
.  4 1 0 2016 1
0  8 1 1 2022 1
0  5 1 0 2010 1
.  9 1 1 2022 1
0 11 1 0 2016 1
0  9 1 1 2022 1
1 12 1 0 2012 1
.  . 1 0 2011 1
.  . 1 0 2018 1
0  7 1 0 2011 1
0  5 1 0 2010 1
.  4 1 0 2014 1
0 16 1 0 2018 1
.  . 1 0 2014 1
0 11 1 0 2016 1
.  5 1 0 2016 1
1 15 1 0 2012 1
1 14 1 0 2012 1
1 10 1 0 2012 1
.  4 1 0 2016 1
.  4 1 1 2022 1
.  . 1 0 2018 1
0  9 1 0 2018 1
1 13 1 0 2011 1
.  8 1 0 2013 1
.  . 1 0 2013 1
1 13 1 0 2013 1
0 10 1 1 2022 1
0 12 1 0 2011 1
0 10 1 0 2016 1
.  8 1 0 2016 1
.  . 1 0 2014 1
1 16 1 0 2012 1
1 15 1 0 2013 1
1 16 1 0 2013 1
1 13 1 0 2012 1
1 15 1 0 2010 1
0 11 1 0 2018 1
.  . 1 0 2016 1
.  . 1 0 2013 1
.  4 1 0 2010 1
1 15 1 0 2013 1
0  7 1 0 2014 1
. 11 1 0 2013 1
0 10 1 0 2018 1
. 15 1 0 2018 1
0 13 1 0 2014 1
0 11 1 1 2022 1
1 16 1 0 2011 1
.  3 1 0 2018 1
0  5 1 0 2012 1
end

For DID, I am using the following commands:

Code:

gen treatpost=treat*(year>2018)

reghdfe y1 x1 treatpost i.year if year>=2014, absorb(_id) vce(cluster _id)

The treat variable identifies the regions which were given the treatment. The treatment occurred between 2018 and 2022, so I have generated treatpost as treat*(year>2018).

How can I test for parallel trends using -reghdfe-? I am unable to use didregress because my dataset is large and didregress is taking hours to work.

I posted a different query on parallel trends at https://www.statalist.org/forums/for...ss-and-reghdfe, but I am still confused about how to apply the solution to my real data. Specifically, I am not sure why the -evertreated- and -pretreat- variables are needed. Can I just do it with:

Code:

reghdfe y1 x1 treatpost i.year if year<2014, absorb(_id) vce(cluster _id)
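One thing that can be checked directly about the command above: with treatpost defined as treat*(year>2018), the variable is zero in every observation before 2014, so it has no variation in that subsample. A minimal sketch on a hypothetical toy dataset (the eight rows below are made up for illustration; only the treatpost definition comes from this post):

```stata
* Hypothetical toy data: one treated and one untreated group, four years each
clear
input float(treat year)
1 2010
1 2013
1 2018
1 2022
0 2010
0 2013
0 2018
0 2022
end

* Same definition as in the post
gen treatpost = treat*(year>2018)

* treatpost is constant (zero) in the pre-2014 subsample
summarize treatpost if year < 2014

* Number of pre-2014 observations with treatpost switched on
count if treatpost != 0 & year < 2014
```

Because treatpost does not vary when year<2014, a regression restricted to those years cannot estimate its coefficient.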


Maxence Morlet

Join Date: Mar 2021

Posts: 643
#2

28 Jul 2024, 06:55

Code:

reghdfe y 1.treat#1.year, cl(unit_id) abs(i.id i.time)

Look at significance on pre-treatment coefficients, test their joint significance as well.

The user-written eventdd command is nice as well.

But these tests of parallel trends are generally underpowered.
Parul Gupta

Join Date: Jun 2020

Posts: 147
#3

28 Jul 2024, 07:07

Thanks, Maxence. I don't have a unit_id variable in my dataset. Is it different from _id?

I will explore eventdd.

Is there a better way to test parallel trends?
Maxence Morlet

Join Date: Mar 2021

Posts: 643
#4

28 Jul 2024, 07:31

unit_id is a generic name, use whichever variable you wish to cluster by.

Not that I know of. You need to argue that the policy you are studying, or at least its timing, was exogenous. Words in this context can be more powerful than tests.
Parul Gupta

Join Date: Jun 2020

Posts: 147
#5

28 Jul 2024, 07:37

Ah ok, you were using it as a placeholder.

But shouldn't

reghdfe y 1.treat#1.year, cl(unit_id) abs(i.id i.time)

be

reghdfe y i.treat#i.year, cl(unit_id) abs(i.id i.time)

instead? Thanks again for your suggestion on justifying the existence of pretrends.
Maxence Morlet

Join Date: Mar 2021

Posts: 643
#6

28 Jul 2024, 08:01

Everything but 1.treat#1.year will be collinear with the fixed effects and dropped. You can do it if you like, but the other coefficients will normally be dropped.
George Ford

Join Date: Aug 2014

Posts: 3115
#7

28 Jul 2024, 14:55

You say " The treatment occurred between 2018 and 2022,"

Is everyone treated at the same time? If not, then you need to use a staggered DID approach.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2111
#8

28 Jul 2024, 16:06

George: Looks like Parul only has data for 2018 and 2022 — not years in between.
George Ford

Join Date: Aug 2014

Posts: 3115
#9

28 Jul 2024, 16:08

If the data are between 2018 and 2022, and the DID variable is defined as year>2018, then there is only one year before the treatment. No pre-trend can be assessed.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2111
#10

28 Jul 2024, 18:00

It appears he has several pre-treatment years. What I meant was he only has one post-treatment year, and that's 2022. If you look at his data set, the year jumps from 2018 to 2022. But he has data at least for 2016, 2014, and even further back. Maxence has shown Parul what to do, including the test for pre-trends.

Parul Gupta

Join Date: Jun 2020
Posts: 147

#11

28 Jul 2024, 22:56

George and Jeff:

I have several pre-treatment years. Treatment was given between 2018 and 2022 all at once. But I don't have data for that year.

Maxence:

I tried your command but I didn't get the coefficient. Did I do something wrong?

Code:

 reghdfe y 1.treat#1.year, cl(_id) abs(i._id i.year)
(MWFE estimator converged in 5 iterations)

HDFE Linear regression                            Number of obs   =  3,628,758
Absorbing 2 HDFE groups                           F(   0,    626) =          .
Statistics robust to heteroskedasticity           Prob > F        =          .
                                                  R-squared       =     0.0692
                                                  Adj R-squared   =     0.0690
                                                  Within R-sq.    =     0.0000
Number of clusters (_id)     =        627         Root MSE        =     1.2831

                                  (Std. err. adjusted for 627 clusters in _id)
------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  treat#year |
        1 1  |          0  (empty)
             |
       _cons |   3.316087   1.63e-16  2.0e+16   0.000     3.316087    3.316087
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
         _id |       627         627           0    *|
        year |         8           1           7     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation


Parul Gupta

Join Date: Jun 2020
Posts: 147

#12

28 Jul 2024, 23:01

Continuation of #11.

If I use i.year, I get the coefficients. Should the individual coefficients be insignificant, just like we need after -estat ptrends-?

Code:

. reghdfe y 1.treat#i.year, cl(_id) abs(i._id i.year)
(MWFE estimator converged in 5 iterations)
note: 1.treat#2022.year omitted because of collinearity

HDFE Linear regression                            Number of obs   =  3,628,758
Absorbing 2 HDFE groups                           F(   7,    626) =       1.64
Statistics robust to heteroskedasticity           Prob > F        =     0.1206
                                                  R-squared       =     0.0692
                                                  Adj R-squared   =     0.0691
                                                  Within R-sq.    =     0.0001
Number of clusters (_id)     =        627         Root MSE        =     1.2831

                                  (Std. err. adjusted for 627 clusters in _id)
------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  treat#year |
     1 2010  |   .0273351   .0407635     0.67   0.503    -.0527146    .1073849
     1 2011  |  -.0393147   .0360164    -1.09   0.275    -.1100423    .0314129
     1 2012  |  -.0412551   .0337945    -1.22   0.223    -.1076195    .0251092
     1 2013  |  -.0048992   .0269106    -0.18   0.856    -.0577452    .0479467
     1 2014  |  -.0411359   .0240977    -1.71   0.088     -.088458    .0061863
     1 2016  |  -.0638787    .026855    -2.38   0.018    -.1166156   -.0111419
     1 2018  |   -.030544   .0233063    -1.31   0.190    -.0763119     .015224
     1 2022  |          0  (omitted)
             |
       _cons |    3.32027   .0036382   912.60   0.000     3.313125    3.327415
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
         _id |       627         627           0    *|
        year |         8           1           7     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation


George Ford

Join Date: Aug 2014

Posts: 3115
#13

29 Jul 2024, 09:17

A possible answer is here.

-estat ptrends- looks for the same slope before the treatment. It uses all the data, but only considers the coefficient on the pre-treatment trend.

HTML Code:

https://www.statalist.org/forums/forum/general-stata-discussion/general/1759625-parallel-trends-after-didregress-and-reghdfe

In the dataex, all units are treated. Is that the case? If so, this is not DID. And, the id variable is always 1, yet the years repeat. Is this id variable correct?
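These questions can be checked with a few descriptive commands. The sketch below runs them on a hypothetical four-row dataset that only mimics the layout of the -dataex- extract (treat, year, _id); the variable tag is a helper introduced here for illustration.

```stata
* Hypothetical toy data mimicking the dataex layout
clear
input float(treat year) long _id
1 2010 1
1 2010 1
1 2018 1
1 2022 1
end

* Are there any untreated observations to serve as a comparison group?
tabulate treat
count if treat == 0
scalar n_control = r(N)

* Is _id-year a unique key, as a unit-by-year panel would require?
duplicates report _id year

* How many distinct values of _id are present?
egen byte tag = tag(_id)
count if tag
scalar n_ids = r(N)
display "control obs: " n_control "   distinct ids: " n_ids
```

Repeated _id-year pairs are not necessarily an error. They would be expected if, for example, the rows are individuals and _id identifies the region, which is an assumption about the data and not something the extract confirms.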

Also, if the treatment occurs sometime between 2018 and 2022 and this varies by id, then you have a staggered treatment model, but you can't estimate it as such since you don't know when the treatment occurred. There could be calendar-year effects or heterogeneous treatment effects over time, so the DID coefficient is probably biased.

The clustered SE may be biased as well, given the large gap between 2018 and 2022. Maybe worth investigating.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2111
#14

29 Jul 2024, 10:46

The problem is that Stata chooses which collinear variables to drop. I will modify Maxence's command so that 2018 is chosen as the reference period -- as is most common. Note how 1.treat#c.d2018 is omitted, forcing it to be the reference period. The "test" command tests the null hypothesis that PT holds.

Code:

gen d2010 = year == 2010
gen d2011 = year == 2011
gen d2012 = year == 2012
gen d2013 = year == 2013
gen d2014 = year == 2014
gen d2016 = year == 2016
gen d2022 = year == 2022

reghdfe y 1.treat#c.d2010 1.treat#c.d2011 1.treat#c.d2012 1.treat#c.d2013 1.treat#c.d2014 1.treat#c.d2016 1.treat#c.d2022, cl(_id) abs(i._id i.year)

test 1.treat#c.d2010 1.treat#c.d2011 1.treat#c.d2012 1.treat#c.d2013 1.treat#c.d2014 1.treat#c.d2016
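For anyone who wants to try these steps before running them on a large dataset, here is a minimal sketch on simulated data. Everything about the data-generating process is an assumption made for illustration (200 units, half of them treated, a common linear time trend, and an effect of 0.5 in 2022); only the survey years and the structure of the regression and test follow the commands above.

```stata
* Simulated panel (hypothetical): 200 units observed in the thread's 8 years
clear
set seed 12345
set obs 200
gen long id = _n
gen byte treat = id <= 100            // first 100 units are treated
gen double u = rnormal()              // unit-level effect
expand 8
bysort id: gen int year = real(word("2010 2011 2012 2013 2014 2016 2018 2022", _n))

* Outcome: unit effect + common trend + effect in 2022 for treated units
gen double y = u + 0.1*(year - 2010) + 0.5*treat*(year == 2022) + rnormal()

* One dummy per year except 2018, which becomes the reference period
foreach t in 2010 2011 2012 2013 2014 2016 2022 {
    gen byte d`t' = year == `t'
}

* Requires the user-written command: ssc install reghdfe
reghdfe y 1.treat#c.d2010 1.treat#c.d2011 1.treat#c.d2012 1.treat#c.d2013 ///
    1.treat#c.d2014 1.treat#c.d2016 1.treat#c.d2022, absorb(id year) vce(cluster id)

* Joint test of the six pre-treatment interactions (null: parallel pre-trends)
test 1.treat#c.d2010 1.treat#c.d2011 1.treat#c.d2012 1.treat#c.d2013 ///
    1.treat#c.d2014 1.treat#c.d2016
```

In this simulation parallel trends hold by construction, so the six pre-treatment coefficients should be close to zero and the coefficient on 1.treat#c.d2022 estimates the effect relative to 2018.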

Last edited by Jeff Wooldridge; 29 Jul 2024, 10:48.
Bruno Paisani

Join Date: Oct 2021

Posts: 9
#15

17 Nov 2024, 03:18

Originally posted by Jeff Wooldridge

The problem is that Stata chooses which collinear variables to drop. I will modify Maxence's command so that 2018 is chosen as the reference period -- as is most common. Note how 1.treat#c.d2018 is omitted, forcing it to be the reference period. The "test" command tests the null hypothesis that PT holds.

Code:

gen d2010 = year == 2010
gen d2011 = year == 2011
gen d2012 = year == 2012
gen d2013 = year == 2013
gen d2014 = year == 2014
gen d2016 = year == 2016
gen d2022 = year == 2022

reghdfe y 1.treat#c.d2010 1.treat#c.d2011 1.treat#c.d2012 1.treat#c.d2013 1.treat#c.d2014 1.treat#c.d2016 1.treat#c.d2022, cl(_id) abs(i._id i.year)

test 1.treat#c.d2010 1.treat#c.d2011 1.treat#c.d2012 1.treat#c.d2013 1.treat#c.d2014 1.treat#c.d2016

Hi Professor Wooldridge.
In a test like that, should I include controls in the regression? Also, should I care about the post-treatment interaction coefficients (if they are statistically significant)?