  • Still struggling with parallel trends using -reghdfe-

    I am using -reghdfe- (from SSC) to run a DID regression in Stata 17. Sample data are given below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(y1 x1) float(treat treatpost year) long _id
    0 10 1 1 2022 1
    0  8 1 0 2012 1
    0  6 1 0 2011 1
    1 16 1 0 2012 1
    0  6 1 0 2018 1
    0  9 1 0 2012 1
    0  7 1 0 2013 1
    0  5 1 0 2010 1
    0  6 1 0 2018 1
    1 13 1 0 2012 1
    0  7 1 1 2022 1
    0  9 1 0 2012 1
    . 14 1 0 2018 1
    0  8 1 0 2018 1
    0  6 1 0 2013 1
    1 14 1 0 2013 1
    0  6 1 0 2018 1
    0  7 1 1 2022 1
    1 16 1 0 2013 1
    .  . 1 0 2013 1
    .  . 1 0 2013 1
    .  . 1 0 2016 1
    0  8 1 1 2022 1
    1 14 1 0 2010 1
    1 16 1 0 2010 1
    0  6 1 0 2012 1
    1 12 1 1 2022 1
    0  9 1 0 2010 1
    .  4 1 0 2013 1
    .  3 1 0 2013 1
    0 10 1 1 2022 1
    1 14 1 0 2018 1
    0  8 1 0 2011 1
    1 16 1 0 2011 1
    0 10 1 1 2022 1
    0  7 1 0 2011 1
    0 10 1 0 2013 1
    0  7 1 0 2013 1
    0  8 1 1 2022 1
    0 10 1 0 2013 1
    1 15 1 0 2011 1
    0 13 1 1 2022 1
    1 15 1 0 2018 1
    0 10 1 0 2011 1
    0  5 1 0 2016 1
    0  7 1 0 2011 1
    0 13 1 0 2010 1
    .  . 1 0 2018 1
    1 16 1 0 2012 1
    .  4 1 0 2016 1
    0  8 1 1 2022 1
    0  5 1 0 2010 1
    .  9 1 1 2022 1
    0 11 1 0 2016 1
    0  9 1 1 2022 1
    1 12 1 0 2012 1
    .  . 1 0 2011 1
    .  . 1 0 2018 1
    0  7 1 0 2011 1
    0  5 1 0 2010 1
    .  4 1 0 2014 1
    0 16 1 0 2018 1
    .  . 1 0 2014 1
    0 11 1 0 2016 1
    .  5 1 0 2016 1
    1 15 1 0 2012 1
    1 14 1 0 2012 1
    1 10 1 0 2012 1
    .  4 1 0 2016 1
    .  4 1 1 2022 1
    .  . 1 0 2018 1
    0  9 1 0 2018 1
    1 13 1 0 2011 1
    .  8 1 0 2013 1
    .  . 1 0 2013 1
    1 13 1 0 2013 1
    0 10 1 1 2022 1
    0 12 1 0 2011 1
    0 10 1 0 2016 1
    .  8 1 0 2016 1
    .  . 1 0 2014 1
    1 16 1 0 2012 1
    1 15 1 0 2013 1
    1 16 1 0 2013 1
    1 13 1 0 2012 1
    1 15 1 0 2010 1
    0 11 1 0 2018 1
    .  . 1 0 2016 1
    .  . 1 0 2013 1
    .  4 1 0 2010 1
    1 15 1 0 2013 1
    0  7 1 0 2014 1
    . 11 1 0 2013 1
    0 10 1 0 2018 1
    . 15 1 0 2018 1
    0 13 1 0 2014 1
    0 11 1 1 2022 1
    1 16 1 0 2011 1
    .  3 1 0 2018 1
    0  5 1 0 2012 1
    end
    For DID, I am using the following commands:

    Code:
    gen treatpost=treat*(year>2018)
    
    reghdfe y1 x1 treatpost i.year if year>=2014, absorb(_id) vce(cluster _id)
    The treat variable identifies the regions which were given the treatment. The treatment occurred between 2018 and 2022, so I have generated treatpost as treat*(year>2018).

    How can I test for parallel trends using -reghdfe-? I am unable to use didregress because my dataset is large and didregress is taking hours to work.

    I posted a different query on parallel trends on https://www.statalist.org/forums/for...ss-and-reghdfe but I am still confused about how to apply the solution to my real data. Specifically, I am not sure why the -evertreated- and -pretreat- variables are needed. Can I just do it with:


    Code:
    reghdfe y1 x1 treatpost i.year if year<2014, absorb(_id) vce(cluster _id)
    ?

  • #2
    Code:
    reghdfe y 1.treat#1.year, cl(unit_id) abs(i.id i.time)
    Look at significance on pre-treatment coefficients, test their joint significance as well.

    The user-written eventdd command is nice as well.

    But these tests of parallel trends are generally underpowered.


    • #3
      Thanks, Maxence. I don't have unit_id variable in my dataset. Is it different from _id?

      I will explore eventdd.

      Is there a better way to test parallel trends?


      • #4
        unit_id is a generic name, use whichever variable you wish to cluster by.

        Not that I know of. You need to argue that the policy you are studying, or at least its timing, was exogenous. In this context, words can be more powerful than tests.


        • #5
          Ah ok, you were using it as a placeholder.

          But shouldn't
          reghdfe y 1.treat#1.year, cl(unit_id) abs(i.id i.time)
          be
          reghdfe y i.treat#i.year, cl(unit_id) abs(i.id i.time)
          instead?

          Thanks again for your suggestion on justifying the existence of pretrends.


          • #6
            Everything but 1.treat#1.year will be collinear with the fixed effects and dropped. You can use i.treat#i.year if you like, but the other coefficients will normally be dropped.


            • #7
              You say "The treatment occurred between 2018 and 2022."

              Is everyone treated at the same time? If not, then you need to use a staggered DID approach.


              • #8
                George: Looks like Parul only has data for 2018 and 2022 — not years in between.


                • #9
                  If the data cover only 2018-2022, and the DID variable is defined as year>2018, then there is only one year before the treatment. No pre-trend can be assessed.


                  • #10
                    It appears he has several pre-treatment years. What I meant was he only has one post-treatment year, and that's 2022. If you look at his data set, the year jumps from 2018 to 2022. But he has data at least for 2016, 2014, and even further back. Maxence has shown Parul what to do, including the test for pre-trends.


                    • #11
                      George and Jeff:

                      I have several pre-treatment years. Treatment was given between 2018 and 2022 all at once. But I don't have data for that year.

                      Maxence:

                      I tried your command but I didn't get the coefficient. Did I do something wrong?

                      Code:
                       reghdfe y 1.treat#1.year, cl(_id) abs(i._id i.year)
                      (MWFE estimator converged in 5 iterations)
                      
                      HDFE Linear regression                            Number of obs   =  3,628,758
                      Absorbing 2 HDFE groups                           F(   0,    626) =          .
                      Statistics robust to heteroskedasticity           Prob > F        =          .
                                                                        R-squared       =     0.0692
                                                                        Adj R-squared   =     0.0690
                                                                        Within R-sq.    =     0.0000
                      Number of clusters (_id)     =        627         Root MSE        =     1.2831
                      
                                                        (Std. err. adjusted for 627 clusters in _id)
                      ------------------------------------------------------------------------------
                                   |               Robust
                                 y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                      -------------+----------------------------------------------------------------
                        treat#year |
                              1 1  |          0  (empty)
                                   |
                             _cons |   3.316087   1.63e-16  2.0e+16   0.000     3.316087    3.316087
                      ------------------------------------------------------------------------------
                      
                      Absorbed degrees of freedom:
                      -----------------------------------------------------+
                       Absorbed FE | Categories  - Redundant  = Num. Coefs |
                      -------------+---------------------------------------|
                               _id |       627         627           0    *|
                              year |         8           1           7     |
                      -----------------------------------------------------+
                      * = FE nested within cluster; treated as redundant for DoF computation
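One reading of the "(empty)" cell in the output above: in factor-variable notation, 1.year is the indicator for year == 1, and since the years in this thread run from 2010 to 2022, no observation satisfies it. The sketch below uses a tiny made-up dataset (the four rows are an assumption, not the poster's data) to show the same situation with -count-.

```stata
* Hypothetical four-row dataset; only the variable names follow the thread
clear
input byte treat int year
1 2018
1 2022
0 2018
0 2022
end

* 1.treat#1.year picks out treat == 1 & year == 1: no such rows exist
count if treat == 1 & year == 1

* Naming an actual year level (here 2022) selects a non-empty cell
count if treat == 1 & year == 2022
```

On the real data, running -tabulate year- first would show which levels can be referenced in the interaction.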


                      • #12
                        Continuation of #11.

                        If I use i.year, I get the coefficients. Should the individual coefficients be insignificant, just like we need after -estat ptrends-?

                        Code:
                        . reghdfe y 1.treat#i.year, cl(_id) abs(i._id i.year)
                        (MWFE estimator converged in 5 iterations)
                        note: 1.treat#2022.year omitted because of collinearity
                        
                        HDFE Linear regression                            Number of obs   =  3,628,758
                        Absorbing 2 HDFE groups                           F(   7,    626) =       1.64
                        Statistics robust to heteroskedasticity           Prob > F        =     0.1206
                                                                          R-squared       =     0.0692
                                                                          Adj R-squared   =     0.0691
                                                                          Within R-sq.    =     0.0001
                        Number of clusters (_id)     =        627         Root MSE        =     1.2831
                        
                                                          (Std. err. adjusted for 627 clusters in _id)
                        ------------------------------------------------------------------------------
                                     |               Robust
                                   y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                          treat#year |
                             1 2010  |   .0273351   .0407635     0.67   0.503    -.0527146    .1073849
                             1 2011  |  -.0393147   .0360164    -1.09   0.275    -.1100423    .0314129
                             1 2012  |  -.0412551   .0337945    -1.22   0.223    -.1076195    .0251092
                             1 2013  |  -.0048992   .0269106    -0.18   0.856    -.0577452    .0479467
                             1 2014  |  -.0411359   .0240977    -1.71   0.088     -.088458    .0061863
                             1 2016  |  -.0638787    .026855    -2.38   0.018    -.1166156   -.0111419
                             1 2018  |   -.030544   .0233063    -1.31   0.190    -.0763119     .015224
                             1 2022  |          0  (omitted)
                                     |
                               _cons |    3.32027   .0036382   912.60   0.000     3.313125    3.327415
                        ------------------------------------------------------------------------------
                        
                        Absorbed degrees of freedom:
                        -----------------------------------------------------+
                         Absorbed FE | Categories  - Redundant  = Num. Coefs |
                        -------------+---------------------------------------|
                                 _id |       627         627           0    *|
                                year |         8           1           7     |
                        -----------------------------------------------------+
                        * = FE nested within cluster; treated as redundant for DoF computation


                        • #13
                          A possible answer is here.

                          -estat ptrends- looks for the same slope in both groups before the treatment. It uses all the data, but only considers the coefficient on the pre-treatment trend.

                          HTML Code:
                          https://www.statalist.org/forums/forum/general-stata-discussion/general/1759625-parallel-trends-after-didregress-and-reghdfe
                          In the dataex, all units are treated. Is that the case? If so, this is not DID. And, the id variable is always 1, yet the years repeat. Is this id variable correct?
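The two questions above (does treat vary, and does _id together with year identify an observation) can be checked directly. This is a minimal sketch on four rows copied from the -dataex- listing in #1, so the results describe that excerpt only and say nothing about the full dataset.

```stata
* Four rows copied from the -dataex- example in #1
clear
input byte(y1 x1) float(treat treatpost year) long _id
0 10 1 1 2022 1
0  8 1 0 2012 1
0  6 1 0 2011 1
1 16 1 0 2012 1
end

* A DID needs both treated and never-treated units: does treat vary at all?
tabulate treat

* How many rows share the same _id-year pair?
duplicates report _id year

* -isid- errors out if _id and year do not uniquely identify rows;
* -capture- stores the return code in _rc instead of stopping the do-file
capture isid _id year
display "isid return code: " _rc
```

A nonzero return code here means several rows share one _id-year pair, which would suggest that _id is a region identifier with many individuals per region rather than a unit identifier.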

                          Also, if the treatment occurs sometime between 2018 and 2022 and the timing varies by id, then you have a staggered treatment model, but you can't estimate it as such since you don't know when the treatment occurred. There could be calendar-year effects or heterogeneous treatment effects over time, so the DID coefficient is probably biased.

                          The clustered SE may be biased as well, given the large gap between 2018 and 2022. Maybe worth investigating.



                          • #14
                            The problem is that Stata chooses which collinear variables to drop. I will modify Maxence's command so that 2018 is chosen as the reference period -- as is most common. Note how 1.treat#c.d2018 is omitted, forcing it to be the reference period. The "test" command tests the null hypothesis that PT holds.

                            Code:
                            gen d2010 = year == 2010
                            gen d2011 = year == 2011
                            gen d2012 = year == 2012
                            gen d2013 = year == 2013
                            gen d2014 = year == 2014
                            gen d2016 = year == 2016
                            gen d2022 = year == 2022
                            reghdfe y 1.treat#c.d2010 1.treat#c.d2011 1.treat#c.d2012 1.treat#c.d2013 1.treat#c.d2014 1.treat#c.d2016 1.treat#c.d2022, cl(_id) abs(i._id i.year)
                            test 1.treat#c.d2010 1.treat#c.d2011 1.treat#c.d2012 1.treat#c.d2013 1.treat#c.d2014 1.treat#c.d2016
                            Last edited by Jeff Wooldridge; 29 Jul 2024, 11:48.
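For anyone who wants to try the commands in #14 without the original data, here is a self-contained sketch on a simulated panel. Everything about the simulation (200 units, the size of the effect, the seed) is an assumption made for illustration; only the variable names and the survey years follow the thread. It needs the user-written -reghdfe- (ssc install reghdfe, which depends on ftools) and should be run from a do-file because of the /// line continuations.

```stata
* Hypothetical simulated panel: names follow the thread, the numbers are invented
clear
set seed 20240729
set obs 200                          // 200 units
gen long _id = _n
gen byte treat = _id <= 100          // first 100 units are treated
gen double u = rnormal()             // unit effect, constant within _id
expand 8                             // 8 survey years per unit
bysort _id: gen int year = real(word("2010 2011 2012 2013 2014 2016 2018 2022", _n))
gen double y = u + 0.1*(year - 2010) + 0.5*treat*(year == 2022) + rnormal()

* One dummy per year except 2018, which becomes the reference period
foreach yr in 2010 2011 2012 2013 2014 2016 2022 {
    gen byte d`yr' = (year == `yr')
}

reghdfe y 1.treat#c.d2010 1.treat#c.d2011 1.treat#c.d2012 1.treat#c.d2013 ///
    1.treat#c.d2014 1.treat#c.d2016 1.treat#c.d2022, absorb(_id year) vce(cluster _id)

* Joint test that all six pre-treatment interactions are zero
test 1.treat#c.d2010 1.treat#c.d2011 1.treat#c.d2012 1.treat#c.d2013 ///
    1.treat#c.d2014 1.treat#c.d2016
display "p-value of joint pre-trend test: " r(p)
```

Because no d2018 dummy is created, every interaction coefficient is measured relative to 2018, and the coefficient on 1.treat#c.d2022 plays the role of the DID estimate.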


                            • #15
                              Originally posted by Jeff Wooldridge:
                              The problem is that Stata chooses which collinear variables to drop. I will modify Maxence's command so that 2018 is chosen as the reference period -- as is most common. Note how 1.treat#c.d2018 is omitted, forcing it to be the reference period. The "test" command tests the null hypothesis that PT holds.

                              Code:
                              gen d2010 = year == 2010
                              gen d2011 = year == 2011
                              gen d2012 = year == 2012
                              gen d2013 = year == 2013
                              gen d2014 = year == 2014
                              gen d2016 = year == 2016
                              gen d2022 = year == 2022
                              reghdfe y 1.treat#c.d2010 1.treat#c.d2011 1.treat#c.d2012 1.treat#c.d2013 1.treat#c.d2014 1.treat#c.d2016 1.treat#c.d2022, cl(_id) abs(i._id i.year)
                              test 1.treat#c.d2010 1.treat#c.d2011 1.treat#c.d2012 1.treat#c.d2013 1.treat#c.d2014 1.treat#c.d2016
                              Hi Professor Wooldridge.
                              In a test like that, should I include controls in the regression? Also, should I care about the post-treatment interaction coefficients (if they are statistically significant)?
