  • Diff-in-diff approach when implementation of policy is "staggered"

    Hi Statalist,

    I am new to this forum, and I apologize in advance if this question has already been answered in another post.

    I have a longitudinal panel dataset containing observations for 290 regions for the years 1985-2014. I want to investigate whether a policy that was introduced in 2002 has had an effect on voter turnout, which is the outcome variable. However, since the regions themselves decide whether or not to implement the policy, the year of implementation varies across regions. Some regions never implemented the policy (untreated for all years). To clarify, 2002 was the first year in which regions could implement the policy.

    My treatment variable is a dummy, which takes the value 1 for all years that the region has had the policy, and 0 otherwise. Hence, treatment variable for untreated regions equals 0 for all years. I have various controls for all regions and the panel is strongly balanced.

    My thought was to tackle this with a diff-in-diff approach and to include entity and time fixed effects. However, I am a bit confused about how to handle the staggered implementation in the different regions. Does anyone have any suggestions regarding code and model?

    I assume that the xtreg command is preferable for this setting?

    Best regards

  • #2
    See https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf for a lucid explanation of generalized difference-in-differences modeling, which applies to your situation.

    You need two variables. The first, call it treat, is 1 in the group that receives the treatment (in all of that group's observations, including those before treatment started) and 0 in all observations for the untreated group. The second, which we can call activetreatment, is 1 in the treatment group after treatment begins, 0 in the treatment group before treatment begins, and 0 in all observations in the control group. Then you do a fixed effects regression that looks more or less like this:

    Code:
    xtset region
    xtreg outcome i.treat i.activetreatment i.year, fe // OR RE AS THE CASE MAY BE
    The coefficient of activetreatment is the DID estimator of the effect of treatment.
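
    A minimal sketch of how the two variables could be built, assuming the 0/1 policy dummy described in #1 is stored in a variable called policy (a hypothetical name) and that region is a numeric identifier:

    ```stata
    * Assumption: policy = 1 in every region-year with the policy in force, else 0
    bysort region: egen byte treat = max(policy)   // 1 for regions that ever adopt
    gen byte activetreatment = policy              // 1 only while the policy is active

    xtset region year
    * i.treat is constant within region, so -fe- will report it as omitted
    xtreg outcome i.treat i.activetreatment i.year, fe vce(cluster region)
    ```

    Clustering the standard errors on region is an added choice in this sketch, not part of the model shown above.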


    • #3
      Thanks for the answer!

      If I also want to evaluate the effect of the policy for a certain group/cohort of regions that implemented it in the same year (for example, the regions that chose to implement in 2003), how would I approach this?

      I was thinking of doing this for each year in which a "larger" number of regions implemented the policy.



      • #4
        Well, I suppose I would do a separate analysis in each of those subsets. Because each of those subsets would have simultaneous implementation of the policy in the treatment group, you could revert there to the classical DID analysis.
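
        One way such a subset analysis might look for the 2003 cohort is sketched below. It assumes a 0/1 dummy named policy (hypothetical name) that is 1 in region-years with the policy in force, and it assumes the never-treated regions serve as the comparison group, which is a choice the thread does not settle.

        ```stata
        * first_year: first year with policy == 1 (missing for never-treated regions)
        bysort region: egen first_year = min(cond(policy == 1, year, .))

        preserve
        * Keep the 2003 adopters plus the never-treated regions (assumed controls)
        keep if first_year == 2003 | missing(first_year)
        gen byte treat2003 = (first_year == 2003)
        gen byte post2003  = (year >= 2003)

        xtset region year
        * Classical DID: the coefficient on 1.treat2003#1.post2003 is the estimate
        xtreg outcome i.treat2003##i.post2003, fe vce(cluster region)
        restore
        ```

        The same steps could be repeated for each cohort year of interest by replacing 2003.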


        • #5
          Hi All,

          I am working on a similar analysis in which I want to estimate a treatment effect within a monthly panel model where I have staggered start times. I appreciate Clyde's answer above, but am having issues obtaining results within a FE model. My variable which is coded like the variable "treat" as above is dropped from the FE model because it is "constant within group" as all observations for this variable are coded 1 for the treatment group.

          I am able to estimate an RE or PA model, however, and I am wondering whether either of those is actually the estimate I am interested in. My code for the models is shown below.

          "type" is equivalent to Clyde's "treat" variable above
          "treat" is equivalent to Clyde's "activetreatment" variable above

          xtset geoid
          xtpoisson shoot type i.treat i.year_month, fe // The estimate for type is dropped because it is constant within the group (geoid)
          xtpoisson shoot type i.treat i.year_month, re // Provides an estimate for both the type and treat variables

          Thank you for any guidance.



          • #6
            Yes, i.type is dropped in a fixed effects model because it is constant within group. That is not a problem. The information it would otherwise provide is carried in the fixed effects themselves. If it bothers you aesthetically to see the warning about the omission, you can rerun the model leaving i.type out; the results will be in all other respects the same. Or you can just ignore it and work with what you have.

            I would add that in most situations when using a DID estimator we are interested in the within-unit-of-analysis effect of the intervention, so that the fixed effects estimator is usually more appropriate than re or pa (which are blends of within- and between- effects).
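
            A quick way to check the point about the omitted variable, sketched with the variable names from #5 (shoot, type, treat, year_month, geoid), is to fit the fixed effects model with and without type and compare the coefficient of interest:

            ```stata
            xtset geoid

            * With type: Stata drops it because it is constant within geoid
            xtpoisson shoot i.type i.treat i.year_month, fe
            scalar b_with = _b[1.treat]

            * Without type: same model, no omission note
            xtpoisson shoot i.treat i.year_month, fe
            scalar b_without = _b[1.treat]

            display "with type: " b_with "   without type: " b_without
            ```

            The two displayed coefficients should agree, since type carries no within-geoid variation.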


            • #7
              Thank you, Clyde for your thoughtful and rapid response.

              One more question. Would you favor this approach over an alternative in which the activetreatment variable is coded 1 for both the treatment and matched control areas and an interaction term is specified? So that:

              "type" is equivalent to your treatment variable above
              "intervention" is coded 1 for both treatment and control areas during the treatment periods for each site

              xtpoisson shoot type##implementation i.timeperiod, fe

              It seems using the interaction you could estimate the change in the DV in the control areas during the implementation period (the coefficient for the implementation variable) as well as the difference between the treated and control groups during that period (the interaction term). Am I correct in that interpretation? Or does the use of FE change this in a way that I am not understanding?

              Thank you.



              • #8
                I'm not sure I follow you here.


                "type" is equivalent to your treatment variable above
                "intervention" is coded 1 for both treatment and control areas during the treatment periods for each site

                xtpoisson shoot type##implementation i.timeperiod, fe
                I don't understand what this "intervention" variable is. The control areas don't have a treatment period. This whole thread is predicated on the notion that the intervention takes effect at different times in different units. So it is not possible to define a simple pre-post variable that is applicable to everything. But it appears that is what you are looking to do here. If you are in a situation where the intervention is applied simultaneously to all of the treated entities, then that is best done with a classical DID approach, where you have a variable "treat" that is 1 for entities that eventually get the intervention and 0 for those that never do, and a pre-post variable distinguishing the pre- and post-intervention periods. Then you run -xtreg outcome i.treat##i.pre_post, fe- (perhaps with covariates, robust vce, etc.) and the coefficient of 1.treat#1.pre_post is the DID estimator of the intervention effect. (And, yes, the "main effect" of treat will be omitted because of collinearity with the fixed effects.)

                What is the variable "implementation" that you introduce in that -xtpoisson- command?


                • #9
                  I should back up and state that I have a data set which contains a series of statistically matched pairs (treatment and control groups), with a variable that identifies the pairs. Importantly, treatment did not begin at the same time for all of the treatment areas; it is staggered at different points in time.

                  Using my pair indicator, I created a variable "implementation" that is equal to 1 for both the treatment areas and their matched comparisons during the periods in which treatment was occurring (in the treatment areas).

                  A crosstab of the two variables would look something like this (10 treatment areas with 3 treatment periods, again staggered in time):

                  [Image: crosstab.PNG]

                  I am sorry if this is an inappropriate or uneducated question. I am familiar with the traditional DID and am trying to get my head around the staggered DID model.

                  Thanks again.
                  Last edited by Kevin Wolff; 13 Jun 2019, 13:51.


                  • #10
                    OK. I see what you've got here. Using type and implementation is a bit of a mish-mash. Since you are working with matched pairs, you can define a pre-post variable. For an entity that ultimately gets the intervention, pre_post is 0 in the time periods preceding intervention and is 1 afterward. Then, for the entity that is its matched control, you set pre_post to be equal to the value of pre_post for the intervened-upon member of the pair in the same time periods. Then you can run the DID estimation just like a classical DID, with one exception: your analysis has to account for the matched pairs. In my view, the best way to do this is with a multi-level model that looks like this:

                    Code:
                    mepoisson shoot i.type##i.pre_post || matched_pair_id: || unit_id:
                    The generalized DID approach is not intended for use with matched pairs.
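
                    The pre_post construction described above could be sketched as follows. The variable start (the first treated period, filled in for treated units only and missing for controls) is a hypothetical name, as is the use of year_month as the time variable; the sketch also assumes exactly one treated unit per matched pair.

                    ```stata
                    * Copy each treated unit's start period to every member of its pair
                    * (egen max() ignores the missing values on the control rows)
                    bysort matched_pair_id: egen pair_start = max(start)

                    * 0 before the pair's start period, 1 from then on, for both members
                    gen byte pre_post = (year_month >= pair_start)

                    mepoisson shoot i.type##i.pre_post || matched_pair_id: || unit_id:
                    ```

                    The coefficient on 1.type#1.pre_post is then the DID term on the log-count scale.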



                    • #11
                      Thank you again, Clyde, for your help thinking this one through. I appreciate your guidance.

                      Out of curiosity, does the approach you have outlined above have a specific name? Or would describing it as a DID approach that accounts for the nesting of observations within matched pairs (and panel data) suffice?


                      • #12
                        I don't know of any other name for it. I would just call it a DID approach with matched pairs of exposed and unexposed and longitudinal data, as you suggested.


                        • #13
                          Originally posted by Clyde Schechter
                          See https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf for a lucid explanation of generalized difference-in-differences modeling, which applies to your situation.

                          You need two variables. The first, call it treat, is 1 in the group that receives the treatment (in all of that group's observations, including those before treatment started) and 0 in all observations for the untreated group. The second, which we can call activetreatment, is 1 in the treatment group after treatment begins, 0 in the treatment group before treatment begins, and 0 in all observations in the control group. Then you do a fixed effects regression that looks more or less like this:

                          Code:
                          xtset region
                          xtreg outcome i.treat i.activetreatment i.year, fe // OR RE AS THE CASE MAY BE
                          The coefficient of activetreatment is the DID estimator of the effect of treatment.
                          Hello Clyde,

                          I perfectly follow your solution to Oskar's question. I have a question that has two legs: the first leg is what you answered with regard to Oskar's question. The second leg is as below:
                          I am analyzing a policy and following from my theoretical model, the policy (variable) induces a kind of spatio-temporal interactions among the economic agents via another variable (say, S) such that in the econometric specification I need to include an interaction between activetreatment and S. My model looks like this:

                          outcome = b0 + b1*treat + b2*activetreatment + b3*(activetreatment x S) + b4*S + year_fixed_effects + unit_fixed_effects + e

                          With panel data, I am using the following code:

                          xtreg outcome i.treat activetreatment##c.S i.year, fe

                          My question here is: in my model above, what should I report as the impact of the policy? Is the coefficient of activetreatment (b2) still the DID estimator? Or should the DID estimator be b2 + b3*S? If the latter is the DID estimator, how do I report it? Just add the estimates?

                          Thank you.


                          • #14
                            what should I report as the impact of the policy?
                            Nothing. There is no such thing. There are infinitely many impacts of the policy, and they depend on the value of S. There is no one that can be called "the" impact.

                            Is the coefficient of activetreatment (b2) still the DID estimator? Or should the DID estimator be b2 + b3*S? If the latter is the DID estimator, how do I report it? Just add the estimates?
                            b2 + b3*S is the formula for the policy impact conditional on S. Whether you report this as a formula, or whether you produce a table or graph showing the value of the impact for selected interesting or important or common values of S depends on your audience's expectations.
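
                            One way to produce such a table or graph is sketched below with -margins- and -lincom-. The panel identifier unit_id and the S values 0, 1, 2 and 1.5 are placeholders chosen for illustration, not values from the thread.

                            ```stata
                            xtset unit_id year
                            xtreg outcome i.activetreatment##c.S i.year, fe vce(cluster unit_id)

                            * Policy effect b2 + b3*S evaluated at selected values of S
                            margins, dydx(activetreatment) at(S = (0 1 2))
                            marginsplot

                            * The same quantity at a single value, here S = 1.5
                            lincom 1.activetreatment + 1.5*1.activetreatment#c.S
                            ```

                            Both commands report standard errors and confidence intervals for the conditional effect, which simply adding the two point estimates by hand would not give.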


                            • #15
                              Thank you Clyde for your support.

                              Apologies if my line of reasoning is quite naive. I am new to the DID estimation. I am not too sure but perhaps a more appropriate way to frame my question should be: what should I report as the impact of the policy on the outcome variable given my specification. I am sorry but as it stands now, I am still not clear as to what to refer to as the DID estimator.

                              To be clearer, I am trying to argue that, given the mechanics of the policy variable and how it affects the outcome variable, a DID estimation that does not incorporate S and its interaction with activetreatment would not completely or appropriately capture the effect of the policy on the outcome variable. It is in this light that I wanted to know what the DID estimator is given my specification. So kindly permit me to ask you for confirmation as to what I should be referring to in my specification as the DID estimator. I understand the interpretation of the coefficient of a binary variable interacted with a continuous variable in regular regressions. However, my confusion here is that the binary variable is in the context of a DID estimation (i.e., activetreatment). Is activetreatment still the DID estimator? And how do I interpret the coefficients b2 and b3?

                              Thank you. Apologies for the long write-up.
