
  • Difference in Difference three periods and variation in treatment group

    Hello Stata experts,

    I am relatively new to Stata and want to estimate a difference-in-differences (DID) model. With this I want to estimate the effect of a political treatment within Germany. The treatment was introduced by different federal states at different times.

    I have data for three periods: a pre-period (1998), when none of the federal states had the treatment; a post-period (2007), when all states in the treatment group had the treatment; and a third, intermediate period (2004), in which some of the states in the treatment group had already introduced the measure and others had not.

    For a normal DID estimator I would simply define a post dummy and a treatment dummy as well as an interaction term, but I do not know how to include the third period.

    My idea was to add a post dummy for every federal state, like:

    gen Post_nrw=0

    replace Post_nrw=1 if Time>=2004

    gen Post_hb=0

    replace Post_hb=1 if Time>=2006

    gen Post_sh=0

    replace Post_sh=1 if Time>=2006

    gen Post_be=0

    replace Post_be=1 if Time>=2004


    Then I would define a treatment group that includes all states that adopt the treatment, and calculate an interaction term for every post dummy, like:

    gen interaction_nrw= Treatmentgroup * Post_nrw

    gen interaction_hb= Treatmentgroup * Post_hb

    gen interaction_sh= Treatmentgroup * Post_sh

    gen interaction_be= Treatmentgroup * Post_be

    and then I would regress:

    Y = Post_nrw BundesTreatmentgroup interaction_nrw Post_hb BundesTreatmentgroup interaction_hb... and so on for every federal state.


    Can you tell me if it is possible to calculate a DID estimator like this and if not how it is possible to include the third period?

    Thank you in advance.

    Kind regards,

    Hilz

  • #2
    Not exactly.

    You need to have a single pre-post variable that is defined for all of the federal states. For those in the treatment group, the pre-post variable should be 1 starting with the year in which that state adopted the treatment and remaining 1 thereafter; 0 before that. The difficulty with DID analysis in your situation is that it is not automatically possible to define the pre-post variable in the control group. When all states adopt the policy at the same time, then it is easy: if, say, that happened in year 2005, then -gen pre_post = (year >= 2005)- does it for all treatment and control states.

    But since different states went into treatment in different years, it isn't obvious how to define pre_post in the control group. There are basically three conceptually distinct approaches to solving this problem. They all rely on the same idea: the pre_post variable should show, in some sense, the time when the control state "would have" adopted the treatment if it had been in the treatment group.

    Approach 1. Since these are states, the decision to adopt treatment may have been made by a vote of the state's legislature. Different states may have held their votes in different years. The control states voted no, and the treatment states voted yes. In this case, letting pre_post start at 0 and then become 1 in the year when each legislature held its vote would completely capture the spirit of a DID analysis.
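
    A minimal sketch of Approach 1, assuming a string state identifier -state- coded with the abbreviations from your post, a -year- variable, and a hypothetical variable -vote_year- holding the year in which each control state's legislature voted the measure down. None of these names come from your data, so adjust them. The adoption years are the ones you listed:

    Code:
    * event_year: adoption year (treated) or year of the failed vote (control)
    gen event_year = .
    replace event_year = 2004 if inlist(state, "nrw", "be")
    replace event_year = 2006 if inlist(state, "hb", "sh")
    replace event_year = vote_year if treatment_group == 0
    * year >= . evaluates to false in Stata, so leave pre_post missing
    * (rather than silently 0) for any state whose event year is unknown
    gen byte pre_post = (year >= event_year) if !missing(event_year)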

    Approach 2. In other contexts, there is no obvious point at which treatment "would have" been adopted, so approach 1 is not feasible. Even in your context, some legislatures may never have even taken up the question. So another approach is to create matched pairs. You identify attributes of the various states that are relevant to the outcomes you are looking at, and then create matched pairs based on those variables. The pre-post variable in each control state in each year is then set to the same values as those of its matched case.

    Approach 3. This is really a variant of Approach 2. If suitable variables for defining matched pairs are not available, treatment and control matched pairs are created at random. Again, the pre-post variable in a control state in any year is set equal to the value in that year of its matched case.
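
    For Approaches 2 and 3, copying the pre-post timing from each treated state to its matched control might look roughly like this. It is only a sketch: -pair_id- is a hypothetical variable that takes the same value for a treated state and its matched control, and -event_year- is assumed to hold the adoption year for treated states and to be missing for control states.

    Code:
    * within each pair, nonmissing event_year sorts first, so event_year[1]
    * is the treated state's adoption year; copy it to the matched control
    bysort pair_id (event_year): replace event_year = event_year[1] if missing(event_year)
    * pre_post is 0 before the (own or inherited) event year and 1 from then on
    gen byte pre_post = (year >= event_year) if !missing(event_year)

    How -pair_id- is built is the only difference between the two approaches: from matching on relevant state attributes in Approach 2, at random in Approach 3.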

    Once you have defined pre_post for all treatment and control states, then you can proceed to the actual analysis. Your life will be made easier if you use factor variable notation, rather than calculating a product variable for your interaction. See -help fvvarlist-. Then you can do:

    Code:
    regress outcome i.treatment_group##i.pre_post other_covariates
    margins treatment_group#pre_post
    margins treatment_group, dydx(pre_post)
    The -margins- commands (see http://www.stata-journal.com/sjpdf.h...iclenum=st0260 for the best introduction to the -margins- command) will then show you the expected outcomes in treatment and control states before and after intervention, and the marginal effect of the pre-post transition in each group. That's the "pay dirt" of the analysis.
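
    If you want the DID estimate itself as a single number, in this linear model it is the coefficient on the interaction term, that is, the difference between the two marginal effects of pre_post. A sketch, assuming both variables are coded 0/1:

    Code:
    regress outcome i.treatment_group##i.pre_post other_covariates
    * DID estimate = coefficient on the treatment-by-post interaction
    lincom 1.treatment_group#1.pre_post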
