
  • Falsification test for difference-in-differences with panel data

    Dear all,
    I have a question about model specification and would welcome your thoughts (apologies if this is not the right place to ask; if so, I would appreciate a pointer to a more appropriate forum). I am estimating a difference-in-differences model on a panel of schools. The treatment is the construction of water and sanitation projects in rural districts. I have data from 2012 to 2016. The complication is that schools can receive treatment in different time periods, and they can receive it more than once (several water projects may be built in the same district). My baseline regression is:

    Y_ist = alpha_t + alpha_s + alpha_i + beta*D_st + epsilon_ist

    where "s" denotes the district, "i" denotes the individual, and "t" denotes the year. The variable D_st equals one if at least one project has been completed in district "s" in year "t".

    I would like to run a falsification test to show that my results are robust. However, I wonder whether I can use lead values of the treatment, given that once a project is built it remains in place and schools can continue to benefit from it later on. I also have data from 2008 to 2011. Should I estimate a regression using only data from that period together with the lead of the treatment, or should I use the whole sample with the leads of the treatment?
    Any comments would be appreciated!
    Thanks,
    Diego


  • #2
    I might be tempted to set up separate variables for first, second, third, etc. treatments. Then you can do standard DID with multiple treatments. That would also allow different effects as you add more treatments.
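
    As a rough illustration of what those separate variables could look like, here is a minimal Python sketch. It assumes each indicator switches on once the district has accumulated that many completed projects and stays on afterwards; that cumulative definition, the toy project counts, and the name treatment_order_dummies are all hypothetical and not taken from the original data.

```python
# Hypothetical toy input: number of projects completed in each (district, year).
# District names and counts are made up for illustration only.
completed = {
    ("A", 2013): 1,
    ("A", 2015): 1,
    ("B", 2014): 2,
}
districts = ["A", "B", "C"]
years = list(range(2012, 2017))
max_order = 3  # build indicators for the first, second and third project


def treatment_order_dummies(completed, districts, years, max_order):
    """Return {(district, year): [D1, D2, ...]} where Dk = 1 once the
    district has accumulated at least k completed projects (assumed absorbing)."""
    rows = {}
    for d in districts:
        cumulative = 0
        for t in sorted(years):
            cumulative += completed.get((d, t), 0)
            rows[(d, t)] = [int(cumulative >= k) for k in range(1, max_order + 1)]
    return rows


dummies = treatment_order_dummies(completed, districts, years, max_order)
for (d, t), row in sorted(dummies.items()):
    print(d, t, row)
```

    Each of the resulting indicators would then enter the regression with its own coefficient, so the effect of a second or third project is allowed to differ from the first.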



    • #3
      Diego,

      What is alpha_i in your specification? You have district and year fixed effects plus your main policy variable, which is sufficient, so I assume alpha_i is a vector of 'individual-level' controls. As for your main concern, there are many ways to demonstrate the robustness of your results. You could include one or more leads or lags of your treatment variable (excluding the contemporaneous policy variable) and see how your results change as you substitute different time configurations into the model. But be careful, because there may be a theoretical justification for anticipatory or persistent effects, in which case a nonzero lead or lag coefficient is not necessarily a sign of misspecification. That is context-specific.
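
      To make the lead/lag idea concrete, here is a small Python sketch of how shifted treatment variables could be built within each district. The treatment path and the helper name shift_treatment are hypothetical; observations whose shifted year falls outside the sample window are set to missing rather than filled in.

```python
# Hypothetical district-by-year treatment indicator (1 = treated), for illustration only.
districts = ["A", "B"]
years = list(range(2012, 2017))
treat = {(d, t): 0 for d in districts for t in years}
for t in range(2014, 2017):  # assume district A is treated from 2014 onward
    treat[("A", t)] = 1


def shift_treatment(treat, districts, years, k):
    """Return {(district, year): D_{s,t+k}}.
    k > 0 gives a lead, k < 0 gives a lag.
    Years outside the sample window yield None (missing)."""
    year_set = set(years)
    out = {}
    for d in districts:
        for t in years:
            src = t + k
            out[(d, t)] = treat[(d, src)] if src in year_set else None
    return out


lead1 = shift_treatment(treat, districts, years, 1)   # D_{s,t+1}
lag1 = shift_treatment(treat, districts, years, -1)   # D_{s,t-1}
print(lead1[("A", 2013)], lag1[("A", 2014)])
```

      In a falsification exercise the coefficient on the lead is the one of interest: absent anticipation, future treatment should not explain current outcomes.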

      Another approach would be to use a 'counterfeit' treatment group. There are many ways to do this. Let's say you want to assess differences in aggregate health outcomes for students in a subset of schools receiving the treatment (i.e., the sanitation project); treatment begins in 2012 and you have data for 2010-2016. You estimate your model and observe positive health outcomes for treated schools. Now let's say you obtain significantly more pre-treatment data. You could run your model again and define a 'pre-post' period during years when the treatment/intervention was absent. Obviously, you shouldn't observe program effects for schools in years when no treatment was in place. But, as indicated earlier, treatment timing is staggered (the 'post' period is not well-defined in your setting). Is there a group of schools that began treatment at the same time?
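
      A minimal sketch of that placebo comparison in Python, assuming a single cohort of eventually-treated schools and a made-up 'fake' start year inside the pre-treatment window. The outcomes, school labels, and the function name placebo_did are hypothetical. It computes only the 2x2 difference in means; a real analysis would use the regression with fixed effects and clustered standard errors.

```python
from statistics import mean

# Hypothetical pre-treatment observations:
# (school, year, outcome, eventually_treated). Values are invented for illustration.
pre_period = [
    ("s1", 2008, 10.0, True), ("s1", 2009, 10.5, True),
    ("s1", 2010, 11.0, True), ("s1", 2011, 11.5, True),
    ("s2", 2008, 8.0, False), ("s2", 2009, 8.5, False),
    ("s2", 2010, 9.0, False), ("s2", 2011, 9.5, False),
]
fake_start = 2010  # placebo 'post' period begins here; no project exists yet


def placebo_did(rows, fake_start):
    """2x2 difference-in-differences of mean outcomes around a fake start year."""
    def cell(treated, post):
        return mean(
            y for _, yr, y, tr in rows
            if tr == treated and (yr >= fake_start) == post
        )
    treated_change = cell(True, True) - cell(True, False)
    control_change = cell(False, True) - cell(False, False)
    return treated_change - control_change


placebo = placebo_did(pre_period, fake_start)
print(placebo)
```

      In this toy data both groups follow the same trend, so the placebo estimate is zero; an estimate far from zero in real pre-treatment data would cast doubt on the parallel-trends assumption.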
