  • Callaway & Sant'Anna (2021) DiD

    Hello,

    I am new to Stata and econometric analysis, and I am conducting a Difference-in-Differences (DiD) analysis for my research. I have a dataset with multiple treatment periods and groups:
    • 8 groups (cities) in total:
      • 3 groups never receive treatment.
      • 5 groups receive treatment at different times. Three of these treatment groups receive treatment in the same year, while the other two receive treatment in different years.
    • I have created the following variables already:
      • Treated: 0 for control and 1 for treatment groups
      • Treatment: 0 for all control groups and for all pre-treatment periods of the treatment groups; 1 from the year in which treatment is implemented onward
      • group_id: groups the year and the specific group (city)
      • unit_id: identifies the unique units within each group
    I have multiple units per group observed over the study period, two outcome variables, and several covariates.

    I have cleaned the data and am ready to proceed with the main analysis, but I need guidance on the following:
    1. Model choice: Should I use the csdid or xthdidregress command for this setup?
    2. Parallel trends:
      • Are there any additional variables I need to create?
      • Do I need to test for the parallel trends assumption before running the main analysis?
      • If so, how should I test this in Stata?
      • If the assumption doesn't hold, do I need to restructure the data to ensure parallel trends, or does the model automatically adjust for it?
    I would greatly appreciate a short step-by-step explanation or any resources that could help me better understand and implement this analysis in Stata.

    Thank you so much for your help!

  • #2
    Several estimators handle staggered DiD, e.g. the extended two-way fixed effects (ETWFE) estimator of Wooldridge (2021), which I strongly recommend.
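
    For what it is worth, Stata 18 and later ship an official implementation of this estimator (hdidregress twfe for repeated cross-sections, xthdidregress twfe for panels). A minimal sketch, not a definitive specification: y, x1, x2, city and year are placeholder names, Treatment is the 0/1 indicator described in #1, and whether the data form a panel is an assumption to check first.

    Code:
    // Repeated cross-sections: extended TWFE, treatment assigned at the city level
    hdidregress twfe (y x1 x2) (Treatment), group(city) time(year)

    // Panel alternative (assumes unit_id is unique only within city)
    // egen long panel_id = group(city unit_id)
    // xtset panel_id year
    // xthdidregress twfe (y x1 x2) (Treatment), group(city)

    // Effects by cohort, then by time since treatment with a plot
    estat aggregation, cohort
    estat aggregation, dynamic graph

    The available aggregation options may differ by Stata version; help hdidregress postestimation lists them.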

    You will need to argue that treatment timing is as good as random. The absence of persistence or spillovers in treatment is also important; see the stable unit treatment value assumption (SUTVA).

    You can run placebo regressions and inspect graphs of pre-treatment outcome trends. Beyond that, the case rests on arguing that treatment timing is random. See Roth (2022) on the limitations of parallel trends pre-tests.
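
    One way to draw such a graph, sketched under assumptions: y is a placeholder outcome, and year_start is a hypothetical cohort variable holding each city's first treatment year (0 for never-treated cities).

    Code:
    // Mean outcome by treatment cohort and year, one line per cohort
    preserve
    collapse (mean) y, by(year_start year)
    xtset year_start year
    xtline y, overlay
    restore

    Before each cohort's treatment year, its line should move roughly in step with the never-treated line (cohort 0).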

    If parallel trends do not hold, your estimates will be biased. Some methods try to address this, e.g. conditioning on pre-determined covariates (which replaces the unconditional parallel trends assumption with a conditional one) or including fixed effects. I do not know whether a method such as synthetic DiD works for staggered DiD...

    But ultimately, there is no magic fix: if trends are not parallel, DiD is biased and the results are not causal. The bias grows with the degree to which the parallel trends assumption (PTA) is violated (see Ham and Miratrix, 2022, on arXiv).

    • #3
      Based on your description, I would assume that your data is from a repeated cross-sectional design. Although I haven't used csdid with repeated cross-sectional data before (I typically work with panel data), I believe it should still be feasible to proceed with an approach like the one outlined below.

      You may have data spanning multiple years (variable: year), but the individuals (variable: id) differ across years. Each individual is associated with a city (variable: city), and through city, you can link each observation to the year when the treatment began (variable: year_start).

      In this case, since different cities started receiving treatment in different years, this is a staggered DiD design. Note that in csdid, the group variable (gvar) identifies the treatment-timing cohort (i.e., year_start, not city).
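
      If year_start does not exist yet, it could be built from the 0/1 Treatment indicator described in #1. A sketch, assuming city, year and Treatment are the actual variable names and that treatment never switches off once it starts:

      Code:
      // csdid relies on drdid; both are community-contributed (SSC)
      ssc install drdid, replace
      ssc install csdid, replace

      // First year with Treatment == 1 within each city; missing if never treated
      bysort city: egen year_start = min(cond(Treatment == 1, year, .))
      replace year_start = 0 if missing(year_start)

      // Check: each city should map to exactly one cohort value
      tabulate city year_start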

      Code:
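      // year_start must be 0 for never-treated cities and the first treatment year otherwise
      assert year_start == 0 if Treated == 0
      assert year_start > 0 & !missing(year_start) if Treated == 1

      // csdid for repeated cross-sections: omitting ivar() tells csdid the data are not a panel
      // long2 changes how pre-treatment effects are computed (see help csdid)
      csdid y covariates1 covariates2, time(year) gvar(year_start) cluster(city) long2

      // Aggregate ATT
      estat simple

      // Event study (pre-treatment estimates are used to check for parallel trends)
      estat event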
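
      Two further post-estimation steps are sometimes useful here, sketched on the assumption that the csdid results above are still in memory and that the installed csdid version provides these commands (check help csdid):

      Code:
      // Joint test that all pre-treatment ATT(g,t) are zero
      estat pretrend

      // Event-study estimates, then a plot of them
      estat event
      csdid_plot

      Failing to reject in the pre-trend test does not establish parallel trends; the Roth (2022) caveat raised in #2 applies.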
