Hello all!
I am new to Statalist. I appreciate any and all help.
I have a very specific modeling question. I am evaluating a police department policy in New York City. I have monthly crime data over a five-year period (Jan-2012, Feb-2012, Mar-2012, ..., Nov-2016, Dec-2016) for all precincts in the city.
The program was only implemented in specific precincts, and only went into effect during the summer months in a few of those years. For example, in 2015 the program goes into effect in the summer (June, July, and August) and then ends immediately afterward. It then goes back into effect the following summer, in 2016 (May, June, and July), and ends again. Now, I could use the archetypical difference-in-differences (DD) model and run separate regressions by year. This model amounts to the following:
y_pt = b_0 + b_1 Treatment_p + b_2 Post_t + b_3 (Treatment_p * Post_t) + e_pt
where y_pt is the crime rate in precinct p and month t. The variable Treatment_p is a dummy indexing treated precincts (e.g., 20 precincts comprise the treatment group and the remaining 50 or so precincts comprise the control group). The variable Post_t indexes the summer months in both treatment and control groups (a dummy equal to 1 for Jun, Jul, and Aug, and 0 otherwise). Interacting the two dummies gives the estimate of b_3, the treatment effect for that year. It is worth noting that I estimate the models separately for each year, which standardizes the post-treatment period. In this setting, I would only be comparing the months before the intervention (Jan through May) with the three months when the intervention is in place (Jun, Jul, and Aug) within each year.
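For concreteness, here is a minimal Stata sketch of this year-by-year 2x2 setup. The variable names (crime_rate, precinct, year, month, treat, post) are hypothetical placeholders rather than my actual variable names:

* Basic 2x2 DD for a single program year (2015 as an example),
* restricted to Jan-Aug so the comparison is Jan-May vs. Jun-Aug as described above.
* treat = treated-precinct dummy, post = summer-months dummy
regress crime_rate i.treat##i.post if year == 2015 & inrange(month, 1, 8), vce(cluster precinct)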
However, I want to exploit more of the variation across time. Modeling this is somewhat complicated because the timing of the intervention varies a little from year to year. For example, in 2015 the intervention runs from Jun-Aug, while in 2016 it runs from May-Jul. In addition, the set of precincts receiving the program also differs somewhat across years, although most of the treated precincts stay the same. I then noticed that I could use the more "general" DD approach popularized by Bertrand et al. (2004):
Outcome_pt = Group Fixed Effects + Time Fixed Effects + delta*Policy_pt + e_pt
This involves including a full set of “precinct” effects (dummies for each precinct), a full set of “year” effects (dummies for each year), and a dummy for when the policy was actually in effect (Policy_pt). If all assumptions are met, the variable Policy_pt would "turn on" in Jun-2015, Jul-2015, and Aug-2015 (the first wave of the intervention), turn off, then turn back on in May-2016, Jun-2016, and Jul-2016 (the second wave), and so on.
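To fix ideas, here is a rough Stata sketch of what I have in mind for this specification. Again, the variable names (crime_rate, precinct, year, policy) are hypothetical, and policy is the precinct-by-month dummy described above:

* Generalized DD: precinct fixed effects, year dummies, and the policy dummy
* policy = 1 for precinct-months when the program is actually in effect, 0 otherwise
xtset precinct
xtreg crime_rate i.year policy, fe vce(cluster precinct)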
My main question: Is the “generalized” DD model amenable to a “policy dummy” that turns on and off over the full month-year panel series? Or, once the program/policy variable turns on (Policy_pt = 1), must it stay on for the rest of the panel series (i.e., program in effect) for the “generalized” DD to work?
Also, is the inclusion of year effects (i.e., year dummies) appropriate in this context? The intervention is only in effect during the summer months of each year, so I wonder whether year fixed effects make sense when the policy only varies across specific months within a given year.
And finally, some papers employing the basic DD approach include a “pre-period” mean of the outcome variable on the right-hand side of the basic DD model. They argue that this “controls” for regression to the mean. Konda et al. (2016) used this in their paper investigating the effects of vacant lot 'greening' on crime. This could be useful for my study due to the cyclical crime patterns observed in the data.
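If I went this route, I imagine the pre-period mean could be constructed along these lines (again a sketch with hypothetical variable names, using Jan-May of 2015 as the pre-period for that year's model):

* Pre-period (Jan-May 2015) mean of the outcome for each precinct
bysort precinct: egen pre_mean = mean(cond(year == 2015 & inrange(month, 1, 5), crime_rate, .))
* Basic DD for 2015 with the pre-period mean added on the right-hand side
regress crime_rate i.treat##i.post pre_mean if year == 2015 & inrange(month, 1, 8), vce(cluster precinct)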
Anyway, I know that was a lot. Please let me know if I have been unclear!
Thank you in advance!
Respectfully,
Tom