Difference-in-differences model with different numbers of pre- and post-treatment observations

Sebastian Geiger

Join Date: Oct 2015

Posts: 124
#16

26 May 2016, 13:14

Sorry for the late reply. I was at a conference for the last couple of days.

You should not drop any of the dummies without replacing them or without removing all dummies of the same kind completely. Otherwise you would not have a clear reference category; in other words, your reference category would be a mixture of two countries and/or sectors. Whether you include the dummies for countries and/or sectors depends on the theory behind your estimation: are different countries and/or sectors expected to confound the effect you'd like to estimate? Since most firms will probably stay in the same country and industry, you can think about getting rid of the dummies altogether. Other (time-variant) covariates (like firm size) may be more important, but again, this depends on your theory. You may also consider including dummies for each year (or year groups like decades) if your observations come from different years (maybe the leverage ratio was generally lower in the early 1990s than it is now). You should not just plug different variables into your estimation equation until the desired result comes up. In any case, however, your results so far are not country specific. For that you could either estimate the model for each country separately or include (multiple) interaction terms between the diff-in-diff variable (which is an interaction already) and the country dummies. Given your sample size, I would not pursue this strategy (just like you planned anyway).

For my thesis (and in general), I am more interested in the time effects; I suspect that a firm's leverage ratio needs some time to develop, and it would be interesting if I could show that. E.g.: in t+1 leverage decreased by 8%, in t+2 by 6%, and in t+3 by 2%, something like that. I think what I'm still missing is a way to account for changes per year.

You can estimate the effects for multiple time periods either by estimating the model on just the baseline and the desired point in time in the follow-up period, or by using additional interaction terms. For the latter, you would interact your diff-in-diff variable with an additional dummy that indicates the point in time in the follow-up period at which the observation was made. As far as I can see, the -diff- command is not able to do this. Hence, you have to set up your own regression command (I described in #8 how to do so). I don't know how you identify the time of the observations besides your d_time3 dummy, so I assume you have a year variable. I also assume that you have an observation for each firm in each year in your dataset. If this is not the case, the syntax below won't work, so you should check this in any case. Another assumption is that you only have 3 periods after the treatment. If you have more, you need to apply the approach to the additional periods as well. The reference category is t+1 (i.e. the coefficient of diff_in_diff3 shows you the effect in the first treatment period, and the coefficients of the additional interaction terms show how the effects in t+2 and t+3 differ from it).

Code:

// Number the post-treatment observations of each firm: 1 = t+1, 2 = t+2 and so on.
// This only works if you have an observation for each year after the treatment for each firm;
// otherwise a period which is actually t+2 could be counted as t+1 if t+1 is not in your dataset.
// (-gen- rather than -egen-: _n is not an -egen- function.)
bysort id d_time3 (year): gen tplus = _n if d_time3 == 1

// Dummies indicating if the observation comes from t+2 or t+3.
// Built with -gen- so that they are 0, not missing, for pre-treatment observations
// (where tplus is missing); otherwise -reg- would drop the whole pre-treatment period.
gen byte tplus_2 = (tplus == 2)
gen byte tplus_3 = (tplus == 3)

// New interaction terms (actually three-way interactions, since diff_in_diff3 is already a two-way interaction)
gen diff_in_diff3_t2 = diff_in_diff3 * tplus_2
gen diff_in_diff3_t3 = diff_in_diff3 * tplus_3

reg lvg d_cl d_time3 tplus_2 tplus_3 diff_in_diff3 diff_in_diff3_t2 diff_in_diff3_t3 if d_yr1_2 == 1, vce(cluster id)
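
To check how the period-specific interactions behave, here is a minimal sketch on simulated data. Everything in it is hypothetical (the variable names treated, post, did, y and the built-in effects of -0.08, -0.06 and -0.02 are assumptions for illustration, not your data). It also shows how -lincom- can turn the interaction coefficients into total effects for t+2 and t+3.

```stata
* Hypothetical simulated panel: 200 firms, 2 pre-treatment and 3 post-treatment years
clear
set seed 12345
set obs 200
gen id = _n
gen byte treated = (id <= 100)          // first 100 firms are treated
expand 5                                // 5 yearly observations per firm
bysort id: gen year = 2000 + _n         // years 2001-2005
gen byte post = (year >= 2003)          // treatment takes effect in 2003
gen did = treated * post                // plain diff-in-diff term

* Position within the post period: 1 = t+1, 2 = t+2, 3 = t+3 (missing before treatment)
bysort id post (year): gen tplus = _n if post == 1
gen byte tplus_2 = (tplus == 2)
gen byte tplus_3 = (tplus == 3)
gen did_t2 = did * tplus_2
gen did_t3 = did * tplus_3

* Outcome with built-in effects of -0.08 (t+1), -0.06 (t+2) and -0.02 (t+3)
gen y = 0.30 + 0.05*treated - 0.08*did + 0.02*did_t2 + 0.06*did_t3 + rnormal(0, 0.05)

reg y treated post tplus_2 tplus_3 did did_t2 did_t3, vce(cluster id)

* did is the effect in t+1; the total effects in t+2 and t+3 are sums of coefficients
lincom did + did_t2
lincom did + did_t3
```

The coefficient on did should come out close to -0.08, and the two -lincom- results close to -0.06 and -0.02, up to sampling noise.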

Sebastian, what do you mean by 'expected values for the reference group'? If the control group has a value of -.034 pre-treatment, we can conclude that it has a 3.4% lower leverage than ...? The pre-treatment treatment group? The post-treatment control group? Something else?

Actually it means that the leverage ratio is expected to be negative for this country/sector combination. Clearly, that does not make much sense from a theoretical standpoint. However, since an OLS regression is not restricted to a certain range of values, results of this kind can come up, especially if your categories have very few observations. If you don't use the country and/or sector dummies, there should not be any impossible values for the expected means.

Last edited by Sebastian Geiger; 26 May 2016, 13:23.
Oscar Jones

Join Date: May 2016

Posts: 10
#17

30 May 2016, 01:24

Hi Sebastian,

Thanks for your reply and elaboration!

First, I have run several DiD regressions on different time periods to assess year-on-year changes around the treatment period, to see when the treatment effect starts to appear and how many years the leverage ratio needs to fully absorb all effects of the cross-listing. That solves the first issue you commented on in post #16.

Second, you are right, the inclusion or exclusion of (dummy) variables should be based on what makes sense, not on what generates the highest significance. In my case, as you say, the leverage ratios of firms from different industries / countries are influenced in different ways, so it makes more sense to compare firms within the same country + industry. I'll just have to deal with the negative values then. In any case, the pre- and post-treatment differences and the DiD estimator are not affected, so I'll base my interpretation on those values.

Again, many thanks Sebastian, and also to Steve and Clyde for helping me out!

Regards,

Oscar
Sugandha Huria

Join Date: Mar 2017

Posts: 46
#18

15 Jul 2018, 03:24

How is it that the number of treated groups is larger after the treatment than before?
Eben Kreuger

Join Date: Dec 2019

Posts: 12
#19

02 Jan 2020, 08:27

Hi all, I realize this is an older thread, but I think this question is most relevant here. How could you change the base year in the DID specification while controlling for pre-treatment years (e.g., up to five years before treatment, t-5)? The goal is to see how the outcome variable changes in t+1, t+2, t+3, and t+4, but all relative to t-1 (not back to, say, t-5). Thanks for your help!
Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#20

02 Jan 2020, 12:23

Well, I suppose you could include the t-2 through t-5 outcomes as covariates in the regression model, rather than having them as separate observations.

But it seems misguided. You need all those extra years to explore parallel trends; and without substantiation of the parallel trends assumption your case for identifying a causal effect by DID is weak, to say the least. So why do you want to do this? If there is something radically different about year t-1 from those other pre-exposure/intervention years, then perhaps they shouldn't be in the analysis at all (although that leaves you with an analysis based on only one pre-exposure/intervention period, which is really not adequate for claiming that the DID estimate is causal). If those other years are not so radically different, why don't you want to include them in the usual way?
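
For readers who want every post-treatment effect expressed relative to t-1 while keeping all pre-treatment years in the model as separate observations, one possible setup is to make t-1 the base level of an event-time factor variable. This is only a sketch on simulated data, not something proposed in this thread: the variable names (treated, k, y) and the built-in effect of -0.04 are hypothetical, and it assumes a single common treatment date.

```stata
* Hypothetical simulated panel: 200 firms observed in t-5 ... t-1 and t+1 ... t+4
clear
set seed 2020
set obs 200
gen id = _n
gen byte treated = (id <= 100)          // first 100 firms are treated
expand 9                                // 9 event-time periods per firm
bysort id: gen k = _n                   // k = 1..5 is t-5..t-1, k = 6..9 is t+1..t+4

* Outcome with a built-in treatment effect of -0.04 in every post-treatment period
gen y = 0.30 + 0.05*treated - 0.04*treated*(k >= 6) + rnormal(0, 0.05)

* Factor variables cannot take negative values, so event time is stored as k = 1..9;
* ib5. makes k = 5 (t-1) the omitted base period.
reg y i.treated##ib5.k, vce(cluster id)

* Joint test that the pre-treatment interactions (t-5 ... t-2) are all zero
test 1.treated#1.k 1.treated#2.k 1.treated#3.k 1.treated#4.k
```

In this setup the coefficients on 1.treated#6.k through 1.treated#9.k are the differences in t+1 to t+4 relative to t-1, and the pre-treatment interactions give a rough check on parallel trends.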
Nur Jahan

Join Date: Mar 2019

Posts: 20
#21

16 Sep 2021, 01:05

Originally posted by Clyde Schechter

The number of pre- and post-treatment observations does not need to be the same. It doesn't matter. In general, when doing comparisons of groups with different numbers of observations (whether over time or number of people), the "effective sample size" (in terms of statistical power) is closer to that of the smaller group than the larger. (It's the harmonic mean, actually.) But looking at your output, the magnitude of the effect you are focusing on appears to be very, very small, and your sample size is respectable for finding effects that are large enough to matter practically. So I don't think you have a statistical power issue here. I think the effect you hoped to find is just much smaller than you imagined.

Hello Clyde, I have a similar issue. In fact, I have a large discrepancy in the number of observations in the treatment group before and after the treatment (there were 368 treated observations before the treatment and only 19 after).
Will this bias my results? Thank you in advance.
Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#22

16 Sep 2021, 10:17

No, it does not bias the results. It does, however, reduce your statistical power. In particular, the number of observations in the treated group after treatment is extremely small and leaves you with a less precise estimate of the treatment effect.
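
As a rough illustration of the loss of precision, the harmonic mean mentioned earlier in the thread can be computed for the two cell sizes reported in #21. This is only a back-of-the-envelope sketch: it looks at the two treated-group cells alone and assumes independent observations with equal variance, whereas a real DID involves four cells and usually clustered data.

```stata
* Cell sizes reported in #21 (treated group, before and after treatment)
scalar n_before = 368
scalar n_after  = 19

* Harmonic mean: the per-group size of a balanced comparison with the same precision
scalar n_eff = 2 / (1/n_before + 1/n_after)
display "Effective per-group sample size: " %5.1f n_eff

* Standard error of a difference in means, in units of the outcome's standard deviation
display "SE factor with 368 vs 19:  " %6.4f sqrt(1/n_before + 1/n_after)
display "SE factor with 368 vs 368: " %6.4f sqrt(2/n_before)
```

The effective per-group size comes out at about 36, much closer to 19 than to 368, and the standard error of the before/after difference is roughly three times what it would be with 368 observations in both cells.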
Nur Jahan

Join Date: Mar 2019

Posts: 20
#23

18 Sep 2021, 03:37

Originally posted by Clyde Schechter

No, it does not bias the results. It does, however, reduce your statistical power. In particular, the number of observations in the treated group after treatment is extremely small and leaves you with a less precise estimate of the treatment effect.

Thank you very much for these clarifications.
