  • Using Difference-in-Difference to identify the effect of a continuous treatment

    I have panel data on 40+ sub-regions in the U.S., covering 30 periods. In total there are 1,000 individual hotels across the 40+ sub-regions. The treatment is the intensity (or density) of a special type of car rental service in each sub-region. I'm trying to estimate the treatment effect on hotel performance (sales, customer ratings, etc.).
    Here is some context:
    1. the special type of service was introduced in the 14th period, in all of the sub-regions.
    2. in each sub-region, the number and the intensity of the service may vary. That is, there may be plenty of the service in some regions, but only a little in others.
    3. the treatment intensity is continuous.
    4. across periods, the treatment intensity in a sub-region may vary. For example, the intensity in sub-region A could be 2.3 in period 16 and 2.5 in period 20.
    5. very few (e.g., 1 or 2) sub-regions lack the service. That is, almost all regions are 'treated'.

    For now, let's not worry about the endogeneity of the treatment. Is it possible to identify such a treatment effect in a Difference-in-Difference framework? I know I'm facing challenges because:
    1) the treatment is continuous.
    2) basically you can say there is no 'control' group.

    But maybe I can exploit the variation in the treatment intensity and somehow relate changes in treatment intensity to changes in the outcome? What I'm doing now is:
    Code:
     areg Yijt = A*Xijt + Ct + D*(After*TreatmentIntensityjt), absorb(subregion)
    where Xijt is a set of time-varying covariates, Ct are the time fixed effects, and Yijt is the outcome for hotel i in sub-region j in period t. After is a binary variable equal to 1 if the period is after the 14th period (i.e., after the introduction of the service treatment). TreatmentIntensityjt is a continuous variable indicating the intensity of the car service in sub-region j in period t (hence, for all regions, this variable is simply zero in all periods prior to the 14th period). The coefficient of interest is D.

    Would the equation above make sense for estimating the 'treatment effect'? I realize this seems to be just a fixed-effects regression rather than a typical DiD framework. But can I justify that the coefficient D identifies the effect of having a certain intensity of the car service on hotel performance in that region?

    Sorry for such a long post. I was trying to provide more details about the context of the problem and about my concerns. I'm grateful for any help or comments!
    Last edited by Juan Xi; 30 Jan 2018, 12:18. Reason: difference-in-difference, continuous treatment

  • #2
    It's not a classical DID model, but it falls within the rubric of a generalized DID model. See https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf for a very clear and interesting introduction.

    Just be sure to use factor variable notation when you run this so that you can use -margins- afterwards to calculate adjusted predicted outcomes in various circumstances. See -help fvvarlist- if you are not familiar with factor-variable notation. See the excellent Richard Williams' https://www3.nd.edu/~rwilliam/stats/Margins01.pdf if you are not familiar with the -margins- command.
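    For concreteness, a minimal sketch of what that might look like with factor-variable notation; the variable names (sales, period, after, treat_intensity, subregion) are placeholders for whatever is actually in your data:
    Code:
     * interaction in factor-variable notation so that -margins- can be used afterwards
     areg sales i.period i.after#c.treat_intensity, absorb(subregion) vce(cluster subregion)

     * adjusted predicted outcomes at a few illustrative intensities, pre vs post
     margins after, at(treat_intensity = (0 1 2 3))
    The -margins- call then shows how the adjusted predictions change with intensity once the service is in place.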



    • #3
      Originally posted by Clyde Schechter View Post
      It's not a classical DID model, but it falls within the rubric of a generalized DID model. See https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf for a very clear and interesting introduction.

      Just be sure to use factor variable notation when you run this so that you can use -margins- afterwards to calculate adjusted predicted outcomes in various circumstances. See -help fvvarlist- if you are not familiar with factor-variable notation. See the excellent Richard Williams' https://www3.nd.edu/~rwilliam/stats/Margins01.pdf if you are not familiar with the -margins- command.
      Thank you Clyde! I've read the slides you referenced; they are very helpful and seem to fit my context well. However, here is a question I'm a little confused about, regarding the test of pre-treatment trends.
      Ideally, we want to test the assumption of 'common trends' in the pre-treatment periods. In a classical DID model this is very clear--we can just plot the outcome over time for the treated group and for the control group. But how do we test this assumption in a generalized DID model, where all groups receive the treatment, though with different intensity? What I can think of is:
      1) for each group, plot the average outcome over time and see whether the outcome curves in the pre-treatment periods are 'parallel'.
      2) in a regression model, interact the group dummies with lead-of-treatment period dummies. For example, include the following interactions in the regression:
      Code:
        a1j*preTreatment-1*groupj + a2j*preTreatment-2*groupj + ....
      where preTreatment-1 = 1 if the current period is 1 period ahead of the treatment period, preTreatment-2 = 1 if the current period is 2 periods ahead of the treatment period, and so on. Hence the coefficients a1j and a2j identify the 'trend' in the pre-treatment periods in group j. Ideally, if the DID is valid, we expect the effect not to have begun prior to the treatment; that is, we expect a1j and a2j to be insignificantly different from zero. In Stata, I imagine this would look something like the sketch below.
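      A purely illustrative version (lead1, lead2, group, Xvars, and the other names are hypothetical placeholders for my actual variables):
      Code:
        * lead (pre-trend) dummies: the treatment is introduced in period 14 for everyone
        gen lead1 = (period == 13)
        gen lead2 = (period == 12)

        * group-by-lead interactions alongside the main DID term
        areg Y Xvars i.period i.group#i.lead1 i.group#i.lead2 ///
            i.After#c.TreatmentIntensity, absorb(subregion)

        * pre-trend check: the lead interactions should be jointly insignificant
        testparm i.group#i.lead1 i.group#i.lead2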

      Would either of the above 'tests' make sense, to validate the DID model in such a generalized setting? Thank you for your comments!



      • #4
        I think both approaches are reasonable. For 1) you have to be a bit careful how you do this, because with different times of onset for treatment in different entities, a curve for the treatment group at some point becomes a mixture of treated and untreated entities that you cannot sort out easily. So for this, it probably makes sense to graph outcome vs time for the entire control group using all available data. But in the treatment group, graph outcome vs time only for those time points where treatment has not yet begun for the given entity. (A rough sketch of this appears at the end of this post.)

        With regard to 2), my only caveat is that you need some theory about how the treatment works and when its effects begin to be felt. If the onset of treatment effect is substantially delayed, just looking at the first couple of leads won't get you there. And if the treatment effects are not sustained over time, you may see them with the early leads but not later--that wouldn't negate the existence of the (transient) effect earlier.
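        In Stata, the graph for 1) might be put together along these lines (a rough sketch only; Y, period, treated, and onset_period are placeholder names, not variables from your data):
        Code:
         * controls contribute all periods; treated entities only their pre-onset periods
         preserve
         keep if treated == 0 | period < onset_period
         collapse (mean) Y, by(treated period)
         twoway (line Y period if treated == 0, sort) ///
                (line Y period if treated == 1, sort), ///
             legend(order(1 "control" 2 "treated (pre-onset only)"))
         restore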



        • #5
          Originally posted by Clyde Schechter View Post
          For 1) you have to be a bit careful how you do this, because with different times of onset for treatment in different entities, a curve for the treatment group at some point becomes a mixture of treated and untreated entities that you cannot sort out easily. So for this, it probably makes sense to graph outcome vs time for the entire control group using all available data. But in the treatment group, graph outcome vs time only for those time points where treatment has not yet begun for the given entity.
          Thank you Clyde. You're right that I need to be careful with the timing of the treatment in each group.
          Just to be sure that I'm implementing the test correctly: let's say, in a simple case, the treatment (for example, a tax increase in each state) occurred simultaneously for all states. Suppose we have 50 groups and 10 possible treatment levels (the treatment intensity is 0, 1, 2, ..., 9). Then which of the following do we plot?
          1) for each treatment level, we plot the average outcome for the groups with that treatment intensity. So we'll have 10 curves.
          2) for each group, we plot the outcome. So we'll have 50 curves.

          If 1) is the correct way, then I assume this approach is impracticable for a continuous treatment, since a continuous treatment means infinitely many levels--unless we discretize the treatment.

          Originally posted by Clyde Schechter View Post
          With regard to 2), my only caveat is that you need some theory about how the treatment works and when its effects begin to be felt. If the onset of treatment effect is substantially delayed, just looking at the first couple of leads won't get you there. And if the treatment effects are not sustained over time, you may see them with the early leads but not later--that wouldn't negate the existence of the (transient) effect earlier.
          I guess this approach might be more feasible in my context. To confirm that my understanding is correct: did you mean that, to make my test solid, I need to be clear about the mechanism and timing of the treatment effect? For example, if the effect needs time to pick up (something like the delayed effect of adopting a technology), then 'no effect' in the first few leads doesn't mean there was no pre-treatment trend, because the trend could have 'started' and just take a few periods for the effect to show up.

          Thank you very much!



          • #6
            Originally posted by Juan Xi View Post
            I guess this approach might be more feasible in my context. To confirm that my understanding is correct: did you mean that, to make my test solid, I need to be clear about the mechanism and timing of the treatment effect? For example, if the effect needs time to pick up (something like the delayed effect of adopting a technology), then 'no effect' in the first few leads doesn't mean there was no pre-treatment trend, because the trend could have 'started' and just take a few periods for the effect to show up.
            Yes, that's exactly what I meant.

            Originally posted by Juan Xi View Post
            If 1) is the correct way, then I assume this approach is impracticable for a continuous treatment, since a continuous treatment means infinitely many levels--unless we discretize the treatment.
            Yes, for this purpose (and only for this purpose), I would discretize the treatment. It is a necessary evil here.
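            For example, a minimal sketch along these lines (the quartile cut and the names treat_intensity, Y, period, and subregion are purely illustrative):
            Code:
             * discretize the post-period intensity into quartile groups (illustrative)
             xtile intensity_q = treat_intensity if period >= 14, nq(4)
             bysort subregion: egen intensity_group = max(intensity_q)

             * average pre-treatment outcome over time, by discretized intensity group
             preserve
             keep if period < 14
             collapse (mean) Y, by(intensity_group period)
             twoway line Y period, sort by(intensity_group)
             restore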



            • #7
              Originally posted by Clyde Schechter View Post
              Yes, that's exactly what I meant.


              Yes, for this purpose (and only for this purpose), I would discretize the treatment. It is a necessary evil here.

              Thank you so much! It's clear now!



              • #8
                Originally posted by Clyde Schechter View Post
                It's not a classical DID model, but it falls within the rubric of a generalized DID model. See https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf for a very clear and interesting introduction.
                Hello! I would like to learn more about the generalized DID model; however, the suggested link no longer seems to exist. I would appreciate it if anyone could share an alternative source for this PDF.



                • #9
                  You are correct that that link no longer exists. I have not been able to find the same paper online. However, I have since discovered https://www.annualreviews.org/doi/pd...-040617-013507. The link is still active and, from the feedback I have gotten, it is an even better source to learn from.



                  • #10
                    Clyde Schechter I am very thankful for your reply and the suggested source.



                    • #11
                      Clyde Schechter I have followed this thread with great attention, as I am trying to employ a diff-in-diff strategy just like Juan. I was wondering about the correct interpretation of the coefficient on the interaction term (time * treatment). Since, as Juan says, almost all units get treated, can we say that in this case the ATET (which is what canonical DD settings recover) converges to the ATE?
                      As I said, I have a very similar case to Juan's, where all units get treated, but with different intensity.



                      • #12
                        Hi @Clyde Schechter,


                        Thank you very much for this thread.

                        After reading this thread and the resources you indicated, I still have additional questions.


                        My empirical setting is the following:

                        1. Sales are introduced for specific articles during a given week.

                        2. The treatment "sales" is introduced:
                        - three times.
                        - at the same moment for all the articles.
                        - with an intensity that varies from one article to another, but in a discrete way.
                        - for each article, the treatment intensity follows a step pattern: it is first introduced in a given week, stays at that level for a number of weeks, then increases and stays at the new level for a number of weeks, then increases once more and remains at that final level for a number of weeks.

                        3. Some exceptions exist:
                        - Sales can occur before the official sales period for some articles.
                        - Some articles don't experience a price reduction during the official sales period (because the brand aims to continue selling them after this sales period, within the "new collection").

                        My research question is the following: I want to estimate the impact of each treatment intensity on the number of units sold for each article, using a DiD setting. Moreover, in my regression:
                        - I would like to take into account the effect of each week (not only the overall period over which a new, more intense treatment is in place, but also each week within this period). How should I include these time effects? (See the sketch after this list for what I have in mind.)
                        - Articles might differ from one another along specific dimensions that might have an impact on the treatment and/or the outcome. These dimensions might be group-variant, time-variant, or both group- and time-variant. All of them are observable and hence could be included in the regression in order to overcome the endogeneity issue. I wonder which dimensions (group-variant, time-variant, or both) I should include in the regression and in which way (simple inclusion, interaction with the group effect, interaction with the time effect, and/or interaction with the treatment effect, ...).
                        - I wonder how I should treat the above-mentioned exceptions:
                        -> regarding the sales occurring outside the official sales period, should I completely remove the articles associated with such sales from the regression, remove only their observations associated with such sales, or do nothing at all?
                        -> regarding the articles that are not subject to sales during the official sales period, should I completely remove those articles from the regression (hence being left without any control group, since these articles are specific)?
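                        For the time effects, I imagine something along these lines (a rough sketch only; units_sold, week, article, and sales_intensity are placeholder names, not my actual variables):
                        Code:
                         * week fixed effects plus the discrete treatment intensity; article fixed effects absorbed
                         areg units_sold i.week i.sales_intensity, absorb(article) vce(cluster article)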

                        Hence, I wonder what would be the best way to run the regression.

                        Note that I have approximately 25 weeks and a large number of articles.

                        Sorry for this long message; I hope you will find it clear. Thanks in advance for your help on this matter.

                        Best,


                        Sandra



                        • #13
                          Well, you've raised several questions here. And most of them are really substantive questions that require a knowledge of marketing and even of the specific products and sales venues to answer. So I can only give you some general statistical advice. You will really need to consult a colleague who is knowledgeable in the substance here.

                          If you have no control group, you have no DID study. You can still analyze treat vs no treat outcomes, but the methodological claim of estimating causal effect is gone. You then have to fall back on having enough covariates to take care of confounding. Of course, since a DID is not a true experiment, it can also be the case that a bad control group ruins the study. That is why the parallel trends analysis is so important: you have to at least show that the treatment and control groups were on parallel trajectories before treatment occurred. But remember that this is a necessary, but not sufficient, condition for good causal effect estimation. (To take a ridiculous example, if we were studying the effect of a treatment for infertility on a population of infertile women, using men as controls would probably satisfy the parallel trends condition, but clearly would not be an adequate control group.)

                          The covariates that you need to include should not be selected by whether they are time-invariant or group-invariant. Rather, you should draw a diagram with all the variables in your data and draw arrows indicating plausible causal relationships. Any variable that has causal arrows directed into both the sales intervention and the amount-sold outcome is a confounder, and omitting it from the model will leave your results vulnerable to omitted variable (confounding) bias. You also need to have enough observations in your data (your description sounds like you have a lot, but if the number of covariates needed is large, even a pretty large data set can get overwhelmed). As a rule of thumb, you need 30 observations per covariate, and really 50 or more would be better. Remember also that a discrete covariate with n >= 2 levels counts as n-1 covariates for this purpose.

                          While we are on the subject of variable selection, although I doubt you will encounter this problem with the kind of variables you are working with, if your arrow diagram contains any cycles (a series of arrows leading from some variable through some others and then back to itself), you are in a situation where causal estimation is essentially impossible. Another, less perilous, situation that is sometimes seen (though, again, it seems unlikely to occur with the kind of variables you have described) is a variable that has arrows pointing into it from both the sales intervention and the amount-sold outcome. Such a variable is known as a collider, and you must be sure to exclude it from the covariates in the model.

                          Variables that have no arrows into either the sales intervention or the amount sold outcome should be omitted from the model because all they will do is introduce noise.

                          Variables that have an arrow pointing into the amount sold outcome, but not the sales intervention, are not confounders. Including them in your model is optional. If you are running close to the limits on the ratio of data observations to covariates, then don't put them in. If you have room to spare, however, including them may reduce residual variance in the model thereby somewhat increasing your power.

                          Variables that have an arrow pointing into the sales intervention but not the amount sold outcome can be ignored and should not be included in the model.

                          Whether to include these variables by themselves or with interaction terms with the sales intervention depends on your substantive knowledge of what their effects are likely to be. If you believe that the sales intervention will work differently depending on the values of a covariate, then you should include an interaction term to reflect that. If, however, you do not believe the variable modifies the impact of the sales intervention on the amount-sold outcome, but just shifts the level of the outcome overall, then you should not use an interaction term. Be aware that every interaction term you include adds a whole bunch of variables to the model and can quickly exhaust your data, so use them sparingly. Even if you think a variable will modify the effect of the sales intervention, if that modification is likely to be relatively small, you might want to omit the interaction just to avoid overcrowding your model with more variables than the data can support. Similar considerations apply to interactions with time: if you believe that the effect of a covariate on the outcome will differ across time periods, then you would need an interaction with time to capture that, but the same cautions about too many variables apply. These are all judgment calls, and they often have to rely on impressionistic, rather than data-based, beliefs about the real-world generating process. And those beliefs are, as I noted at the start of this post, the domain of people in marketing, with statisticians playing only a supporting role.
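                          For concreteness, an interaction between the sales intervention and a covariate might be specified along these lines (a sketch only; units_sold, week, sales_intensity, article, and x are placeholder names, not your actual variables):
                          Code:
                           * covariate x allowed to modify the effect of the (discrete) sales intensity
                           areg units_sold i.week i.sales_intensity##c.x, absorb(article) vce(cluster article)

                           * covariate x only shifts the level of the outcome (no effect modification)
                           areg units_sold i.week i.sales_intensity c.x, absorb(article) vce(cluster article)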

