Difference-in-difference with multiple periods

Guest
#1

12 Apr 2021, 04:34

Hi,

I want to set up a difference-in-differences regression to analyze the effect of being a top 20% ESG-rated firm on stock performance over different time frames during the COVID-19 pandemic. However, I am new to regression models and Stata and hence have some questions.

I am trying to set up an extended regression function based on a model used in Albuquerque et al. (2020). This is the article: https://papers.ssrn.com/sol3/papers....act_id=3583611

Thus, I want to set up the following regression:
$$
P_{it} = \gamma_i + \lambda_t + \mathrm{PreCrash}_t + \mathrm{Crash}_t + \mathrm{PostCrash}_t + \delta_1 (T_i \times \mathrm{PreCrash}_t) + \delta_2 (T_i \times \mathrm{Crash}_t) + \delta_3 (T_i \times \mathrm{PostCrash}_t) + \epsilon_{it},
$$
where stock performance (return) is observed for firm $i$ on day $t$ from January 2020 till April 2020. The parameters $\gamma_i$ and $\lambda_t$ denote fixed effects for firms and days, respectively. $T_i$ is a treatment dummy which equals 1 if firm $i$ is a top 20% ESG firm, and 0 otherwise.

I have the following questions:

1) How do you set up the regression in Stata? Right now I have created an indicator column for each of my three periods, "Pre", "Crash" and "Recovery" (1 if the observation falls in that period, 0 otherwise; the periods do not overlap, so each row has exactly one 1). First, I used xtset to set the dates and company number. Second, I defined each of the three difference-in-differences terms by, for example, "gen did0 = Top20_ESG*Pre". Next, I used xtreg for return with the three period indicators and the three difference-in-differences terms: "xtreg Return Pre Crash Recovery Top20_ESG did0 did did1". Should Top20_ESG be included as an independent variable?
When I run the regression I get really low t-values (high p-values).

2) When I apply the fixed effect, I do it by adding ",fe" at the end. However, it says that "Top20_ESG omitted because of collinearity". Why does this happen? Should it be excluded? And does ",fe" automatically apply both the firm and the day fixed effects?

3) Would it make sense to include control variables such as firm-specific measures, e.g., size, leverage etc.?

4) Could industry be controlled for, so that industry effects would not influence the results?

Thank you so much in advance! Any help with one or more of the above questions would be really appreciated.

Best,
Guest

Last edited by sladmin; 10 Jun 2021, 14:43. Reason: anonymize original poster
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

12 Apr 2021, 11:54

1) Too bad you wasted time and effort doing that. You can work with what you've done, but it's suboptimal and down the line will probably hinder your efforts. So let's go back and do it the better way. Create a single variable that takes on values 0 = Pre-crash, 1 = Crash, 2 = Post-crash. It doesn't actually have to be 0/1/2, any three distinct non-negative integers will do, but 0/1/2 is probably easiest so let's assume those are the values you use. Let's call it era, just for reference in the rest of this discussion (you can call it whatever you want.) And just forget about those did* variables.

2) You need to -xtset- your data. Presumably you have some variable, I'll call it firm_id, that identifies distinct firms in your data, and another variable that gives time (year, quarter, whatever your unit of time is). So -xtset firm_id time-. You also presumably have some dependent variable that gives the stock performance. I'll just call that one stock_performance. Then your regression is:

Code:

xtreg stock_performance i.era##i.Top20_ESG i.time, fe

The i.era##i.Top20_ESG is Stata's factor-variable notation. Read -help fvvarlist- for details. In this instance Stata will interpret this as instructions to expand i.era into three indicator (aka dummy) variables, create an indicator variable for Top20_ESG and also create all their interaction terms, and include those in the model. The -fe- designation will tell Stata to incorporate firm-level fixed effects ($\gamma_i$ in your equation), and the i.time term supplies the day fixed effects ($\lambda_t$). The Top20_ESG indicator will be dropped due to collinearity with the fixed effects (just as you found it was with your original regression.) That's because a firm either is or is not Top20_ESG, and that is a time-invariant attribute of the firm. All time-invariant attributes of the firm are automatically dropped in fixed-effects models. This is normal, expected, indeed, inevitable. Also, expect to see a couple of the time indicators omitted, because there will be collinearity among the time indicators and the era indicators (since, I'm assuming, Pre-Crash, Crash, and Post-Crash refer to time periods.)
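For readers starting from the three separate period dummies described in post #1, a minimal sketch of the recoding and the regression might look like the following. The data are simulated only so that the block runs on its own: the panel size, the day cut-offs, and the effect sizes are made-up assumptions, and only the variable names come from the thread.

```stata
* Simulated stand-in data (hypothetical): 50 firms observed on 60 trading days.
clear
set seed 12345
set obs 50
gen firm_id = _n
gen byte Top20_ESG = (firm_id <= 10)
expand 60
bysort firm_id: gen time = _n
xtset firm_id time

* Three non-overlapping period dummies, as described in post #1.
gen byte Pre      = (time <= 20)
gen byte Crash    = inrange(time, 21, 40)
gen byte Recovery = (time > 40)

* Collapse the three dummies into a single 0/1/2 variable.
gen byte era = 0*Pre + 1*Crash + 2*Recovery
label define era_lbl 0 "Pre-crash" 1 "Crash" 2 "Post-crash"
label values era era_lbl

* Made-up outcome with a small extra crash-period return for Top20_ESG firms.
gen double stock_performance = rnormal(0, 0.02) - 0.01*Crash + 0.004*Crash*Top20_ESG

* Firm fixed effects come from -fe-, day fixed effects from i.time.
* Stata will note that 1.Top20_ESG and some day indicators are omitted.
xtreg stock_performance i.era##i.Top20_ESG i.time, fe
```

The omitted-because-of-collinearity notes in the output are the ones described above and can be ignored.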

3) This is a substantive question, not a statistical one. Ask a person with knowledge of finance or economics. (I'm an epidemiologist.)

4) It already is adjusted for. Industry is a time-invariant attribute of the firm. One of the big advantages of fixed-effects models is that they automatically adjust for all time-invariant attributes of the firms--even if there is no data on them, and even if no data could in principle ever be found. So you don't need to worry about that--it's already done.

Now, you may want to consider using cluster-robust estimation of the standard errors in your model. First make sure you have enough different firms to warrant doing that. There isn't a uniform consensus on the minimum number required for using -vce(robust)-, but I think everyone would agree that it's not appropriate with ten or fewer. Assuming you have enough firms, and if they are nested in industries, you might want to actually cluster the standard errors on industry instead of firm, i.e. -vce(cluster industry)-. Again, this is only to be done if you have more than 10 industries (and some would say that many more than that are truly needed). So if you have a lot of firms but not so many industries, don't cluster your standard errors on industry--do it at the firm level. If you have only a small number of firms, well, don't expect much in the way of meaningful results anyway.
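Before choosing a clustering level, it can help to count the candidate clusters and confirm the nesting. The following is a minimal sketch on a simulated panel; the 40 firms, 8 industries, and the variable name industry are assumptions made for illustration.

```stata
* Hypothetical panel: 40 firms nested in 8 industries, 30 days each.
clear
set obs 40
gen firm_id = _n
gen byte industry = mod(firm_id, 8) + 1
expand 30
bysort firm_id: gen time = _n

* Flag one row per firm and one row per industry, then count the flags.
egen byte firm_tag = tag(firm_id)
egen byte ind_tag  = tag(industry)
count if firm_tag
count if ind_tag

* Clustering on industry requires each firm to sit in exactly one industry.
* This assert stops with an error if any firm appears in more than one.
bysort firm_id (industry): assert industry[1] == industry[_N]
```

The two counts are the numbers to compare against whatever minimum number of clusters you are willing to accept.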

Last edited by Clyde Schechter; 12 Apr 2021, 11:58.
Guest
#3

13 Apr 2021, 00:04

Hi Clyde,

Thank you very much for your response. I have a few clarifying questions.
First, I can clarify that we have around 500 firms, leading to 42578 observations (rows).

1) When I run the model it says that Top20_ESG and Time are omitted because of collinearity. And the number of observations in the output is only 39 (compared to 42,578) - is this because it hasn't used the observations?

2) Would the fixed effect model be the same as using control variables with characteristics for the firm? (I know it would not be one-to-one, but would that get some of the effects?)

3) In general I get really high p-values for the difference-in-differences variables. Is there any way to improve these in the model?

4) You mention using cluster robust estimation of the standard errors which we would like to do. How would you recommend it in Stata? We have around 500 firms and 11 industries.

I have added a screenshot of the result so you can follow my thoughts.

Once again, thank you so much for your help!

Best,
Guest

Last edited by sladmin; 10 Jun 2021, 14:44. Reason: anonymize original poster
Guest
#4

13 Apr 2021, 00:13

And one more question, is it possible to also include a dummy for the bottom ESG firms, i.e., not comparing with the rest of the data (i.e., not top 20%) but with a specific group?
Clyde Schechter

#5

13 Apr 2021, 11:30

1) When I run the model it says that Top20_ESG and Time are omitted because of collinearity. And the number of observations in the output is only 39 (compared to 42,578) - is this because it hasn't used the observations?

Top20_ESG is a time-invariant attribute of the firms, so it is collinear with the fixed effects and is omitted--this is normal and is not a problem. Notwithstanding the message that Time was omitted, if you look at the output you can see that it is, in fact, still there. The confusion arises because Top20_ESG##Time already includes Time itself. That is precisely what the ## operator is about. But then your command also lists Time separately. It is the separate, redundant, occurrence of Time that was omitted. So this is not a problem either. As for the number of observations, it is not 39, it is 39,593: the line wraps in the output. If you widen your Results window or use -set linesize- to get a longer output line length, you will avoid this kind of problem in the future. The difference between 39,593 and 42,578 will be attributable to observations that have missing values for one or more of the variables appearing in the regression: only observations with complete non-missing data on all model variables are included.
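A minimal sketch of how one might check both points, the output width and the observations dropped for missing values, is shown here. The data are simulated and the 15 missing returns are planted deliberately; the variable day and all the numbers are assumptions for illustration.

```stata
* Hypothetical panel: 10 firms, 20 days each, with 15 missing returns planted.
clear
set seed 42
set obs 200
gen firm_id = ceil(_n/20)
bysort firm_id: gen day = _n
xtset firm_id day
gen byte Top20_ESG = (firm_id <= 2)
gen byte Time = (day > 10)
gen double Return = rnormal(0, 0.02)
replace Return = . in 1/15

* Widen the output so that the observation count is not wrapped.
set linesize 120
xtreg Return i.Time##i.Top20_ESG, fe

* e(N) is the estimation sample size; e(sample) marks the rows that were used.
display "Observations used: " e(N)
count if !e(sample)

* Show which model variables have missing values.
misstable summarize Return Time Top20_ESG
```

Here the count of excluded rows matches the number of missing returns, which is the pattern to look for in the real data.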

2) Would the fixed effect model be the same as using control variables with characteristics for the firm? (I know it would not be one-to-one, but would that get some of the effects?)

No, they would not be equivalent. And the latter would be an inferior approach in many settings. One of the real strengths of fixed-effects models is that the confounding (aka omitted variable bias) effects of all time-invariant firm attributes, even those which are unobserved or even unobservable in principle, are automatically adjusted for. Using covariates describing characteristics of the firms can never achieve that. Now, there may be other reasons why a random effects model might be preferable in your situation, in which case you lose that advantage and you can (partially) compensate for it by including firm-describing covariates.

3) In general I get really high p-values for the difference-in-differences variables. Is there any way to improve these in the model?

The results are what they are, and it is not science to change the model because they disappoint you. The model should be justified independently of the results you get from it. Given that you have already seen the results, you cannot be unbiased in making these judgments. What you might do is consult with others in your discipline, describing both this model and other models that strike you as reasonable, without providing any direct or indirect information about the results, get professional advice about which model seems a priori to be the best model for your situation, and then go with that.

4) You mention using cluster robust estimation of the standard errors which we would like to do. How would you recommend it in Stata? We have around 500 firms and 11 industries.

I would use -vce(cluster CompanyNo)- here. 500 firms is plenty; 11 industries is, in my view and that of most statisticians, not enough (though I'm sure if you shopped around hard enough you could find somebody who would approve of clustering on 11 groups.)
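As a minimal sketch, the option is simply added after -fe-. The simulated panel below is a stand-in with assumed dimensions (500 firms, 40 days) and an assumed date variable called day; Time is taken to be the 0/1/2 period variable used in the model above.

```stata
* Hypothetical stand-in panel: 500 firms observed on 40 days.
clear
set seed 7
set obs 500
gen CompanyNo = _n
gen byte Top20_ESG = (CompanyNo <= 100)
expand 40
bysort CompanyNo: gen day = _n
xtset CompanyNo day

* Period variable: 0 = before, 1 = crash, 2 = recovery (cut-offs are made up).
gen byte Time = cond(day <= 10, 0, cond(day <= 25, 1, 2))
gen double Return = rnormal(0, 0.02) - 0.01*(Time == 1)

* Firm fixed effects with standard errors clustered on firm.
xtreg Return i.Time##i.Top20_ESG i.day, fe vce(cluster CompanyNo)
```

The header of the coefficient table reports the number of clusters used, which should equal the number of firms.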

And one more question, is it possible to also include a dummy for the bottom ESG firms, i.e., not comparing with the rest of the data (i.e., not top 20%) but with a specific group?

It's not clear to me what you have in mind. If what you mean is to consider a three-way partition of the firms into top ESG, bottom ESG, and those in the middle, and then do the analysis with these three groups instead of just two, then sure, that can be done. The way to do it is to create a three-level variable: ESG_group = 0 for middle, 1 for bottom, 2 for top. Then use i.Time##i.ESG_group in your model.
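A minimal sketch of building such a three-level variable follows. It assumes a time-invariant, firm-level score called esg_score and 20%/80% cut-offs; neither the variable name nor the cut-offs appear in the thread, and the data are simulated.

```stata
* Hypothetical panel: 100 firms, 30 days, one ESG score per firm.
clear
set seed 99
set obs 100
gen firm_id = _n
gen double esg_score = 100*runiform()
expand 30
bysort firm_id: gen day = _n
xtset firm_id day
gen byte Time = cond(day <= 10, 0, cond(day <= 20, 1, 2))
gen double Return = rnormal(0, 0.02)

* Compute the cut-offs on one row per firm so every firm counts once.
egen byte firm_tag = tag(firm_id)
_pctile esg_score if firm_tag, percentiles(20 80)
scalar p20 = r(r1)
scalar p80 = r(r2)

* 0 = middle, 1 = bottom 20%, 2 = top 20%.
gen byte ESG_group = 0
replace ESG_group = 1 if esg_score <= scalar(p20)
replace ESG_group = 2 if esg_score >  scalar(p80)
label define esg_lbl 0 "Middle" 1 "Bottom 20%" 2 "Top 20%"
label values ESG_group esg_lbl

xtreg Return i.Time##i.ESG_group i.day, fe vce(cluster firm_id)
```

If the comparison of interest is top versus bottom, writing ib1.ESG_group instead of i.ESG_group would make the bottom group the reference category.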
Guest
#6

17 Apr 2021, 12:47

Hi Clyde,

Thank you for your response. It really helped.

One last question.

We ended up including two periods (not three).

When I plot the residuals vs. fitted values I get two clusters (see attached picture). Is this "okay" (with the assumptions) when we're dealing with two periods in the difference in differences? (i.e., normal to have groups/clusters for each period).

If not, is there any other way to control for this?

Best,
Guest

Last edited by sladmin; 10 Jun 2021, 14:44. Reason: anonymize original poster
Clyde Schechter

#7

18 Apr 2021, 12:13

Well, the appearance of two clusters is not a problem. In fact, it is a sign of a strongly predictive model that you get separation along the fitted values axis.

What is a bit problematic, however, is that the vertical spread on the left is much greater than that on the right. This is what heteroscedasticity looks like (when it is large enough for the eye to see.) This implies that your standard errors (and the test statistics, p-values, and confidence intervals derived from them) are wrong. So re-run the model using cluster robust standard errors. (That doesn't remove the heteroscedasticity--but it corrects the results to account for it.)
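A rough way to put a number on what the plot shows is to compare the residual spread across the two periods and then refit with clustered standard errors. This is a minimal sketch on simulated data in which the crash period is deliberately given a larger error variance; the variable day and all magnitudes are assumptions.

```stata
* Hypothetical panel: 100 firms, 40 days, noisier returns during the crash.
clear
set seed 11
set obs 100
gen firm_id = _n
gen byte Top20_S = (firm_id <= 20)
expand 40
bysort firm_id: gen day = _n
xtset firm_id day
gen byte Crash = (day > 20)
gen double Return = -0.01*Crash + rnormal(0, cond(Crash, 0.04, 0.01))

xtreg Return i.Crash##i.Top20_S, fe

* Fitted values and idiosyncratic residuals from the fixed-effects fit.
predict double fitted, xb
predict double resid, e

* Unequal residual standard deviations across periods indicate heteroscedasticity.
tabstat resid, by(Crash) statistics(sd n)

* Same coefficients, but standard errors that tolerate the unequal variances.
xtreg Return i.Crash##i.Top20_S, fe vce(cluster firm_id)
```

The point estimates are identical in the two fits; only the standard errors and everything derived from them change.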
Guest
#8

19 Apr 2021, 09:57

Thank you! Ran it using cluster robust standard errors.

One more question, how can the two variables for the diff in diff be negative and then when combined be positive?

Have attached an example.

Thank you in advance.

Best,
Guest

Last edited by sladmin; 10 Jun 2021, 14:44. Reason: anonymize original poster

Clyde Schechter

#9

20 Apr 2021, 18:43

how can the two variables for the diff in diff be negative and then when combined be positive?

I'll turn that around with a question for you: what makes you think this is unusual, interesting, or a problem? There's no reason to think this wouldn't happen. So, you need to understand what these things mean. In an interaction model, the "main" effect variables (I hate that term, because I think it promotes confusion--I prefer to call them constituent effect variables) do not mean the same thing that they mean in a non-interaction model. In particular, Top20_S in this model is not the effect of Top20_S on the outcome variable. Nor is Crash the effect of Crash on the outcome variable. In fact, in an interaction model, there is no such thing as "the effect of Top20_S." In an interaction model like this one, there are two different effects of Top20_S: one that applies when Crash = 0 and the other applying when Crash = 1. Similar facts obtain about Crash.

Rather, Top20_S represents the effect of Top20_S on the outcome variable when Crash = 0. And Crash represents the effect of Crash on the outcome variable when Top20_S = 0.

And what does Top20_S#Crash represent? It is not the effect of Top20_S and Crash combined. It is the difference between the effect of Top20_S when Crash = 1 and the effect of Top20_S when Crash = 0. (Equivalently, it is also the difference between the effect of Crash when Top20_S = 1 and the effect of Crash when Top20_S = 0.)

Putting it in the form of a little table:

Situation                   Formula for Return
Crash = 0, Top20_S = 0      _cons
Crash = 0, Top20_S = 1      Top20_S + _cons
Crash = 1, Top20_S = 0      Crash + _cons
Crash = 1, Top20_S = 1      Crash + Top20_S + Top20_S#Crash + _cons

So what your output here says is that the effect of Top20_S is 0.0039 greater when Crash = 1 than when Crash = 0. That is clearly perfectly consistent with both of those effects being negative: the interaction term is telling you which is bigger and by how much.
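The arithmetic behind that statement can be written out from the four rows of the table. The symbols are shorthand introduced here, not Stata output: $b_0$ for _cons, $b_T$ for Top20_S, $b_C$ for Crash, and $b_{TC}$ for Top20_S#Crash.

```latex
\begin{aligned}
\text{effect of Top20\_S when Crash} = 0 &: \quad (b_0 + b_T) - b_0 = b_T \\
\text{effect of Top20\_S when Crash} = 1 &: \quad (b_0 + b_C + b_T + b_{TC}) - (b_0 + b_C) = b_T + b_{TC} \\
\text{difference between the two} &: \quad (b_T + b_{TC}) - b_T = b_{TC}
\end{aligned}
```

As a purely illustrative case, suppose $b_T = -0.0050$ (a made-up value) and $b_{TC} = 0.0039$. Then the effect of Top20_S is $-0.0050$ when Crash = 0 and $-0.0050 + 0.0039 = -0.0011$ when Crash = 1: both effects are negative, yet the interaction coefficient is positive.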

Note: Because your model has additional interaction terms, the above explanation is oversimplified and neglects the impact of other variables like Recovery that interact with Top20_S in the model, but the general principles are the same, although the formulas for Return are longer and more complicated. In fact, in modern Stata, one doesn't typically use these formulas to do the calculations, but instead, that is done by the -margins- command. -margins Crash#Top20_S- will give you all of those results without you having to do any additional calculation or coding (and it will properly account for the additional interaction with Recovery too).


Guest
#10

22 Apr 2021, 01:49

Thank you very much for the explanation!

I have two more questions.

1) Is there any statistical evidence to use the results from this model when we have an r^2 of 0.07? This seems really low. And if yes, only for the significant results? Would it make sense to use another regression model? If yes, do you have any recommendations? Preferably, an OLS model.

2) Would it statistically make sense to include Fama French factors in the model? This can increase the model's R² to 0.56. To give a quick introduction to the Fama French factors: they help explain the excess performance of stocks, and they are available daily, but at the portfolio level rather than for each stock. "The Fama and French model has three factors: size of firms, book-to-market values and excess return on the market. In other words, the three factors used are SMB (small minus big), HML (high minus low) and the portfolio's return less the risk free rate of return. SMB accounts for publicly traded companies with small market caps that generate higher returns, while HML accounts for value stocks with high book-to-market ratios that generate higher returns in comparison to the market." So would there be any way to include these factors in the model, or is it not possible statistically because they are at the portfolio level and not for each stock?

Once again thank you!

Best,
Guest

Last edited by sladmin; 10 Jun 2021, 14:44. Reason: anonymize original poster
Guest
#11

22 Apr 2021, 08:06

I'm not sure I fully understand the Top20_S, Crash and Recovery coefficients (row 1-3).
Is e.g., the Crash the effect of both top_S=1 and top_S=0 or is it only when top=0 (i.e., similar to the bottom)

Best,
Guest

Last edited by sladmin; 10 Jun 2021, 14:45. Reason: anonymize original poster
Clyde Schechter

#12

22 Apr 2021, 12:44

1) Is there any statistical evidence to use the results from this model when we have an r^2 of 0.07? This seems really low. And if yes, only for the significant results? Would it make sense to use another regression model? If yes, do you have any recommendations? Preferably, an OLS model.

The low R² says that your outcome variable, return, is subject to the influence of many forces that are not included in your model, so that those you include only account for a small fraction of the total variation in return. That doesn't mean that your results can't be useful. The real issue is whether the effect sizes you have uncovered are large enough to matter in real world terms. That is not a statistical question, it is a question in finance, and needs to be answered by somebody knowledgeable in that field if you do not feel qualified to make that judgment.

2) Would it statistically make sense to include Fama French factors in the model?

This question would have to be answered by somebody else who knows what Fama French factors are, somebody with expertise in your discipline. Sorry, I can't help you here.

I'm not sure I fully understand the Top20_S, Crash and Recovery coefficients (row 1-3).
Is e.g., the Crash the effect of both top_S=1 and top_S=0 or is it only when top=0 (i.e., similar to the bottom)

The coefficient of Crash is the effect of Crash being 1 rather than 0 when Top20_S and Recovery are both zero. Similar reasoning for the others.

You might want to create a complete table of the effects of Crash for all four possible combinations of Top20_S and Recovery:

Code:

margins Top20_S#Recovery, dydx(Crash)

This is a lot simpler than puzzling out the meaning of the coefficients in the regression output table. Regression coefficients are fairly opaque when there is a single interaction term, and when there are several interaction terms it really gets confusing.
Guest
#13

23 Apr 2021, 03:14

Thank you once again!

To be sure I understand it correctly, please see the following and the attached table for calculating returns.

Constant is the average return in the period before the Crash (and recovery) and for the bottom ESG

Top20_S is the average return for the top ESG before the crash (and recovery) relative to the bottom ESG (extra return), i.e., the extra effect of being high ESG before

Crash is the average return of the bottom ESG during the crash relative to before

Recovery is the average return for the bottom ESG during the recovery relative to the period before the crash (not relative to the crash)

Top20_S#Crash is the average return of top ESG during the crash (i.e., relative to before the crash) relative to the bottom ESG

Top20_S#Recovery is the average return of top ESG during the recovery (i.e., relative to before the crash not to the crash) relative to the bottom ESG

Are the interpretations of the five coefficients above correct? And is the following table correctly set up?

And when I try to use the code you sent, Stata says "variable Crash may not be present in model as factor and continuous predictor".

Best,
Guest

Last edited by sladmin; 10 Jun 2021, 14:45. Reason: anonymize original poster
Clyde Schechter

#14

23 Apr 2021, 23:43

Are the interpretations of the five coefficients above correct? And is the following table correctly set up?

The _cons and the first three coefficients are correct. The last two are not. Top20_S#Crash is not the average return in any group. Similarly for Top20_S#Recovery. These coefficients represent differences in marginal effects.

And is the following table correctly set up?

I cannot comment on the table. It mentions a variable ESG_treatment which appears nowhere in your model, so I don't know what to make of it.

And when I try to use the code you send Stata says "variable Crash may not be present in model as factor and continuous predictor"

That's right. Where Crash appears by itself, as it has no i. prefix, it is, by default, assumed to be continuous. But then, when it appears as part of an interaction term, it is, by default, a factor variable. So you have to keep it consistent. You can either put i. before Crash where it appears by itself, or you can simplify the whole thing by taking advantage of the ## operator:

Code:

* add any other predictors at the end of the variable list
regress return i.(Crash Recovery)##i.Top20_S

Actually, even simpler would be to get rid of the Crash and Recovery variables and create a 0/1/2 variable for Before, Crash, and Recovery. Call that variable era and make the code

Code:

* add any other predictors at the end of the variable list
regress return i.era##i.Top20_S
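With the single era variable in place, -margins- can produce the whole table of expected returns and period-specific effects. The following is a minimal sketch on simulated data; the sample size and coefficients are made up, and the outcome is named Return as in post #1.

```stata
* Hypothetical data: era is 0 = Before, 1 = Crash, 2 = Recovery.
clear
set seed 314
set obs 3000
gen byte Top20_S = (runiform() < 0.2)
gen byte era = floor(3*runiform())
gen double Return = 0.001 - 0.010*(era == 1) + 0.002*(era == 2) + rnormal(0, 0.02)
replace Return = Return - 0.001*Top20_S + 0.004*(era == 1)*Top20_S

regress Return i.era##i.Top20_S

* Expected return in each of the six era-by-Top20_S cells.
margins era#Top20_S

* Effect of Top20_S (1 versus 0) within each era.
margins era, dydx(Top20_S)

* The crash-period effect of Top20_S written as a sum of coefficients.
lincom 1.Top20_S + 1.era#1.Top20_S
```

The -lincom- result reproduces the era = 1 row of the second -margins- table, which shows how the interaction coefficient enters: it is added to the Top20_S coefficient rather than read on its own.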
Guest
#15

24 Apr 2021, 00:03

Hi again,

Thank you. So the interpretation is:

Top20_S#Crash is the average difference in return for Top20S in crash relative to the difference of the Top20S before the crash.
When we say "relative to the difference" it is relative to the bottom, correct?
I.e., we could say "the additional effect of being top_s during the crash relative to being top_s before the crash, all relative to the bottom ESG".

Top20_S#Recovery is the average difference in return for Top20S in recovery relative to the difference of the Top20S before the crash, i.e., we could say "the additional effect of being top_s during the recovery relative to being top_s before the crash, all relative to the bottom ESG". Correct?

Please just read the table above as Top20S (we made it for two different ESG scores - the interpretation should be similar). It would be really nice to see if the table were correctly set up.

Once again thank you very much!