  • #16
    The rows that you shaded in purple are problematic, and they reflect improper modeling of your situation. If Recovery = 1 with Crash = 0 is not a possible combination, then you cannot have Recovery and Crash as separate variables in the model. You must do it the way I have previously suggested:

    Create a 0/1/2 variable for Before, Crash, and Recovery. Call that variable era and make the code
    Code:
    regress return i.era##i.Top20_S etc.
    The rest of the table looks correct.



    • #17
      Thank you! And the following interpretation is correct as well?

      Top20_S#Crash is the average difference in return for Top20S in crash relative to the difference of the Top20S before the crash.
      When we say "relative to the difference" it is relative to the bottom, correct?
      I.e., we could say "the more effect of being top_s during the crash relative to being top_s before the crash - all relative to the bottom ESG)".

      Top20_S#Recovery is the average difference in return for Top20S in recovery relative to the difference of the Top20S before the crash, i.e., we could say "the more effect of being top_s during the recovery relative to being top_s before the crash - all relative to the bottom ESG)". Correct?



      • #18
        And so we have 6 possible outcomes (the table without the purple rows) for the model? No more, correct?



        • #19
          It is especially Top20_S#Recovery that I'm in doubt about: does it show the effect relative to the crash period or relative to the pre-crash period?



          • #20
            Top20_S#Crash is the average difference in return for Top20S in crash relative to the difference of the Top20S before the crash.
            When we say "relative to the difference" it is relative to the bottom, correct?
            I.e., we could say "the more effect of being top_s during the crash relative to being top_s before the crash - all relative to the bottom ESG)".
            I think you are using the term "relative" to mean two different things in this sentence, so it is a little confusing.

            The coefficient of Top20_S#Crash is the expected difference between the effect on return of being Top20S during the Crash period and the effect on return of being Top20S before the crash. It is a difference in effects on return; it is not the difference in return itself between any groups.
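            As a sketch of that arithmetic (with invented cell means, since no actual numbers appear in this thread), the interaction coefficient is a difference of two within-group differences:

```python
# Difference-in-differences arithmetic behind an interaction coefficient.
# All four cell means below are invented for illustration only.
mean_top_crash  = -1.00   # E[return | Top20S, Crash]
mean_top_before =  0.20   # E[return | Top20S, Before]
mean_bot_crash  = -1.50   # E[return | bottom ESG, Crash]
mean_bot_before = -0.18   # E[return | bottom ESG, Before]

# Effect of the crash (vs. before) within each group:
effect_top = mean_top_crash - mean_top_before   # -1.20
effect_bot = mean_bot_crash - mean_bot_before   # -1.32

# The Top20_S#Crash coefficient equals the difference of these effects,
# not a simple difference in returns between any two groups:
interaction = effect_top - effect_bot
print(round(interaction, 2))  # 0.12
```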

            And so we have 6 possible outcomes (the table without the purple rows) for the model? No more, correct?
            Correct.



            • #21
              So if our coefficient for Top20_S#Crash was, for example, 0.12, is it correct that we can interpret this as:

              "An investor would yield an average of 0.12% higher return within the crash period by investing in high rated ESG stocks compared to low ESG stocks (relative to the period before the crash)"
              "This is equivalent to a 2.64% cumulative higher return (0.12% * 22 days) by selecting high in contrast to low rated ESG stocks in the crash period" (we have 22 days in our crash period)

              This is how we understand this article interprets it:
              https://academic.oup.com/rcfs/article/9/3/593/5868419
              They say for example "High ES-rated firms earn an average abnormal daily return of 0.45% relative to other firms from February 24 to March 17, for a cumulative effect of 7.2% (0.45% x 16)."
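              That style of calculation is just the average daily coefficient multiplied by the number of trading days; a minimal sketch using the two sets of numbers quoted above:

```python
# Cumulative effect = average daily coefficient (in %) x number of trading days.
# Mirrors the back-of-envelope calculation quoted from the linked paper.
def cumulative_effect(daily_pct, n_days):
    return daily_pct * n_days

print(round(cumulative_effect(0.12, 22), 2))  # 2.64  (this thread's crash period)
print(round(cumulative_effect(0.45, 16), 2))  # 7.2   (the paper's example)
```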



              • #22
                "An investor would yield an average of 0.12% higher return within the crash period by investing in high rated ESG stocks compared to low ESG stocks (relative to the period before the crash)"
                I don't actually understand the sentence, so I can't say if it looks right or wrong.

                I would phrase this finding as "the effect of the crash period, relative to before the crash, on returns is 0.12 greater for high rated ESG stocks than it is for low rated ESG stocks."

                Explaining interaction terms in words is difficult and often confusing because normal language does not often deal with second-order mixed partial derivatives (which is what interaction coefficients actually are mathematically), and there are not simple words or phrases to express them. It might help your audience if you supplement your word descriptions with a table with Crash vs Before Crash vs After Crash in the row stubs, and High ESG vs Low ESG in the column headers, and put the corresponding expected returns in the cells.
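                As a sketch of how such a table could be filled in from the regression output (all the b_* coefficient values below are invented for illustration, not taken from this thread):

```python
# Expected returns by period (rows) and ESG group (columns), built from
# hypothetical regression coefficients. Every numeric value is invented.
b_const = 0.05       # baseline: Low ESG, Before period
b_top = 0.02         # Top20_S main effect
b_crash = -1.10      # Crash main effect
b_recov = 0.60       # Recovery main effect
b_top_crash = 0.12   # Top20_S#Crash interaction
b_top_recov = -0.03  # Top20_S#Recovery interaction

rows = {
    "Before":   (b_const,           b_const + b_top),
    "Crash":    (b_const + b_crash, b_const + b_top + b_crash + b_top_crash),
    "Recovery": (b_const + b_recov, b_const + b_top + b_recov + b_top_recov),
}
print(f"{'Period':<10}{'Low ESG':>10}{'High ESG':>10}")
for period, (low, high) in rows.items():
    print(f"{period:<10}{low:>10.2f}{high:>10.2f}")
```

Reading across a row shows the ESG gap within a period; comparing those gaps across rows recovers the interaction coefficients.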



                • #23
                  Thanks. You mean like the table I sent before? Or how would you set it up with the High ESG included as well?



                  • #24
                    Once again thank you so much for helping.

                    I got one more question in regards to the linearity assumption for OLS.

                    We talked about how the residuals should be distributed around the x-axis, which would indicate linearity. Would you say this is the case based on the two outputs below (for two dependent variables: one for returns and one for abnormal returns)? Please note that we correct the results for heteroskedasticity with robust standard errors--the need for which we can see from the pattern on the y-axis.

                    [Attachment: Screenshot 2021-04-27 at 3.17.18 PM.png -- residual plots for the two dependent variables]

                    Best,
                    Guest
                    Last edited by sladmin; 10 Jun 2021, 14:52. Reason: anonymize original poster



                    • #25
                      To my eye those graphs look OK as far as linearity is concerned. There is heteroskedasticity, but as you have dealt with that using robust standard errors that's not a problem.



                      • #26
                        Regarding the random sample assumption for OLS: how do you normally argue for this? To my understanding, you cannot test for it. We have looked at the S&P 500, and since we include all of its companies, I would argue that this fulfills the random sample assumption. Is this correct, or how should we think about it?



                        • #27
                          And thank you once again.

                          Two more questions about this model and the definition of variables.

                          Code:
                          xtreg RawReturn Top20_ESG Crash Recovery 1.Top20_ESG#1.Crash 1.Top20_ESG#1.Recovery i.GICSectors LN_assets Leverage Liquidity MBV ROA if Not20_ESG != 1, vce(cluster CompanyNo)

                          1) Would you categorize variables in the above regression as:
                          Dependent variable: Raw Returns
                          Independent variable: Top ESG, Crash, Recovery
                          Control Variables: GICSectors LN_assets Leverage Liquidity MBV ROA

                          2) We agree that the control variables are all "included", i.e., taken into account, in the difference-in-differences interaction terms. Correct? So we can say that it is "industry adjusted".

                          Best,
                          Guest
                          Last edited by sladmin; 10 Jun 2021, 14:52. Reason: anonymize original poster



                          • #28
                            One last question: I have calculated the VIF values to check for multicollinearity. Is this still fine when the independent variables are dummy variables? I get the following output.

                            [Attachment: Screenshot 2021-05-03 at 7.26.20 PM.png -- VIF output]




                            • #29
                              Re #26: "Random sample" is not an assumption, nor is it anything you can test within a data set. It refers to the way your sample was obtained from all the possible samples that anyone could obtain. If you set out with a plan to use a random number generator to select which firms to include in your sample, then it would be a random sample. I'm interpreting your question to mean that you did not do that: rather you set out to obtain information on all S&P 500 firms during the time period of your study. If so, what you have is not a random sample, it is a census over a time period. Such a sample is unquestionably representative of the S&P 500--there is no sample more representative than a census. But it would not be a random sample of firms. Consequently, generalizations drawn about S&P 500 firms are justified using this data, but generalizations about non-S&P 500 firms have no basis in your data at all.

                              #27. 1) Yes, although I never use the term "control" variables when speaking of observational data, where nothing is actually controlled. These variables are properly called covariates and are included in the model to adjust for their influence on the outcome. I realize, however, that the (mis)use of the term "control variable" is widespread, and in the sense that it is widely misused, you are misusing it in the same way--so most people will understand it the way you mean it.

                              2) I do not follow you here. The "control" variables are taken into account by virtue of being included in the list of variables in the regression command. They are not involved in the interaction terms at all. If the variable called GICSectors encodes industry, then, yes, your results are industry-adjusted by virtue of your having included it. If that's not what GICSectors is about, then there is no variable in the model whose name suggests to me that it might encode industry, and in that case you could not call your results industry-adjusted. I don't know what GIC means, but the term "Sector" is commonly used to refer to industry or groups of industries.

                              #28. This is one of my pet peeves. You should not waste your time calculating VIFs: they have no relevance to anything. I presume you did this because you are concerned about the possibility of multicollinearity in your data. Multicollinearity is, in my opinion, the most overrated "problem" in all of statistical analysis.

                              When a model includes several variables that have high levels of intercorrelation, the effect is to make the estimation of the regression coefficients of the variables involved in the near-collinear relationship less precise--in other words, the standard errors are larger, the confidence intervals wider, and the p-values higher. Now, when is this a problem? It's only a problem if it affects what you referred to in #27 as the independent variable(s). If the affected variables are only covariates included for adjustment purposes (what you called "control variables"), then it doesn't matter: you have still adjusted for those variables, and since it was not your purpose to estimate their effects on the outcome, it makes no difference that the results don't give you enough information to do that.

                              So the question really should be: are my independent variables sufficiently affected by multicollinearity, among themselves or with other variables in the model, that I cannot adequately estimate their effects? The answer to that question is found by looking at the standard errors (or confidence intervals) of the coefficients of the independent variables. If those confidence intervals are sufficiently narrow that you can answer your original research questions, then you can just forget about multicollinearity--even if it is present, it has done you no harm.
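                              As an illustration of that point, here is a small simulation (entirely invented data): two near-collinear covariates get large standard errors, while the standard error of the independent variable of interest stays small.

```python
import numpy as np

# Simulated data: x is the variable of interest; z1 and z2 are near-collinear
# covariates included only for adjustment. All values are invented.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z1 = rng.normal(size=n)
z2 = z1 + rng.normal(scale=0.05, size=n)   # z2 almost duplicates z1
y = 0.5 * x + 1.0 * z1 + rng.normal(size=n)

# OLS by least squares, with conventional (non-robust) standard errors.
X = np.column_stack([np.ones(n), x, z1, z2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# se[1] (for x) is small and the effect of x is well estimated, even though
# se[2] and se[3] (the collinear covariates) are badly inflated.
print(beta)
print(se)
```

The collinearity between z1 and z2 does the analysis no harm at all as far as inference about x is concerned, which is exactly what the confidence interval for x's coefficient shows directly and a VIF does not.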

                              What if, in fact, the confidence interval for an independent variable's coefficient is so wide that you simply can't answer your research question(s) about that variable? Well, then you have a problem: you have to report that your analysis was inconclusive. But, actually, it's worse than that, because there is no way to resolve the problem within the existing data set. The only solutions are to gather a (much, much) larger data set--presumably not possible in your case, since you already have a census--or to get another data set sampled under a completely different design, using matching or something like that, to eliminate the problematic collinearity. Notice that VIFs have provided no useful information in reaching these conclusions, nor do they play any role in finding a new data set if that is what you need.

                              For a very well-written, entertaining takedown of multicollinearity, see Arthur Goldberger's econometrics text. He devotes a whole chapter to explaining why talking about multicollinearity is a waste of time.
