  • DID - Including Time Fixed Effects?

    Dear Statalist Forum,

    I am currently writing my Thesis and have set up the following equation


    Y = B0 + B1*D1*Post + B2*Size + B3*Age + B4*Industry

    Where Y is the dependent variable and D1 = 1 for treatment and 0 for control. The term Post is a dummy variable for time, which is 0 before the treatment and 1 following it. Consequently, B1 is the difference-in-differences estimator. Size, Age, and Industry are included as control variables.

    Having looked at other threads here on Statalist, it seems that a time indicator, i.e. + B5*Post, is often included in the regression in Stata, so that the equation would instead become:

    Y = B0 + B1*D1*Post + B2*Size + B3*Age + B4*Industry + B5*Post

    However, as most research in this area does not include the time indicator, I would like to hear what the "correct" approach would be. Is it "wrong" to leave out the time indicator? And what exactly is the difference in the interpretation of the results when including or excluding the time indicator?

    I can see that if I include the time indicator, the difference-in-differences estimator changes in some of the tests in Stata, and therefore I would really appreciate any thoughts on the above-mentioned issue.

    Best Regards,

    James

  • #2
    There is no general correct answer to your question. It is entirely context and problem dependent. As you have posed your question in a fully general and abstract manner and provided no clues about the specifics of your research, it is impossible to give an answer.

    The closest one can come to a general answer is to enunciate the general underlying principle. Time effects are used when it is believed that the outcome exhibits substantial variability over time during the era under observation. If what is expected is a more or less continuous trend over time, then this is usually modeled with a continuous variable (or a spline) representing time. If what is expected are period-to-period shocks to the outcome that apply to all units of analysis equally, then indicator variables for each time period (except one, due to collinearity considerations) are used. If the outcome variable is thought to be stable over time, then there is no need to include time effects in the model.

    The principle is unassailable, but deciding how it applies to a particular problem is sometimes difficult, and at the very least requires knowledge about the outcome variable being studied and how it behaves in the real world. That would not be a statistical issue, and if you are not sure how this principle applies to your situation you need to consult an expert in your field.



    • #3
      Hi Clyde,

      Thank you for your detailed answer. It is very much appreciated.

      We want to estimate the effect of private equity ownership on firm performance by comparing the year before the acquisition with the three years after it. Most peer-reviewed researchers in this field do not control for Post in their model, but they do not provide an exact explanation for not controlling for Post. Therefore, I would like to know what the exact implication is of not using Post as a control variable? And how should B1 be interpreted if you do not include Post as a control variable? Can B1 still be interpreted as a difference-in-differences estimator?

      When we run the regressions with Post as a control variable, the p-value for this term is usually insignificant and the overall model fit is worse (higher F statistics), so we thought that it would be better to simply leave it out from the regressions and do as other researchers in this field. Examples of typical dependent variables, Y, that we test are profit margins and efficiency ratios (i.e. EBITDA divided by total assets).

      Best regards,

      James



      • #4
        Ah, this is a rather different question.

        "Most peer-reviewed researchers in this field do not control for Post in their model, but they do not provide an exact explanation for not controlling for Post. Therefore, I would like to know what the exact implication is of not using Post as a control variable? And how should B1 be interpreted if you do not include Post as a control variable? Can B1 still be interpreted as a difference-in-differences estimator?"
        This depends on whether or not you are including i.year in the model as a fixed effect. If you are, then Post would be collinear with the year indicators, and Post would be automatically omitted by Stata anyway. B1 would still be interpreted as the difference-in-differences estimator. (And B1 would be the same as if you included Post and omitted an extra year indicator instead.) So you can just omit Post from the regression command in the first place if you like. (My preference is to include it anyway and let Stata drop it automatically--if for some reason Stata doesn't drop it, that alerts me to the fact that there is an error in either the Post variable or the year variable that I have to fix. Fail early and often!)
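
        A minimal sketch of why Post drops out, assuming (hypothetically) a panel covering 2010 to 2013 in which every acquisition takes effect in 2011, so that Post switches on in the same calendar year for all firms. The years and the cutoff are illustrative and not taken from this thread. Under that assumption Post is exactly the sum of the post-period year indicators, which is the collinearity that leads Stata to omit it:

```python
# Hypothetical panel years and a common treatment year (illustrative assumptions).
years = [2010, 2011, 2012, 2013]
first_post_year = 2011


def year_dummies(year):
    """One 0/1 indicator per calendar year, analogous to what i.year creates."""
    return {y: int(year == y) for y in years}


def post(year):
    """Post = 1 from the treatment year onward, 0 before."""
    return int(year >= first_post_year)


# Post equals the sum of the indicators for the post-period years in every
# observation, so it carries no information beyond the year indicators.
post_is_redundant = all(
    post(y) == sum(year_dummies(y)[k] for k in years if k >= first_post_year)
    for y in years
)
print(post_is_redundant)
```

        If acquisitions happen in different calendar years for different firms, Post is no longer a function of the year alone, and this exact redundancy would generally not hold.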

        If, however, you do not have year fixed effects in the model and you also omit Post, then this is no longer a difference in differences model. You could not interpret B1 as a DID estimator of the acquisition effect. In fact, it would just be a mis-specified model that can't be interpreted at all.
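
        To see what B1 picks up when Post is left out and there are no year effects, here is a small sketch with made-up numbers (the four cells and their values are purely illustrative, and the Size, Age, and Industry controls are ignored). It relies on the standard result that regressing Y on a constant and a single 0/1 regressor gives a slope equal to the difference in means between the two groups that the regressor defines:

```python
from statistics import mean

# Made-up outcomes (say, profit margin in %) for each group/period cell.
cells = {
    ("control", "pre"): [10.0, 12.0, 11.0],
    ("control", "post"): [13.0, 15.0, 14.0],
    ("treated", "pre"): [11.0, 13.0, 12.0],
    ("treated", "post"): [16.0, 18.0, 17.0],
}
cell_mean = {cell: mean(values) for cell, values in cells.items()}

# Difference in differences: change in treated minus change in control.
did = (cell_mean[("treated", "post")] - cell_mean[("treated", "pre")]) - (
    cell_mean[("control", "post")] - cell_mean[("control", "pre")]
)

# With D1*Post as the only group/time regressor, B1 is the mean of the
# treated-post cell minus the pooled mean of the other three cells.
others = [
    y
    for cell, values in cells.items()
    if cell != ("treated", "post")
    for y in values
]
b1_without_post = cell_mean[("treated", "post")] - mean(others)

print(f"DID estimate:          {did:.2f}")
print(f"B1 with Post left out: {b1_without_post:.2f}")
```

        In these numbers the outcome also rises by 3 in the control group, so the DID is 2, while the coefficient from the model without Post comes out near 4.67 because it mixes the acquisition effect with the common time shift and the baseline gap between the groups.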

        "When we run the regressions with Post as a control variable, the p-value for this term is usually insignificant and the overall model fit is worse (higher F statistics), so we thought that it would be better to simply leave it out from the regressions and do as other researchers in this field."
        NO! NO! NO! NO! NO!

        This is a common misunderstanding about "insignificant" variables, and widely seen even in peer-reviewed publications, but it is dead wrong. An "insignificant" coefficient is not a justification for omitting a variable from a model, ever. It is even worse when that variable is part of an interaction. Remember that in the interaction model, Post's coefficient represents the expected difference in outcome between the pre- and post- time periods in the control group! You want that to be small because the two groups are presumably more or less indistinguishable at that point in time or you likely wouldn't have chosen this group of controls. But you also don't expect the difference to be exactly zero (and a non-significant coefficient DOES NOT mean that it is zero), and the whole point of a DID model is that you are adjusting for that small difference. (Whether it is statistically significant or not depends on extraneous considerations such as sample size and variances.) If it isn't in the model, then you are not adjusting for it and you are not getting a DID estimate. Finally, any model that includes an interaction term but omits any of the constituents of that interaction term is mis-specified and uninterpretable, unless the omission is due to collinearity with other variables (in which case the variable is omitted but the very necessary information it carries is conveyed by other variables that are retained).
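
        The role of Post's coefficient can be made concrete with a short sketch using invented cell means. In the full model Y = B0 + B_D*D1 + B_Post*Post + B1*D1*Post (the names B_D and B_Post are labels chosen here, and the other controls are left out), the four coefficients reproduce the four cell means exactly, so each one can be read off directly. The control group's pre/post change is deliberately small:

```python
# Invented cell means; the control group's change (0.4) is small on purpose.
control_pre, control_post = 8.0, 8.4
treated_pre, treated_post = 8.1, 9.5

# Coefficients of Y = B0 + B_D*D1 + B_Post*Post + B1*D1*Post,
# written in terms of the four cell means.
b0 = control_pre                            # control group, before
b_d = treated_pre - control_pre             # baseline gap between groups
b_post = control_post - control_pre         # pre/post change in the controls
b1 = (treated_post - treated_pre) - b_post  # difference in differences

# Keeping D1 but dropping Post leaves B1 as the raw before/after change
# in the treated group: the DID estimate plus the unadjusted b_post.
b1_no_post = treated_post - treated_pre

print(f"B_Post = {b_post:.1f}, DID B1 = {b1:.1f}, B1 without Post = {b1_no_post:.1f}")
```

        Even a change of 0.4 in the control group, which might well be "insignificant" in a small sample, moves B1 from 1.0 to 1.4 once Post is dropped.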
