  • #46
    these are different models. The one with c.year##i.state incorporates a linear time trend in each state. That is, within each state, there is a continuous drift over time in the expected value of y, whereby it increases (or decreases, as the case may be) by a constant amount per year. That rate of increase or decrease differs from one state to the next.

    In the model with i.year##i.state, the model incorporates an idiosyncratic shock to the expected value of y in each year. It might go up one year, down the next, then up again for 2 years, then down for 5 years, then up for 3, etc. The shocks are independent from one year to the next. And, in this model, they differ from one state to the next.

    Neither model, strictly speaking, incorporates time fixed effects. The c.year##i.state model has nothing even remotely like a time fixed effect. For the i.year##i.state model you could, if you like, think of the shocks as being time fixed effects, but because they differ across states they are not truly that. A true time fixed effect, though it has the arbitrary shock character, is by definition the same in all states.

    It is not surprising that results differ in these two models. They are very different models. The one with linear trends is based on very strong assumptions about the trajectory of y in time, whereas the one with shocks is compatible with arbitrary movement of y over time. If you fit a "shock" model to a situation where there is, in fact, a linear trend, the shocks themselves will show linear growth, with the year coefficients increasing by a roughly constant amount from one year to the next. So you can think of the linear trend model as a very special case of the shock model, one that arises under very special constraints. That said, when the real situation involves linear growth, the linear trend model is a much more efficient way of capturing that effect in the model.
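
    One way to see how much more flexible the shock model is, is to count parameters. Here is a back-of-the-envelope sketch in Python; the panel dimensions are invented for illustration:

    ```python
    # Hypothetical panel: 4 states observed for 10 years each.
    S, T = 4, 10

    # c.year##i.state expands to c.year, i.state, and c.year#i.state:
    # 1 common slope + (S-1) state intercept shifts + (S-1)
    # state-specific slope shifts, beyond the constant.
    k_trend = 1 + (S - 1) + (S - 1)

    # i.year##i.state expands to i.year, i.state, and i.year#i.state:
    # (T-1) year shocks + (S-1) state shifts + (T-1)*(S-1) interactions,
    # i.e. a free mean for every state-year cell (S*T - 1 in all).
    k_shock = (T - 1) + (S - 1) + (T - 1) * (S - 1)

    print(k_trend, k_shock)  # 7 39
    ```

    The trend model imposes 39 - 7 = 32 constraints on the shock model in this example, which is exactly why it is the more efficient choice when the linear-trend assumption actually holds.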

    If, on the other hand, you fit a linear trend model to a situation where the real situation is shocks, the linear trend coefficient will be zero or very close to zero, and the effects of the time shocks will be absorbed into the residual noise term of the model, making all of your other coefficient estimates less precise than they could be. That is the best case scenario for this model-reality mismatch. A worse outcome can also occur: if the treatment effect actually varies over time, the failure to capture the time effects can result in those effects being absorbed into the treatment effect estimate, which can result in a biased estimate of treatment effect. The bias can be in either direction.

    So you should choose between the models, if at all possible, based on a strong theoretical understanding of the time dynamics of your outcome y. When that is not possible, the linear trend model being a highly constrained version of the shocks model, it is possible to compare the models using the Bayes or Akaike information criteria (BIC, AIC). Both of those will credit the model that gives a better fit, but will also penalize the shock model for requiring a larger number of degrees of freedom to do so.
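
    For OLS with Gaussian errors, both criteria can be computed from the residual sum of squares. A minimal sketch; the RSS values, sample size, and parameter counts below are invented for illustration:

    ```python
    import math

    def aic_bic(rss, n, k):
        # Gaussian-likelihood AIC and BIC, up to an additive constant
        # that is shared by both models and so cancels in comparisons.
        ll = n * math.log(rss / n)
        return ll + 2 * k, ll + k * math.log(n)

    # Invented fits of the same n = 200 panel: the shock model fits a
    # little better but spends many more degrees of freedom.
    aic_trend, bic_trend = aic_bic(rss=110.0, n=200, k=8)
    aic_shock, bic_shock = aic_bic(rss=100.0, n=200, k=40)

    # Lower is better; here the modest improvement in fit does not pay
    # for the 32 extra parameters, so both criteria favor the trend model.
    print(aic_trend < aic_shock, bic_trend < bic_shock)  # True True
    ```

    In Stata, -estat ic- after each estimation reports these same quantities, so you can make the comparison without hand calculation.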
    Also, when I use c.year##i.state my results are much better. What could the reason be, and where should I look?
    I don't know what you mean by "better." In any case, it is unscientific at best, and scientific misconduct at worst, to select a model based on the results being closer to your preferences.



    • #47
      Thank you for the reply; I now have a much better understanding of the concept. So if I want to add a state-specific linear time trend, I should use c.year##i.statefips?
      I thought that using "##" accounts for state and year fixed effects in the regression, because the results include controls for state and year, respectively, along with the interaction term.
      And if I add i.statefips or i.year separately to the regression, no results are reported for them because they are already accounted for. Hence I thought I was incorporating fixed effects along with the linear trend. Am I wrong?

      Also, what happens if I add state and year fixed effects along with c.year#i.statefips? I have seen papers that include all three in the regression, but your earlier post seems to criticize including all three at the same time. Is that right, or did I misunderstand?
      Last edited by John lenon; 28 Apr 2022, 15:46.



      • #48
        The term "fixed effects" is used to mean a number of different things, and is often just casually thrown around. So it can be confusing.

        The strict meaning of a year fixed effect is a series of indicator variables ("dummies") that represent yearly shocks to the outcome variable that apply across all states. The strict meaning of a state fixed effect is a series of indicator variables that represent state-level variability in the level of the outcome variable that applies across all years.

        When you use c.year##i.statefips, Stata inserts three variables into the regression: c.year, i.statefips, and c.year#i.statefips. None of these satisfies those definitions. c.year is not a series of indicator variables to start with: it is a single continuous variable. i.statefips is a series of indicator variables, and it does represent state-level variation in the outcome level. But because there is also the interaction term, the state level variations do not apply across all years. Indeed, the whole point of an interaction model is that neither of the constituents of the interaction has an effect that applies across all values of the other. If there were such a homogeneous effect, there would be no point to having the interaction term.

        That said, your concern is that you are appropriately adjusting* for time trends (linear trends, not shocks when you use c.year) and state heterogeneity. And you are. But you are doing it with a different mechanism. These variables carry the information that strict-sense state-level fixed effects would have and more. And the c.year and c.year#i.statefips terms together adjust properly for linear time trend and more.

        I'm going to prattle on a bit more, because you will soon be at the point of interpreting your results, and it is important that you not misinterpret them. When you see the results for variable c.year, you must not mistake that for "the linear time trend." Because this is an interaction model, there is no such thing as "the linear time trend." Rather, each state has its own linear time trend. The one that gets labeled c.year is just the linear time trend for whichever state is the reference category for statefips and is not explicitly listed in the output. For other states, the linear time trend can be calculated by adding the coefficient of c.year to the coefficient of c.year#that state's statefips variable, should you need it. (There are few circumstances where this is needed, however.)

        Even more important, you must not mistake a coefficient for a given statefips level to be "the effect of that state." Again, because it is an interaction model, there is no such thing as "the effect" of any state. Rather, the coefficient for a given statefips represents an effect of that state in the year where your year variable == 0. If you are coding year in its natural coding, there will be no such year in the data as I presume you do not have data going back to Biblical times. So these coefficients have no real meaning at all by themselves. To find the state-specific variation in a given year, you have to add up the state coefficient plus the value of the year variable * the coefficient of c.year#that state's statefips variable.
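
        The two calculations described above are just sums of coefficients. A sketch with invented coefficient values (statefips level 13 and all numbers are hypothetical):

        ```python
        # Invented coefficients from a c.year##i.statefips fit; the
        # reference state is the omitted category.
        b_year = 0.30           # c.year: linear trend in the reference state
        b_state13 = -5.0        # 13.statefips: shift where year == 0
        b_yearXstate13 = 0.12   # c.year#13.statefips: extra slope for state 13

        # Linear time trend for state 13:
        trend_13 = b_year + b_yearXstate13

        # State 13's deviation from the reference state in year 2010:
        effect_13_in_2010 = b_state13 + 2010 * b_yearXstate13

        print(round(trend_13, 2), round(effect_13_in_2010, 1))  # 0.42 236.2
        ```

        In Stata itself, -lincom- or -margins- will compute these sums along with correct standard errors, which is preferable to doing the arithmetic by hand.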

        I will close by also noting that if you are running your model using -xtreg, fe- or -areg, absorb(statefips)- or -reghdfe, absorb(statefips)-, you will not get any coefficients for the statefips variables. That is just as well, as there is really no reason to be interested in them anyway. So don't invest a lot of time and energy thinking about adding up those various coefficients. Just make sure you don't misinterpret the coefficients you do get.

        *I use the term adjusting because in observational data, it is never possible to actually "control" for anything. While the term control is widely misused to refer to the process of adding a variable to a regression for the purpose of reducing omitted variable bias, the proper term is adjust. Control is only possible with experimental data, and the controlling is done in the study design, not during analysis.

