  • Question about GLS RE modeling

    Hi Anyone,

    I intend to estimate a GLS RE model (I have Stata 13.0, so I think I can use either mixed or xtreg, with the same results). I have within-person longitudinal data covering 11 years, with an outcome (average student attendance rate in school), time-invariant control variables (like race and sex), and time-varying control variables (such as family poverty status). My main independent variable (IV) is the number of years that a student was exposed to the intervention, which was introduced in the middle of my 11-year time span. I'd like to see the linear relationship between years of implementation (pre and post) and the outcome, which is individuals' average attendance rate. Thus, I have centered the number_of_years_implementing variable on the year in which the intervention was introduced.

    Does this approach seem correct? And if yes, is it also feasible to introduce fixed-effect dummies for each year? My final model command looks like this:

    xtreg avg_daily_attend y0405 y0506 y0708 y0809 y0910 y1011 y1112 y1213 y1314 y1415 var1 var2 var3 var4 var5 years_into_intervention

    where avg_daily_attend ranges from 1 to 100
    where y0405...y1415 are the year dummy vars (with the start of the intervention year excluded)
    where var1, var2, and var3 are time-invariant and
    where var4 and var5 are time-varying, and
    where years_into_intervention is my primary IV

    I just wanted to get some input from someone with more experience doing this. No one around my office seems to be able to serve as my sounding board today!

    Thanks in advance!

    Jane
    Last edited by Jane Schweister; 18 Jul 2018, 15:18.

  • #2
    First, the simple part. Yes, you can use either -xtreg, re- or -mixed-. The results might not be exactly the same, because different algorithms are used to calculate the estimates, but both commands estimate the same model. The differences should be rather small, but not necessarily zero.
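
    As a minimal sketch of that comparison, assuming the panel identifier is a variable named student_id and the school year is stored in a numeric variable named year (both names are hypothetical and do not appear in the original post), the two commands could be run side by side:

    Code:
    * student_id and year are assumed names for the panel and time variables
    xtset student_id year
    * random-effects GLS estimator (the default for -xtreg-)
    xtreg avg_daily_attend var1 var2 var3 var4 var5 years_into_intervention, re
    * random-intercept model fit by maximum likelihood
    mixed avg_daily_attend var1 var2 var3 var4 var5 years_into_intervention || student_id:

    Note that -xtreg, mle- fits the random-intercept model by maximum likelihood as well, so its estimates should agree with -mixed- more closely than the GLS estimates do.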

    If I understand what you're doing correctly, the specification of years_into_intervention sounds wrong. Is there any reason to expect average attendance rate to have a systematic linear trend over time during the years before the intervention (with some yearly shocks superimposed on that)? Presumably that was some kind of control condition. And if there is a secular trend like that, the model you are proposing stipulates that the intervention does nothing to change it. That's an odd intervention to study. My thought would be to do something more like this:

    Code:
    mkspline prior 0 subsequent = years_into_intervention
    xtreg avg_daily_attend y0405 y0506 y0708 y0809 y0910 ///
         y1011 y1112 y1213 y1314 y1415 ///
         var1 var2 var3 var4 var5 ///
         prior subsequent
    lincom subsequent - prior

    The first command creates a linear spline that identifies a piecewise linear model of avg_daily_attend as a function of time. Since your intervention starts with years_into_intervention == 0, I use that as the joinpoint of the spline. To understand how the spline works, run -graph twoway line prior subsequent years_into_intervention, sort-.
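
    To see what -mkspline- builds, here is a small self-contained sketch on a toy data set (the 11 values from -5 to 5 are made up for illustration and are not the poster's data). With a single knot at 0, prior should equal min(x, 0) and subsequent should equal max(x, 0):

    Code:
    clear
    set obs 11
    * toy values -5, -4, ..., 5 standing in for years_into_intervention
    generate years_into_intervention = _n - 6
    mkspline prior 0 subsequent = years_into_intervention
    * each -assert- stops with an error if the stated identity fails
    assert prior == min(years_into_intervention, 0)
    assert subsequent == max(years_into_intervention, 0)
    list years_into_intervention prior subsequent, clean noobs

    Before the knot, prior changes with time while subsequent stays at 0; after it, prior stays at 0 and subsequent increases. Each coefficient is therefore the slope within its own segment.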

    With this change, you are modeling one trend in attendance (subject also to yearly shocks) prior to the start of the intervention, and a potentially different trend after the intervention. The effectiveness of the intervention is then assessed by estimating the difference between the trends in attendance rates after and before the intervention, using the -lincom- command.
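
    An equivalent way to get the same comparison, sketched here as an alternative rather than as the approach recommended above, is the marginal option of -mkspline-. The variable names time_all and slope_change are hypothetical. With marginal, the first variable is years_into_intervention itself and the second is max(years_into_intervention, 0), so the coefficient on the second variable is the change in slope at the start of the intervention:

    Code:
    * time_all and slope_change are illustrative names, not from the original post
    mkspline time_all 0 slope_change = years_into_intervention, marginal
    xtreg avg_daily_attend y0405 y0506 y0708 y0809 y0910 ///
         y1011 y1112 y1213 y1314 y1415 ///
         var1 var2 var3 var4 var5 ///
         time_all slope_change

    The coefficient on slope_change should match the -lincom subsequent - prior- estimate from the non-marginal version, because the two parameterizations describe the same piecewise linear fit.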

    By the way, if the entire data set is contained between years 0405 and 1415, a complete set of y* indicators (one for every year) will always sum to 1. Due to this collinearity with the constant term, Stata will omit one of them if you have not already left one out. This is unavoidable and is not a problem, so don't be disturbed if Stata notifies you that it's doing that.
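
    If the data also hold a single numeric year variable, factor-variable notation is one way to let Stata build the indicators and choose the omitted year explicitly. This is only a sketch: the variable name year and its coding (for example, 2004 for the 0405 school year, so that 2006 is the year the intervention was introduced) are assumptions, not something stated in the thread.

    Code:
    * year is an assumed numeric variable; ib2006. makes 2006 the omitted base year
    xtreg avg_daily_attend ib2006.year var1 var2 var3 var4 var5 ///
         prior subsequent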


    • #3
      Thank you so much, Clyde! Really super helpful advice. I have now modeled my two-legged spline and have come up with a significant estimate for the 'subsequent' var. I will try some further specifications to see whether a linear relationship with time is indeed the best choice.

      Do you agree that it makes sense to add the FE terms for each year (omitting the year that the intervention was introduced)? I wasn't able to find any cautions in the help pages about using FE dummies inside an RE specification. But it makes sense to me to try to account for unobserved yearly factors.

      Thanks, again!

      Jane



      • #4
        "I wasn't able to find any cautions in the help pages about using FE dummies inside an RE specification."
        That's because there are no general reasons not to include them. It's a decision that has to be made on a case-by-case basis and it's based on the underlying science.

        What it boils down to is this: is the outcome variable in your study subject to year-by-year shocks that are appreciably larger than just the noise in the data? My guess is that it is. One year we might have a bad flu season that drives attendance rates down compared to normal. Another year there might be some special program at the school that the students really like, and attendance spikes that year. That seems to me to be baked into that variable, and only yearly indicators can capture that. Of course, this subject matter is outside my expertise, and perhaps such things don't really happen, or don't really affect school attendance very much. In that case, adding these indicators would just complicate the model and add nothing useful.
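
        As a rough supplementary check only, a joint test of the year indicators can show whether the yearly shocks are distinguishable from noise in this particular data set. A sketch, run immediately after the -xtreg- model from #2:

        Code:
        * joint Wald test that all year-indicator coefficients are zero
        testparm y0405 y0506 y0708 y0809 y0910 y1011 y1112 y1213 y1314 y1415

        A small p-value would be consistent with real year-to-year shocks, but the test does not settle whether the indicators belong in the model.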

        The main point is that it's a subject matter consideration, not a statistical one. So you're the expert here.



        • #5
          Your reasoning about yearly shocks is absolutely correct! Thank you again for providing me with some reassurance that I'm on the right track!

          -Jane
