  • fixed effect at which level?

    Hello everyone,

    I have panel data by student, year and institution. The outcomes are student-level academic performance measures (both binary and continuous), and the predictor is a categorical treatment indicator with more than two categories. I would like to include fixed effects to better account for student-level heterogeneity. Year fixed effects are straightforward, but I am confused about how institution and student fixed effects fit together conceptually. Can we include both? If so, how should the error term be defined? I see people handle this differently, but I have not found an approach I fully agree with. Could anyone please shed some light on this? A million thanks!

  • #2
    You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

    If you want to control for student heterogeneity, then you need fixed effects at the student level. If you xtset your data by student and year, you can use xtreg with the fe option and i.year among the right-hand-side variables:
    xtset student year
    xtreg y x i.year, fe

    As you know, this essentially creates a separate intercept for each student. If you have more than one observation per student-year, you'll need to xtset by student only. This means you won't be able to use lag or lead operators, and some diagnostics for time-series data will not work.

    If students only attend one institution, then the student fixed effect fully accounts for institution (since institution doesn't vary within students). If students change institutions, then you could have both a student and an institution fixed effect, but if this is very rare, I wouldn't bother. You can handle two panel variables either with xtreg by adding i.institution to the right-hand-side variables (but you may need to increase your matrix size, and it may be slow) or with reghdfe (user-written).
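
    A minimal sketch of how one might check whether students actually change institutions before deciding on the second fixed effect. The variable names (student, institution, year, y, treat) are hypothetical, and clustering the standard errors by student is an assumption rather than something stated in this thread:

    ```stata
    * Hypothetical variable names: student, institution, year, y, treat
    * Flag students who are observed at more than one institution
    bysort student (institution): gen byte mover = institution[1] != institution[_N]

    * Count movers once per student rather than once per observation
    egen byte tag = tag(student)
    tabulate mover if tag

    * If movers are common, absorb student, institution and year effects together
    * (reghdfe is user-written: ssc install reghdfe)
    * vce(cluster student) is an assumed choice, not a recommendation from the thread
    reghdfe y i.treat, absorb(student institution year) vce(cluster student)
    ```

    If almost no student is flagged as a mover, the institution effect is essentially already captured by the student effect.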

    • #3
      I have a somewhat similar situation, and I am not sure whether including both student and institution fixed effects works in my case.
      I am examining how a subsidy program for each institution affects student outcomes. My equation is as follows:
      Student outcome_it = Subsidy_jt + Time FE + Student FE + Institution FE
      Following Phil Bromiley's suggestion, since many students change institutions in my case, I could include both student FE and institution FE. However, the variable "Subsidy" is defined by the institution the student attended in the baseline year, so its variation is at the level of the baseline-year institution.
      I have two questions:
      1. Should I include institution FE in this case? If yes, why would I need to include them when students change institutions? How does this help my empirical strategy?
      2. I think the effect may differ between students from low-income households and students from high-income households. I intend to interact Subsidy with a dummy variable (Dum_i) indicating whether the student came from a low-income household in the baseline year, as follows:
      Student outcome_it = Subsidy_jt * Dum_i + Subsidy_jt + Dum_i + Time FE + Student FE + Institution FE
      Will the variable Dum_i be absorbed into the student FE? Is there any case in which a regression including both Dum_i and student FE still provides estimates for both?

      I would appreciate any advice on this.
      Thanks

      • #4
        A fixed effect will not permit a coefficient on any variable that does not vary within the levels of that fixed effect.
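
        A short sketch of why this holds, written for a student effect. The symbols here are illustrative and not from the thread: x_{it} is a time-varying regressor, z_i is a time-invariant one (such as a baseline dummy), and \alpha_i is the student effect. Subtracting student means from each term gives:

        ```latex
        \begin{align*}
        y_{it} &= \beta x_{it} + \gamma z_i + \alpha_i + \varepsilon_{it} \\
        y_{it} - \bar{y}_i &= \beta (x_{it} - \bar{x}_i) + \gamma (z_i - z_i)
            + (\alpha_i - \alpha_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) \\
        &= \beta (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)
        \end{align*}
        ```

        The term in \gamma vanishes, so \gamma cannot be estimated, while an interaction such as x_{it} z_i still varies within student and survives the transformation.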

        If subsidy is constant within institution (that is, it does not change over time), then the institution FE will eat it. If some students get subsidies but not all, then you can include it.

        I'd use reghdfe.

        Code:
        ssc install reghdfe

        reghdfe outcome_it subsidy_jt c.subsidy_jt#c.Dum_i Dum_i, absorb(student institution time)

        The Dum_i coefficient may not be estimated, depending on whether it is time-invariant (you could include Dum_i in the absorb). The interaction should be OK.
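
        One way to see in advance whether Dum_i will be dropped is to check whether it ever changes within a student. This is only a sketch: the names student, institution, year, outcome, subsidy and lowinc (standing in for Dum_i, coded 0/1) are hypothetical, and clustering by institution is an assumption, not something settled in this thread:

        ```stata
        * Hypothetical names: student, institution, year, outcome, subsidy, lowinc (0/1)
        * Flag students whose lowinc value changes across their observations
        * (missing values sort last, so a student with some missing lowinc is also flagged)
        bysort student (lowinc): gen byte lowinc_varies = lowinc[1] != lowinc[_N]
        tabulate lowinc_varies

        * If lowinc never varies within student, leave its main effect out:
        * it is collinear with the student FE. The interaction is still identified
        * as long as subsidy varies within student.
        reghdfe outcome subsidy 1.lowinc#c.subsidy, absorb(student institution year) vce(cluster institution)
        ```

        Here the coefficient on subsidy is the effect for lowinc == 0, and the coefficient on 1.lowinc#c.subsidy is the difference in that effect for lowinc == 1.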






