  • Estimating a between-subjects effect with fixed-effects regression

    I would greatly appreciate your thoughts on my approach to estimate the effect of a between-subjects intervention while accounting for subject fixed effects.

    Setup
    I have a data set (see end for example) with multiple observations per individual, and both between- and within-subjects interventions.
    • Each individual is given 1 of 2 tasks, A and B (between-subjects intervention).
    • Individuals do the task in several rounds, and each round presents the task in 1 of 2 ways, X and Y (within-subjects intervention).
    • The outcome is performance in the task, measured by a variable called "output".
    The baseline OLS specification is just the average output across the 4 combinations of interventions, AX, AY, BX, and BY (where AX is the omitted group):
    Code:
    reg output 1.B##1.Y, cl(id)
    Problem
    However, I would like to also account for individual fixed effects. Pesaran and Zhou (2016, Econometric Reviews) demonstrate a two-step procedure to estimate time-invariant effects while accounting for fixed effects:
    1. Run the fixed-effects regression without the time-invariant variable (B in this case).
    2. Regress the residuals from the FE regression on the time-invariant variable.
    A/B are randomly assigned, so intuitively this makes sense: Compare the average fixed effects between individuals in groups A and B. The difference is the average effect of B.

    The Stata command xtfef (written by Law and Zhou) implements this procedure, but does not provide the baseline average of the fixed effects for the omitted group AX (i.e., the intercept).

    My approach
    I'm considering the following simpler and hopefully more transparent approach.

    Regress output on the within-subjects intervention Y, separately for individuals under A and under B:
    Code:
    xtreg output Y if B==0, cl(id) fe // Equation 1
    xtreg output Y if B==1, cl(id) fe // Equation 2
    Then, I take the difference in the "constants" of Equation 1 and Equation 2 as the effect of B. (The point estimate is the same as from xtfef code.) For the standard error of this difference, I use the Delta method (square root of SE(constant1)^2 + SE(constant2)^2). This way, I have the average output across all four groups with the individual fixed-effects partialed out.
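
    A minimal sketch of that calculation, assuming the panel is declared with xtset id round (not shown above) and treating the two constants as independent because the A and B subsamples share no individuals. The scalar names are illustrative only:
    Code:
    xtset id round
    xtreg output Y if B==0, fe vce(cluster id)   // Equation 1
    scalar b0  = _b[_cons]
    scalar se0 = _se[_cons]
    xtreg output Y if B==1, fe vce(cluster id)   // Equation 2
    scalar b1  = _b[_cons]
    scalar se1 = _se[_cons]
    * Difference in constants and its SE, assuming independent estimates
    scalar diff    = b1 - b0
    scalar se_diff = sqrt(se0^2 + se1^2)
    display "Effect of B = " diff "   SE = " se_diff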

    Any red flags? Thanks in advance for all your help!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(id round B Y output)
    1 1 1 0 1
    1 2 1 1 7
    1 3 1 0 6
    2 1 1 1 6
    2 2 1 0 7
    2 3 1 1 1
    3 1 1 0 6
    3 2 1 1 0
    3 3 1 0 6
    4 1 0 1 9
    4 2 0 0 2
    4 3 0 1 0
    5 1 0 0 4
    5 2 0 1 9
    5 3 0 0 5
    6 1 0 1 9
    6 2 0 0 2
    6 3 0 1 6
    end

  • #2
    Dave:
    wouldn't the following code do the trick?
    Code:
    xtset id round
    xtreg output 1.B##1.Y i.round, re vce(cluster id)
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Carlo, thanks for the response! You're right in that the random effects assumptions would hold in this baseline specification, since the interventions B and Y are randomly assigned.

      However, I'd like to extend the model to later include covariates that are likely correlated with the individual effects. For that reason, I'd like to avoid using the random effects model if possible.

      Ideally, I think I want to do the following:
      Code:
      reg output Y ibn.id if B==0, nocons
      est store A
      reg output Y ibn.id if B==1, nocons
      est store B
      
      suest A B, cl(id)
      lincom (exp) // where exp compares the average of the coefficients on "id" dummies in group A vs. those in group B
      lincom won't let me do this because I have too many id dummies; I get the "expression too long" error.

      I guess I could export the estimated coefficients and variance-covariance matrix, and then do the test manually?


      • #4
        The following implements the procedure you describe: "Pesaran and Zhou (2016, Econometric Reviews) demonstrate a two-step procedure to estimate time-invariant effects while accounting for fixed effects:
        1. Run the fixed-effects regression without the time-invariant variable (B in this case).
        2. Regress the residuals from the FE regression on the time-invariant variable."
        Code:
        . qui xtreg output Y, fe
        
        . predict double fixed, u
        
        . reg fixed i.B
        
              Source |       SS           df       MS      Number of obs   =        18
        -------------+----------------------------------   F(1, 16)        =      1.73
               Model |   1.3429784         1   1.3429784   Prob > F        =    0.2068
            Residual |  12.4104938        16  .775655864   R-squared       =    0.0976
        -------------+----------------------------------   Adj R-squared   =    0.0412
               Total |  13.7534722        17  .809027778   Root MSE        =    .88071
        
        ------------------------------------------------------------------------------
               fixed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 1.B |  -.5462963   .4151722    -1.32   0.207    -1.426422    .3338295
               _cons |   .2731481   .2935711     0.93   0.366    -.3491948    .8954911
        ------------------------------------------------------------------------------


        • #5
          Joro, thanks for the suggestion. Pesaran and Zhou's procedure is a bit more involved, since the standard errors in the 2nd step need to account for the estimation in the 1st step, but your code should arrive at the same point estimates.

          After some more searching, I think I can use random effects, as Carlo suggested, in the baseline regression, and then introduce a correlated random effects model when I want to add covariates that are potentially correlated with the individual effects.
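
          A minimal sketch of the correlated random effects (Mundlak-style) idea, assuming a hypothetical time-varying covariate x that is not in the example data. The individual mean of x enters next to x itself, so the coefficient on x is identified from within-individual variation while B stays in the model:
          Code:
          xtset id round
          * x is a hypothetical covariate; x_bar is its individual mean
          bysort id: egen double x_bar = mean(x)
          xtreg output 1.B##1.Y x x_bar, re vce(cluster id)
          A stricter version would add the individual mean of every time-varying regressor, not only x.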


          • #6
            Indeed, the authors claim a slightly more complicated expression for the variance of the second stage, which surprises me. (By the way, the paper is from 2018, not 2016: Pesaran & Zhou (2018), "Estimation of time-invariant effects in static panel data models", Econometric Reviews.)

            Generally, when you have a generated regressor you need to correct the standard errors. Here you have a generated regressand, so the second-stage regression estimates the uncertainty correctly, possibly with a robust variance estimator. The authors refer to simulation results available upon request. Until I see simulation results showing that this particular two-stage procedure gives wrong standard errors, I am not convinced.

            You can always get the allegedly different and correct standard errors by bootstrapping both stages of the procedure. You just wrap up the two lines of code in a programme and you bootstrap it.
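
            A rough sketch of that bootstrap, under some assumptions: the program name twostep, the copy of the identifier newid, the number of replications, and the seed are all illustrative. Resampling is by individual, and idcluster() gives each resampled individual a unique identifier so that the fixed-effects step treats repeated draws as separate panels:
            Code:
            capture program drop twostep
            program define twostep, rclass
                // Step 1: FE regression without the time-invariant variable B
                xtreg output Y, fe
                tempvar u
                predict double `u', u
                // Step 2: regress the estimated individual effects on B
                regress `u' i.B
                return scalar bB = _b[1.B]
            end

            generate newid = id
            xtset newid
            bootstrap bB = r(bB), reps(500) seed(12345) cluster(id) idcluster(newid): twostep
            With only six individuals in the example data, some replications may draw individuals from one group only and fail; a real data set should not have that problem.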

            Also, I think you're overthinking this. Of course you can do what Carlo suggested and estimate a random effects model. You can also simply compare average output under A and under B. If the A/B treatment is randomly assigned across people, a simple mean comparison by A/B does the trick.
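
            As a sketch, that mean comparison can be run as a regression on the B indicator, clustering by id because each individual contributes several rounds:
            Code:
            regress output i.B, vce(cluster id)
            The coefficient on 1.B is the difference in average output between B and A, and _cons is the average under A.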

