Determining the level of fixed effects - applying Papke and Wooldridge (2023) method

    Dear Statalist users,

    I am writing to ask for your help in choosing the level of fixed effects to control for in a linear panel model, using the test proposed in Papke and Wooldridge (2023), "A simple, robust test for choosing the level of fixed effects in linear panel data models" (link).
    (I am studying the effect of a village-level treatment on household-level outcomes, so I am trying to figure out whether I should include household FE or village FE.)

    I was able to follow the procedure in Stata, using NLSY data as an example, but I have two questions about the procedure and its final steps.

    Suppose I estimate the effect of hours worked on ln(wage) as below, with a vector of individual-level controls X_it that includes time-varying (age and wks_work) and time-invariant (race) variables:
    $$\ln(wage_{it}) = b_0 + b_1\,hours_{it} + X_{it}\,b_2 + (\text{fixed effects}) + u_{it}$$
    Denote by b1hatiFE and b1hatgFE the estimates of b1 under unit FE and group FE, respectively.
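
    Written out separately, the two specifications being compared look like this. The notation is my own, not the paper's: λ_t are the year effects, c_i is the individual effect, d_{g(i,t)} is the effect of the industry that individual i is in during year t, and u_{it}, v_{it} are the errors of each specification.
    $$\text{unit FE:}\quad \ln(wage_{it}) = b_1\,hours_{it} + X_{it}\,b_2 + \lambda_t + c_i + u_{it}$$
    $$\text{group FE:}\quad \ln(wage_{it}) = b_0 + b_1\,hours_{it} + X_{it}\,b_2 + \lambda_t + d_{g(i,t)} + v_{it}$$
    In the unit-FE version race is absorbed by c_i because it does not vary over time (xtreg omits it), whereas the group-FE version still identifies its coefficient.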

    My goal is to test whether the estimate of b1 (the coefficient on hours_it) is robust to the choice of fixed effects, individual-level or group-level (industry-level in this example), i.e., whether b1hatiFE = b1hatgFE.

    Here is the procedure suggested in the paper (Procedure 3.2 in Section 3.3, "Testing a single coefficient"):


    Step 1. Run the unit-FE regression of the outcome (ln(wage)) on the variable of interest, time dummies and controls, and obtain the residuals. Repeat with group FE.

    Step 2. Run the unit-FE regression of the variable of interest (hours_it in this example) on time dummies and controls, and obtain the residuals. Repeat with group FE.

    Step 3. Compute the average of the squared unit-FE residuals from step 2 across all i and t, and likewise the average of the squared group-FE residuals from step 2.

    Step 4. Construct q_hat as the unit-FE term minus the group-FE term, where each term is (residual from step 2) * (residual from step 1) / (average from step 3); see equation 3.16 in the paper.


    Step 5. Obtain SE(b1hatiFE - b1hatgFE), the standard error of b1hatiFE - b1hatgFE, by regressing q_hat (from step 4) on a constant (the value 1), with standard errors probably clustered at the group level, or at least at the individual level. The single estimated coefficient will be identically zero.
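
    In formulas, my reading of steps 3 to 5 is sketched below. The notation is my own rather than copied from the paper, so it should be checked against equation 3.16. It assumes a balanced panel with N individuals and T years, writes \ddot{x}_{it} and \dot{x}_{it} for the unit-FE and group-FE residuals from step 2, \hat{\ddot u}_{it} and \hat{\dot u}_{it} for the corresponding residuals from step 1, and G for the number of clusters.
    $$\hat{\ddot a} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\ddot x_{it}^{2}, \qquad \hat{\dot a} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\dot x_{it}^{2}, \qquad \hat q_{it} = \frac{\ddot x_{it}\,\hat{\ddot u}_{it}}{\hat{\ddot a}} - \frac{\dot x_{it}\,\hat{\dot u}_{it}}{\hat{\dot a}}$$
    By the Frisch-Waugh-Lovell theorem, under the null that both specifications identify the same b1, b1hatiFE - b1hatgFE equals the average over i and t of the same expression with the unobserved errors of the two specifications in place of the residuals, so its sampling variance can be estimated from \hat q_{it}. With a constant as the only regressor, the OLS coefficient is the mean of \hat q_{it} (zero by construction), and the cluster-robust variance that regress reports for it is
    $$\widehat{V} = \frac{G}{G-1}\,\frac{1}{(NT)^{2}}\sum_{g=1}^{G}\Big(\sum_{(i,t)\in g}\hat q_{it}\Big)^{2}, \qquad \mathrm{SE}\big(\hat b_1^{iFE} - \hat b_1^{gFE}\big) = \sqrt{\widehat{V}}.$$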

    The paper then says we can use a t-statistic version of the Hausman test, (b1hatiFE - b1hatgFE) / SE(b1hatiFE - b1hatgFE), to decide between individual FE and group FE.


    Here's the Stata code I used to replicate the procedure.

    Code:
    use https://www.stata-press.com/data/r18/nlswork, clear
    global   Y    ln_wage
    global   T    hours
    global   G    ind_code // industry-identifier
    global   Xs    age race wks_work
    
    *    (0)    Make balanced panel data by keeping balanced individuals only (not strongly required, but for convenience)
            
    *    Keep only observations which all variables in the regression are non-missing.
    egen    num_missing    =    rowmiss(${Y}    ${T}    ${G}    ${Xs})
    keep    if    num_missing==0
            
    *    Keep only individuals surveyed across all years
    keep    if    inrange(year,68,73)    //    a shorter window leaves more individuals observed in every year
    bys    idcode:    egen    num_surveyed    =    count(ln_wage)
    keep    if    num_surveyed==6
    
    
    
    *    (1-1)    Run individual-FE model with time dummies, and get residuals (SE clustered at individual-level)
        xtset    idcode year // idcode is unit identifier
        xtreg    ${Y} ${T}    ${Xs}    i.year, fe vce(cluster    idcode)
        scalar    b1hat_iFE = _b[${T}]    //    individual-FE estimate of b1
        predict    double uhat_iFE_resid,    e    //    e = idiosyncratic (within) residuals after xtreg, fe
        
    *    (1-2)    Run industry-FE model with time dummies, and get residuals (SE clustered at individual level)
        reg    ${Y}    ${T}    ${Xs}    i.${G}    i.year,    vce(cluster    idcode)
        scalar    b1hat_gFE = _b[${T}]    //    group-FE estimate of b1
        predict    uhat_gFE_resid,    residual
    
    *    (2-1)    Run unit-level FE regression of T on time dummies and covariates, and get residuals (x_doubledot)
        xtreg    ${T}    ${Xs}    i.year,    fe    vce(cluster idcode)
        predict    double x_doubledot,    e    //    within residuals (x_doubledot)
            
    *    (2-2)    Run group-level FE regression of T on time dummies and covariates, and get residuals (x_singledot)
        reg    ${T}    ${Xs}    i.${G}    i.year,    vce(cluster    idcode)
        predict    x_singledot,    residual
    
    
    
    *    (3)    Compute the average of (x_doubledot)^2 (ahat_doubledot) across all i and t
        gen    x_doubledot_sq    =    (x_doubledot)^2
        egen    ahat_doubledot    =    mean(x_doubledot_sq)
    
        * Compute the average of (x_singledot)^2 (ahat_singledot) across all i and t
        gen    x_singledot_sq    =    (x_singledot)^2
        egen    ahat_singledot    =    mean(x_singledot_sq)
    
    
    *    (4)    Compute q_hat (equation 3.16)
    
        gen    qhat    =    ((x_doubledot * uhat_iFE_resid) / ahat_doubledot)  ///
                       -    ((x_singledot * uhat_gFE_resid) / ahat_singledot)
    
    
    
    
    *    (5)    Obtain SE(b1hatiFE - b1hatgFE) by regressing qhat on a constant only, clustering at the individual level
        reg    qhat,    vce(cluster idcode)    //    constant-only model; the estimated constant is (numerically) zero
        scalar    SE_delta    =    _se[_cons]    //    SE(b1hatiFE - b1hatgFE)
    
    * (6) Compute t-statistic
        scalar t = (b1hat_iFE - b1hat_gFE) / SE_delta
        scalar    list t    //    Show t-statistic computed
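
    To check that the whole procedure behaves as expected, here is a self-contained sketch on simulated data instead of nlswork. Everything in it is hypothetical: the data-generating process (200 individuals nested in 20 groups they never switch, 6 years, an individual effect c_i correlated with the regressor x), the variable and scalar names, and the choice to cluster at the individual level. Because c_i varies within groups and is correlated with x, group FE should be inconsistent here, so the t statistic should come out large. It starts with clear all, so it is best run as a separate do-file.

```stata
* Hypothetical simulated panel: 200 individuals nested in 20 groups, 6 years
clear all
set seed 12345
set obs 200
generate id    = _n
generate group = ceil(id/10)             // 10 individuals per group, fixed over time
generate c_i   = rnormal()               // individual effect
expand 6
bysort id: generate year = _n
generate w = rnormal()                   // a time-varying control
generate x = 0.8*c_i + 0.5*w + rnormal() // regressor correlated with c_i
generate y = 1 + 0.5*x + 0.3*w + c_i + rnormal()
xtset id year

* Step 1: outcome regressions with unit FE and with group FE, keep residuals
xtreg y x w i.year, fe vce(cluster id)
scalar b_iFE = _b[x]
predict double u_dd, e                   // within residuals after xtreg, fe
regress y x w i.group i.year, vce(cluster id)
scalar b_gFE = _b[x]
predict double u_d, residuals

* Step 2: partial the controls and fixed effects out of x
xtreg x w i.year, fe
predict double x_dd, e
regress x w i.group i.year
predict double x_d, residuals

* Step 3: averages of the squared step-2 residuals over all i and t
egen double a_dd = mean(x_dd^2)
egen double a_d  = mean(x_d^2)

* Step 4: q_hat
generate double qhat = x_dd*u_dd/a_dd - x_d*u_d/a_d

* Step 5: constant-only regression; the SE of the constant estimates SE(b_iFE - b_gFE)
regress qhat, vce(cluster id)
scalar se_diff = _se[_cons]

* Step 6: Hausman-type t statistic and a two-sided normal p-value
scalar t_stat = (b_iFE - b_gFE)/se_diff
scalar p_val  = 2*normal(-abs(t_stat))
display "b_iFE = " b_iFE "  b_gFE = " b_gFE "  t = " t_stat "  p = " p_val
```

    Storing the residuals as double keeps the mean of qhat at numerical zero, which is a quick way to confirm that the residuals come from the right regressions.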


    I would like to ask two questions about this procedure.

    1. In step 5, how do I obtain SE(b1hatiFE - b1hatgFE) from the regression of q_hat on 1? Is it the standard error of the coefficient on the constant, as in the code above? I assume it is, based on what the authors write in the previous section ("...and the cluster-robust variance-covariance matrix will be V1_hat"), but I want to double-check.

    2. Once I have computed the final t-statistic, can I interpret it like a regular t-statistic reported by a regression? For example, if the absolute value of my t-statistic is greater than 1.96, can I reject the null hypothesis that the unit-FE and group-FE estimators are the same at the 5% level, and stick with the unit-FE estimator?
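
    On question 2, the two-sided p-value that goes with the final t statistic can be computed directly, as in this sketch. The value 2.10 and the cluster count 500 are hypothetical placeholders. The standard normal version assumes the statistic is asymptotically N(0,1) under the null; the t(G-1) version mirrors the reference distribution that regress uses with vce(cluster) and is somewhat more conservative when there are few clusters. Either way the test is two-sided, so it is |t| that is compared with 1.96.

```stata
* Two-sided p-values for the Hausman-type t statistic (a sketch).
* Placeholders: replace 2.10 with your t statistic and 500 with the
* number of clusters G used in step (5).
scalar t_stat     = 2.10
scalar n_clusters = 500

* Large-sample reference distribution: standard normal
scalar p_normal = 2*normal(-abs(scalar(t_stat)))

* Student t with G-1 degrees of freedom, as regress uses with vce(cluster);
* close to the normal version when G is large
scalar p_t = 2*ttail(scalar(n_clusters) - 1, abs(scalar(t_stat)))

display "p-value (normal):      " %6.4f scalar(p_normal)
display "p-value (t, G-1 df):   " %6.4f scalar(p_t)
display "|t| > 1.96 (5% level): " (abs(scalar(t_stat)) > invnormal(0.975))
```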


    Any comments are greatly appreciated.

    Thank you.