  • A question Regarding Two-stage residual Inclusion method

    I am conducting a discrete-time survival analysis and suspect an endogeneity problem. Because my exposure is binary, I am addressing it with the two-stage residual inclusion (2SRI) method. In the first stage, I predict the probability of receiving treatment X with a logistic regression model. My question is: which residual should I use? I searched the web and Dr. Hilbe's logistic regression book and found no straight answer. In Stata, predict offers several residual options after logit, including:

    dev   Deviance residual
    res   Pearson residuals; adjusted for number sharing covariate pattern
    rs    Standardized Pearson residuals; adjusted for number sharing covariate pattern



    In addition, I found other approaches on the web that use the probability of a positive outcome and the linear prediction to calculate the residuals, as follows:

    xb    linear prediction
    pr    probability of a positive outcome

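    logit X Z C1 C2

    * linear prediction (log-odds scale)
    predict xb, xb
    * exponentiated linear prediction: the odds, not a probability
    gen expxb = exp(xb)
    gen resid1 = X - expxb

    * predicted probability of a positive outcome, equal to invlogit(xb)
    predict pr, pr
    * observed exposure minus predicted probability
    gen resid2 = X - pr
    * one minus the predicted probability
    gen resid3 = 1 - pr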






    Could you please point me toward the right choice of residual to include in the second stage? I would highly appreciate your help.

  • #2
    There's not a lot of guidance yet on the choice of residuals in two-stage residual inclusion models (also referred to more broadly as control function models). If the residuals in your outcome equation are symmetrical, the response residual is likely a safe choice (see Terza 2008; O'Malley et al. 2011). Also, if you have a large enough sample, the response residual should lead to a consistent estimate.

    However, O'Malley et al. have shown that response residuals in 2SRI can lead to estimates that are less efficient than what you'd get from a two-stage least squares model if the residuals are asymmetrical and if n is not sufficiently large. In smaller samples, if residuals from the outcome equation are highly skewed, you might get vastly different estimates depending on your choice of residual (Garrido et al. 2012).
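
    For reference, here is a sketch of how the residual types mentioned in this thread are usually defined for a binary exposure. The notation is introduced here for illustration and is not taken from the cited papers: X_i is the observed exposure, w_i collects the first-stage regressors (Z, C1, C2), and \hat{p}_i is the fitted first-stage probability. It assumes each observation has its own covariate pattern; Stata's Pearson-type residuals are otherwise adjusted for the number of observations sharing a pattern.

    ```latex
    \begin{align*}
    \hat{p}_i &= \frac{\exp(w_i'\hat{\gamma})}{1 + \exp(w_i'\hat{\gamma})} \\
    r_i^{\text{response}} &= X_i - \hat{p}_i \\
    r_i^{\text{Pearson}} &= \frac{X_i - \hat{p}_i}{\sqrt{\hat{p}_i\,(1 - \hat{p}_i)}} \\
    r_i^{\text{deviance}} &= \operatorname{sign}(X_i - \hat{p}_i)
        \sqrt{-2\left[X_i \ln \hat{p}_i + (1 - X_i)\ln(1 - \hat{p}_i)\right]}
    \end{align*}
    ```

    In terms of the code in the question, the response residual is X - pr. Since exp(xb) is the odds rather than the probability under a logit link, X - exp(xb) is not a response residual.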



    Garrido, M.M., Deb, P., Burgess, J.F., Penrod, J.D. 2012. Choosing models for cost analyses: Issues of nonlinearity and endogeneity. Health Services Research 47(6): 2377-2397.

    O'Malley, A.J., Frank, R.G., Normand, S.-L.T. 2011. Estimating cost-offsets of new medications: Use of new antipsychotics and mental health costs for schizophrenia. Statistics in Medicine 30: 1971-1988.

    Terza, J. 1998. Estimating count data models with endogenous switching: Sample selection and endogenous treatment effects. Journal of Econometrics 84: 129-154.

    Hope this helps!
    Melissa



    • #3
      Thank you, Melissa, for replying. I still have a few questions. When you said "If the residuals in your outcome equation are symmetrical, the response residual is likely a safe choice", I assume you meant the residuals calculated when I ignore the endogeneity problem and run the regular Cox regression model? The residuals from this model are highly right-skewed; however, my sample is fairly large (around 70,000 patients). Is it still safe to go with the response residuals?



      • #4
        Correct, I was referring to the residuals from your regular outcome model. Even at your sample size, your results may still be sensitive to the choice of residuals (see the sample sizes used in the O'Malley et al. citation I provided). One suggestion would be to run your model with a couple of different choices of residuals (and perhaps higher-order terms of your residuals) to understand how sensitive your results are to residual choice.
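
        A minimal sketch of that sensitivity check in Stata, under several assumptions: the data have already been stset, the second stage is the Cox model mentioned above (a discrete-time model on person-period data could be swapped in), and X, Z, C1, C2 are the variable names from the original question. The names phat, r_resp, r_pear, r_dev, and the stored-estimate names are hypothetical.

        ```stata
        * First stage: logit of the binary exposure on the instrument and covariates
        logit X Z C1 C2
        predict double phat, pr               // fitted probability of exposure
        gen double r_resp = X - phat          // response residual
        predict double r_pear, residuals      // Pearson residual
        predict double r_dev, deviance        // deviance residual

        * Second stage: refit the outcome model once per residual choice
        * (assumes the data were stset before this point)
        foreach r of varlist r_resp r_pear r_dev {
            stcox X C1 C2 `r'
            estimates store m_`r'
        }

        * Higher-order term: add the squared response residual
        stcox X C1 C2 c.r_resp c.r_resp#c.r_resp
        estimates store m_sq

        * Compare the exposure coefficient (log hazard ratio) across specifications
        estimates table m_*, keep(X) b se
        ```

        The standard errors reported by the second stage ignore the fact that the residual is itself estimated, so in practice both stages are usually bootstrapped together before the specifications are compared.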



        • #5
          Good suggestion. Thank you very much.
