I have a question about applying a two-stage negative binomial regression. My dependent variable is a count (the number of voluntary turnovers), and my main independent variables are two types of diversity measures, X1 and X2. We believe the relationship between diversity and voluntary turnover may be curvilinear, so we also include the squared terms (X1 squared and X2 squared) in the model.
We suspect that the diversity variables may be endogenous, so we plan to use a two-stage approach. We have instruments for these variables: Z1 for X1 and Z2 for X2.
Given this setup, which method would be preferred?
- Plug-in approach: First, predict values for X1, X1 squared, X2, and X2 squared using linear regression, and then include these predicted values in the second-stage negative binomial regression.
- Control function approach: First, predict X1 and X2 using linear regression, and then include the residuals from these first-stage regressions as additional controls in the second-stage model.
Thanks in advance for your help with this issue.