Hello,
I have a question about the standard error of the second-stage estimate in a 2SLS regression. It is not strictly a Stata question, although I use Stata to fit the model. I first ran the 2SLS regression "manually", meaning I used the predicted values of the endogenous treatment variable from the first stage in place of the actual treatment variable in the second stage. Normally, running the two stages separately leads to an underestimated standard error. However, when I re-ran the regression using ivreghdfe, a canned command for 2SLS regression, the standard error I got was actually smaller than the one from the manual 2SLS regression. Both my endogenous treatment and my outcome variable are binary, so the first stage is a linear probability model, and the predicted values of the endogenous treatment are technically "incorrect" (i.e., they are probabilities rather than 0 or 1). My interpretation is therefore that the second-stage residuals are larger when they are calculated using the predicted treatment instead of the actual treatment (which is either 0 or 1). I would greatly appreciate it if someone could validate this interpretation. Thank you very much!
The code I used to run the 2SLS manually is:
First stage: reghdfe i.binary_endogenous instrument $covarn, absorb(county year) vce(cluster state)
predict pr_binary_endogenous if e(sample)
Second stage: reghdfe binary_outcome pr_binary_endogenous $covarn, absorb(county year) vce(cluster state)
The code I used to run the 2SLS with ivreghdfe, a community-contributed command for 2SLS regression with many fixed effects, is:
ivreghdfe binary_outcome $covarn (i.binary_endogenous = instrument), absorb(county year) cluster(state)
I ran all of the code in Stata 18 MP.
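To make my interpretation concrete, here is a sketch of the comparison I have in mind. The notation is my own assumption and is not taken from the ivreghdfe documentation: y is the outcome, D the actual binary treatment, \hat{D} the first-stage fitted value, \hat{v} = D - \hat{D} the first-stage residual, W the covariates (including the absorbed fixed effects), and \hat{\beta}, \hat{\gamma} the second-stage coefficients, which should be the same in both approaches.

```latex
\begin{align*}
\tilde{e} &= y - \hat{D}\hat{\beta} - W\hat{\gamma}
  && \text{(residual from the manual second stage)} \\
\hat{u} &= y - D\hat{\beta} - W\hat{\gamma}
  && \text{(residual 2SLS should use)} \\
\tilde{e} &= \hat{u} + \hat{\beta}\,(D - \hat{D}) = \hat{u} + \hat{\beta}\,\hat{v} \\
\tilde{e}'\tilde{e} &= \hat{u}'\hat{u} + 2\hat{\beta}\,\hat{u}'\hat{v} + \hat{\beta}^{2}\,\hat{v}'\hat{v}
\end{align*}
```

If this is right, the manually computed residuals have a larger sum of squares only when the last two terms add up to a positive number. The cross term 2\hat{\beta}\,\hat{u}'\hat{v} can be negative, so I am not sure whether my explanation holds in general or only in my data. I also assume that clustering changes how the residuals enter the variance formula but not which residuals are used.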