Standard Errors of 2SLS

Mingyu Qi

Join Date: May 2020

Posts: 32
#1

28 May 2024, 21:26

Hello,

I have a question about the standard error of the second-stage estimate in a 2SLS regression. It is not exactly a Stata question, although I do use Stata to fit the model.

For certain reasons, I first ran the 2SLS regression "manually", meaning I used the predicted endogenous treatment variable from the first stage in place of the actual treatment variable in the second stage. Normally, running the two stages separately leads to an underestimated standard error. However, when I re-ran the regression using ivreghdfe, a canned command for 2SLS regression, the standard error I got was actually smaller than the one from the manual 2SLS regression.

Both my endogenous treatment and my outcome variable are binary, so the first stage is a linear probability model, and the predicted values of the endogenous treatment are technically "incorrect" (i.e., they are probabilities instead of 0 or 1). Therefore, the residuals of the second stage are larger if they are calculated using the predicted endogenous treatment instead of the actual treatment (which is either 0 or 1). It would be greatly appreciated if someone could validate my interpretation of this finding. Thank you very much!

The code I used to run the 2SLS manually is:

First stage:
reghdfe binary_endogenous instrument $covarn, absorb(county year) vce(cluster state)
predict pr_binary_endogenous if e(sample)

Second stage:
reghdfe binary_outcome pr_binary_endogenous $covarn, absorb(county year) vce(cluster state)

The code I used to run the 2SLS with ivreghdfe, a community-contributed command for 2SLS regression with many fixed effects, is:

ivreghdfe binary_outcome $covarn (binary_endogenous = instrument), absorb(county year) cluster(state)

I ran all the code in Stata 18 MP.
Mingyu Qi

Join Date: May 2020

Posts: 32
#2

29 May 2024, 10:30

Sorry, I noticed a mistake in my original post. When I said that the second-stage residuals are larger when calculated using the predicted endogenous treatment instead of the actual treatment, I meant that the variance of the residuals is larger.
Maxence Morlet

Join Date: Mar 2021

Posts: 634
#3

30 May 2024, 01:08

Hayashi (2000) specifies that when you run TSLS, you must account for the loss of degrees of freedom in estimating the first stage, so when you did it manually, you did not account for the fact that the predicted treatment was estimated. Stata does it automatically in its commands.
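To make the point about the estimated predicted treatment concrete, here is a sketch for the case of a single endogenous regressor. The notation is introduced only for illustration and does not come from the posts: $x$ is the endogenous treatment, $W$ the exogenous controls (including the absorbed fixed effects), and $Z$ the instrument together with $W$. The manual second stage reproduces the 2SLS point estimates, but it builds its residuals from $\hat{x}$ instead of $x$:

```latex
\begin{align*}
\text{first stage:}\quad
  & x = Z\hat{\pi} + \hat{v}, \qquad \hat{x} = Z\hat{\pi}, \\
\text{correct 2SLS residual:}\quad
  & \hat{u} = y - x\hat{\beta} - W\hat{\gamma}, \\
\text{manual second-stage residual:}\quad
  & \tilde{e} = y - \hat{x}\hat{\beta} - W\hat{\gamma}
    = \hat{u} + \hat{\beta}\,\hat{v}, \\
\text{hence:}\quad
  & \sum_i \tilde{e}_i^{\,2}
    = \sum_i \hat{u}_i^{2}
    + \hat{\beta}^{2} \sum_i \hat{v}_i^{2}
    + 2\hat{\beta} \sum_i \hat{u}_i \hat{v}_i .
\end{align*}
```

The last term can take either sign, so the residual variance that the manual second stage plugs into its standard errors is not guaranteed to be below the correct one.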
Mingyu Qi

Join Date: May 2020

Posts: 32
#4

30 May 2024, 10:17

Originally posted by Maxence Morlet

Hayashi (2000) specifies that when you run TSLS, you must account for the loss of degrees of freedom in estimating the first stage, so when you did it manually, you did not account for the fact that the predicted treatment was estimated. Stata does it automatically in its commands.

Hi Maxence, thank you very much for your reply! I understand that the standard errors for the second-stage estimates are wrong if I estimate the two stages separately. I'm just confused about why the wrong standard errors I got are larger than the correct ones, since usually they are smaller.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2081
#5

30 May 2024, 17:53

Generally, it can go either way. But in this particular application, you can't justify generating instruments that depend on the estimated fixed effects when you are including lots of fixed effects. I'm not entirely sure how reghdfe computes predicted values, but if they include the estimated FEs, this is problematical and may be the reason why the incorrect standard errors are larger. You definitely want to stick with ivreghdfe.
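A small simulation may help illustrate the "it can go either way" point. This is only a sketch under simplifying assumptions that are not in the original posts: simulated data, a continuous treatment, no fixed effects, and homoskedastic rather than clustered standard errors. The program name manual_vs_2sls and all variable names are hypothetical. The manual second-stage residual equals the 2SLS residual plus the coefficient times the first-stage residual, so the sign of the correlation between the two errors (set by rho below) should drive which standard error comes out larger.

```stata
* Hypothetical simulated data; all names here are illustrative only.
clear all
set seed 20240530

capture program drop manual_vs_2sls
program define manual_vs_2sls
    args rho                                     // loading of first-stage error v on structural error u
    drop _all
    set obs 5000
    generate double z = rnormal()                // instrument
    generate double u = rnormal()                // structural error
    generate double v = `rho'*u + 0.5*rnormal()  // first-stage error, correlated with u
    generate double x = 0.5*z + v                // endogenous regressor
    generate double y = 1 + x + u                // true coefficient on x is 1

    * Manual two-step: second-stage residuals are built from xhat, not x
    quietly regress x z
    quietly predict double xhat, xb
    quietly regress y xhat
    local b_manual  = _b[xhat]
    local se_manual = _se[xhat]

    * One-step 2SLS: residuals are built from the actual x
    quietly ivregress 2sls y (x = z), small
    display "rho = `rho':  b manual = " %6.4f `b_manual'  "   b 2sls = " %6.4f _b[x]
    display "            se manual = " %6.4f `se_manual' "  se 2sls = " %6.4f _se[x]
end

manual_vs_2sls  1     // u and v positively correlated
manual_vs_2sls -1     // u and v negatively correlated
```

The point estimates should agree in both runs. With rho = 1 the manual standard error should come out larger than the 2SLS one, and with rho = -1 smaller. Neither manual number is valid for inference, and none of this addresses the separate concern about predicted values that depend on estimated fixed effects.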
Mingyu Qi

Join Date: May 2020

Posts: 32
#6

31 May 2024, 12:12

Originally posted by Jeff Wooldridge

Generally, it can go either way. But in this particular application, you can't justify generating instruments that depend on the estimated fixed effects when you are including lots of fixed effects. I'm not entirely sure how reghdfe computes predicted values, but if they include the estimated FEs, this is problematical and may be the reason why the incorrect standard errors are larger. You definitely want to stick with ivreghdfe.

Thank you very much Prof Wooldridge!