I very much welcome the new absorb() option for the regress command introduced in Stata 18.5 (StataNow). However, it now creates problems for postestimation further down the assembly line.
The regress postestimation help file gives the following description for the score option of the predict command:
score is equivalent to residuals in linear regression.
However, this no longer holds when variables have been absorbed: the scores produced by predict are then incorrect. As a consequence, every subsequent command that requires those scores produces incorrect results as well. First and foremost, this affects the suest command, as the following example shows:
Code:
. webuse psidextract
. quietly regress lwage wks, absorb(id)
. estimates store reg
. suest reg, vce(cluster id)

Cluster adjusted results for reg                 Number of obs = 4,165

                                  (Std. err. adjusted for 595 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
mean         |
         wks |   .0010085   .0041811     0.24   0.809    -.0071864    .0092033
       _cons |   6.629139   .1973423    33.59   0.000     6.242355    7.015923
-------------+----------------------------------------------------------------
lnvar        |
       _cons |  -2.696966   .1758439   -15.34   0.000    -3.041613   -2.352318
------------------------------------------------------------------------------

. regress lwage wks, absorb(id) vce(cluster id)

Linear regression, absorbing indicators         Number of obs     =      4,165
                                                F(0, 594)         =          .
                                                Prob > F          =          .
                                                R-squared         =     0.7287
                                                Adj R-squared     =     0.6835
                                                Root MSE          =     .25963

                                  (Std. err. adjusted for 595 clusters in id)
------------------------------------------------------------------------------
             |               Robust
       lwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         wks |   .0010085   .0014418     0.70   0.485    -.0018231      .00384
       _cons |   6.629139   .0674909    98.22   0.000     6.496589    6.761689
------------------------------------------------------------------------------
The robust standard errors obtained from suest are now very different from (and wrong compared with) the correct robust standard errors from regress; for wks, for example, .0041811 versus .0014418. Without the absorb() option, the two sets of standard errors are virtually identical (apart from different degrees-of-freedom corrections).
Ideally, this should be fixed by computing the correct scores, which are the residuals after absorbing the respective variables. Currently, they are computed as y - xb, ignoring the absorbed variables.
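To illustrate why scores computed as y - xb distort the cluster-robust variance, here is a minimal Python/NumPy sketch on simulated data. It is not Stata's actual suest computation and does not use psidextract: it assumes a balanced panel with one regressor, where the absorbed group variable is also the cluster variable, and the sandwich is simplified (no degrees-of-freedom correction, no lnvar equation). The function names (simulate_panel, group_mean, fe_fit, cluster_se) are hypothetical. Under these assumptions the correct scores are y - xb minus their within-group mean, i.e. the residuals after absorbing the groups.

```python
import numpy as np


def simulate_panel(n_groups=200, t=7, beta=0.5, seed=0):
    """Hypothetical balanced panel: y = alpha_g + beta * x + eps, with x correlated with alpha_g."""
    rng = np.random.default_rng(seed)
    g = np.repeat(np.arange(n_groups), t)          # cluster / absorbed-group id
    alpha = rng.normal(0.0, 1.0, n_groups)         # group effects to be absorbed
    x = 3.0 + 0.8 * alpha[g] + rng.normal(0.0, 1.0, n_groups * t)
    y = alpha[g] + beta * x + rng.normal(0.0, 0.3, n_groups * t)
    return g, x, y


def group_mean(v, g):
    """Within-group mean of v, broadcast back to the observation level."""
    return (np.bincount(g, weights=v) / np.bincount(g))[g]


def fe_fit(g, x, y):
    """Within (fixed-effects) slope; intercept from the overall means (assumed convention)."""
    xd = x - group_mean(x, g)
    yd = y - group_mean(y, g)
    b = (xd @ yd) / (xd @ xd)
    cons = y.mean() - b * x.mean()
    return b, cons, xd


def cluster_se(g, xd, x_meat, u):
    """Simplified cluster-robust SE of the slope (no small-sample correction).

    Bread: sum of squared demeaned x. Meat: per-cluster sums of x_meat * u.
    """
    per_cluster = np.bincount(g, weights=x_meat * u)
    return np.sqrt(per_cluster @ per_cluster) / (xd @ xd)


if __name__ == "__main__":
    g, x, y = simulate_panel()
    b, cons, xd = fe_fit(g, x, y)
    naive = y - (cons + b * x)                  # y - xb: absorbed effects ignored
    correct = naive - group_mean(naive, g)      # residuals after absorbing the groups
    print(f"b = {b:.4f}")
    print(f"cluster SE, correct scores: {cluster_se(g, xd, x, correct):.4f}")
    print(f"cluster SE, y - xb scores:  {cluster_se(g, xd, x, naive):.4f}")
```

Because the correct residuals sum to zero within each group, multiplying them by x or by the demeaned x gives the same cluster sums. The y - xb scores still carry each group's effect, so every cluster sum picks up an extra term equal to that leftover group mean times the sum of the (undemeaned) x in the cluster, which inflates the standard error. This matches, qualitatively, the gap between the suest and regress results.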