Problem/bug with the new absorb() option in StataNow for regress: incorrect scores -> suest invalid

Sebastian Kripfganz

Join Date: May 2014
Posts: 2594

03 Jul 2024, 06:41

I very much welcome the new absorb() option for the regress command introduced in Stata 18.5 (StataNow). However, this now creates problems further down the assembly line.

The regress postestimation help file gives the following description of the score option of the predict command:

score is equivalent to residuals in linear regression.

However, this is no longer correct when variables have been absorbed. The scores produced here are incorrect. As a consequence, subsequent commands that require those scores will produce incorrect results as well. First and foremost, this is an issue for the suest command; see the following example:

Code:

. webuse psidextract

. quietly regress lwage wks, absorb(id)

. estimates store reg

. suest reg, vce(cluster id)

Cluster adjusted results for reg                         Number of obs = 4,165

                                   (Std. err. adjusted for 595 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
mean         |
         wks |   .0010085   .0041811     0.24   0.809    -.0071864    .0092033
       _cons |   6.629139   .1973423    33.59   0.000     6.242355    7.015923
-------------+----------------------------------------------------------------
lnvar        |
       _cons |  -2.696966   .1758439   -15.34   0.000    -3.041613   -2.352318
------------------------------------------------------------------------------

. regress lwage wks, absorb(id) vce(cluster id)

Linear regression, absorbing indicators         Number of obs     =      4,165
                                                F(0, 594)         =          .
                                                Prob > F          =          .
                                                R-squared         =     0.7287
                                                Adj R-squared     =     0.6835
                                                Root MSE          =     .25963

                                   (Std. err. adjusted for 595 clusters in id)
------------------------------------------------------------------------------
             |               Robust
       lwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         wks |   .0010085   .0014418     0.70   0.485    -.0018231      .00384
       _cons |   6.629139   .0674909    98.22   0.000     6.496589    6.761689
------------------------------------------------------------------------------

The robust standard errors obtained from suest are now very different from the correct robust standard errors reported by regress, and they are wrong. Without the absorb() option, they are virtually identical (aside from different degrees-of-freedom corrections).

Ideally, this should be fixed by computing the correct scores, which are the residuals after absorbing the respective variables. Currently, they are computed as y-xb, ignoring the absorbed variables.
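
The two residual definitions can be compared with the documented areg command, which fits the same model. This is a minimal sketch, assuming that predict after areg behaves as described in its help file: residuals returns y - xb - d_id, and dresiduals returns d_id plus that residual, which equals y - xb. The variable names e_within, e_naive, d_id, and gap are made up for illustration.

```stata
webuse psidextract, clear
quietly areg lwage wks, absorb(id)

* residual after removing the absorbed id effects: y - xb - d_id
predict double e_within, residuals

* d_id + residual, i.e. y - xb with the absorbed effects ignored
predict double e_naive, dresiduals

* estimated absorbed effect d_id
predict double d_id, d

* the two residual definitions should differ by d_id, up to rounding
generate double gap = e_naive - e_within - d_id
summarize e_within e_naive gap
```

If the scores passed on to suest correspond to e_naive rather than e_within, they still contain the variation in d_id across individuals, which would be consistent with the inflated standard errors in the suest output above.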

https://www.kripfganz.de/stata/


Andrew Musau

Join Date: Oct 2014

Posts: 10195
#2

03 Jul 2024, 12:04

Originally posted by Sebastian Kripfganz

I very much welcome the new absorb() option for the regress command introduced in Stata 18.5 (StataNow).

I think you are confusing the -absorb()- option introduced in xtreg with the undocumented -absorb()- option of regress. I have StataNow 18.5 and there is no documented -absorb()- option for regress.

[R] regress -- Linear regression
(View complete PDF manual entry)

Syntax

regress depvar [indepvars] [if] [in] [weight] [, options]

options                Description
-------------------------------------------------------------------------------
Model
  noconstant           suppress constant term
  hascons              has user-supplied constant
  tsscons              compute total sum of squares with constant; seldom used

SE/Robust
  vce(vcetype)         vcetype may be ols, robust, cluster clustvarlist, bootstrap, jackknife, hc2 [clustvar], or hc3

Reporting
  level(#)             set confidence level; default is level(95)
  beta                 report standardized beta coefficients
  eform(string)        report exponentiated coefficients and label as string
  depname(varname)     substitute dependent variable name; programmer's option
  clustertable         display table of multiway cluster combinations
  display_options      control columns and column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling

  noheader             suppress output header
  notable              suppress coefficient table
  plus                 make table extendable
  mse1                 force mean squared error to 1
  coeflegend           display legend instead of statistics
-------------------------------------------------------------------------------
indepvars may contain factor variables; see fvvarlist.
depvar and indepvars may contain time-series operators; see tsvarlist.
bayes, bootstrap, by, collect, fmm, fp, jackknife, mfp, mi estimate, nestreg, rolling, statsby, stepwise, and svy are allowed; see prefix. For more details, see [BAYES] bayes: regress and [FMM] fmm: regress.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix.
Weights are not allowed with the bootstrap prefix.
aweights are not allowed with the jackknife prefix.
hascons, tsscons, vce(), beta, noheader, notable, plus, depname(), mse1, and weights are not allowed with the svy prefix.
aweights, fweights, iweights, and pweights are allowed; see weight.
noheader, notable, plus, mse1, and coeflegend do not appear in the dialog box.
See [R] regress postestimation for features available after estimation.

The undocumented -absorb()- option predates StataNow 18.5, and it effectively turns regress into areg (see #5 of this thread from 2022, for example: https://www.statalist.org/forums/for...r-svy-xtreg-fe). There are also other undocumented options of regress, e.g., syntax that allows it to estimate instrumental variables 2SLS regression. Had you used areg, it would have informed you that suest does not support areg.

Code:

sysuse auto, clear
areg mpg weight, absorb(rep78)
suest .

Res.:

Code:

. suest .
areg is not supported by suest
r(322);

I agree that if regress is allowed to work as areg, then it should behave the same with other post-estimation commands. However, I think that the Stata developers would argue that using undocumented options may have unintended consequences and that they cannot guarantee support for such options.

Last edited by Andrew Musau; 03 Jul 2024, 12:06.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2594
#3

03 Jul 2024, 12:18

You are absolutely right. I indeed confused xtreg, fe with regress. This happened while I was working on a new postestimation command that is supposed to work after both commands. Apparently, I got lost in the various help files I had open at the same time.

Since absorb() is undocumented for regress, we should indeed not expect that all aspects of it work as intended.

The reason why suest does not work after areg is that predict after areg does not have a scores option.

It would be possible to provide correct scores, but this is a different grumble.

https://www.kripfganz.de/stata/
George Ford

Join Date: Aug 2014

Posts: 3152
#4

04 Jul 2024, 08:09

Whether you absorb or not, reg and suest will provide different standard errors, even for a single model.

Code:

sysuse auto, clear
eststo reg: reg price mpg weight foreign
suest reg
Sebastian Kripfganz

Join Date: May 2014

Posts: 2594
#5

04 Jul 2024, 10:24

Originally posted by George Ford

Whether you absorb or not, reg and suest will provide different standard errors, even for a single model.

Code:

sysuse auto, clear
eststo reg: reg price mpg weight foreign
suest reg

That's simply because the standard errors from regress are not robust, while those from suest are. Even if you run regress with vce(robust), the standard errors will differ numerically in small samples due to the degrees-of-freedom correction, but they are asymptotically equivalent. In my initial example, the standard errors computed by suest are just wrong.
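
The small-sample discrepancy can be inspected directly. This is a rough sketch, assuming that the only difference between the two robust variance estimates for the slope coefficients is the finite-sample multiplier, N/(N-k) for regress versus N/(N-1) for suest; that assumption is mine and should be checked against the manuals. The scalar names are made up for illustration, and official estimates store is used in place of eststo.

```stata
sysuse auto, clear

* robust standard error of mpg as reported by regress
quietly regress price mpg weight foreign, vce(robust)
scalar se_reg = _se[mpg]
scalar n_obs  = e(N)
scalar k_par  = e(df_m) + 1      // slopes plus the constant

* suest needs the model fitted without vce(robust)
quietly regress price mpg weight foreign
estimates store reg
quietly suest reg

* mpg is the first coefficient of the first equation in e(V)
matrix V = e(V)
scalar se_suest = sqrt(V[1,1])

display "regress, vce(robust): " se_reg
display "suest:                " se_suest
display "ratio of the two:     " se_reg/se_suest
display "assumed factor:       " sqrt((n_obs - 1)/(n_obs - k_par))
```

If the last two displayed numbers agree, the difference is purely a degrees-of-freedom correction, unlike the discrepancy in the absorb() example.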

https://www.kripfganz.de/stata/