  • Residual Regression

    I am trying to understand how much of the variation between marital status and race is explained by wage, and how much is left over after accounting for wage. I thought a residual analysis would be best for this exercise, but either it is not working or I am implementing it incorrectly: in the output below, every p-value is 1.000.

    Code:
    sysuse nlsw88.dta, clear       // load the NLSW 1988 example dataset
    regress married i.race c.wage  // regress marital status on race and wage
    predict residuals, residuals   // save the residuals from that model
    regress residuals i.race       // regress those residuals on race

  • #2
    In a least-squares regression, which is what you are doing here, the residuals are always orthogonal to (uncorrelated with) the predictor variables--that is a mathematical consequence of the least-squares estimation process. So any time you do
    Code:
    regress y x1 x2
    predict residuals, residuals
    regress residuals x1 // OR regress residuals x2
    the coefficients of the final regression will be zero (or very close to it with minor rounding error) and R2 will be 0. The p-value of 1.0 is an automatic mathematical consequence of that.
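
    For instance, you can check this directly in the data you are using; a quick sketch might look like this (the variable name res is arbitrary):
    Code:
    sysuse nlsw88, clear
    regress married i.race c.wage
    predict res, residuals
    correlate res wage             // essentially 0, up to rounding
    regress res i.race c.wage      // coefficients ~0, R2 = 0, p-values 1.000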

    I'm not entirely sure what you mean when you say
    I am trying to understand how much of the variation between marital status and race is explained by wage, and how much is left over after accounting for wage.
    but I think you probably want to do this:
    Code:
    regress married c.wage        // step 1: regress married on wage alone
    predict residuals, residuals  // step 2: save the wage-purged residuals
    regress residuals i.race      // step 3: regress those residuals on race



    • #3
      Thanks, very helpful. How would we interpret the coefficient on race in the last regression versus the following:

      Code:
      regress married i.race c.wage



      • #4
        They are very different things, conceptually, although in this particular data set they come out nearly the same.

        Using the three-step approach in #2, you are first calculating a variable, residuals, which represents the marriage variable with the part that is correlated with wage completely removed, and then, in the final regression, you are examining the differences among the racial groups in that residual: marriage purged of all wage-related variation.

        In the single regression of married on both variables, you are estimating the joint contributions of race and wage to the marriage variable.

        In many situations, race and wage would be substantially correlated with each other, so their contributions to married in the single regression would overlap a great deal. Consequently, the effect of wage estimated in the single regression would be rather different from what was estimated in the first regression of the three-step approach, because in the latter, wage "got credit" for whatever variance was shared between race and wage, in addition to whatever race-unrelated contribution wage makes to married.

        In this particular data set, however, if we -regress wage i.race-, we get R2 = 0.0091, which tells us that race and wage share only a little bit of common variance. So wage didn't get very much "undeserved" credit in the first step of the three-step method, and the results come out nearly the same either way.
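
        If you want to see this side by side, a comparison might go something like the following (a sketch; the stored-estimate names joint and purged are just labels I made up):
        Code:
        sysuse nlsw88, clear
        * single regression: race and wage estimated jointly
        regress married i.race c.wage
        estimates store joint
        * three-step approach: purge wage first, then regress on race
        regress married c.wage
        predict res_m, residuals
        regress res_m i.race
        estimates store purged
        * compare the race coefficients from the two approaches
        estimates table joint purged, b(%9.4f)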



        • #5
          This is so helpful for someone learning this process. If I wanted to decompose these effects, is there another way you would recommend doing it?



          • #6
            Well, one way that I use for understanding the contributions of multiple predictors to an outcome variable is calculating "variance when entered last." The code would go something like this:
            Code:
            regress married c.wage i.race
            local r2_both = e(r2)         // R2 with both predictors
            regress married c.wage
            local r2_remove_race = e(r2)  // R2 with race removed (wage alone)
            regress married i.race
            local r2_remove_wage = e(r2)  // R2 with wage removed (race alone)

            display "Increase of R2 adding race last = " %05.3f =`r2_both' - `r2_remove_race'
            display "Increase of R2 adding wage last = " %05.3f =`r2_both' - `r2_remove_wage'
            The idea is that the exclusive contribution of a predictor to an outcome variable is the amount by which R2 increases when that variable is added to a regression that already contains the other variables. That is the additional variance the variable can explain when all the other variables are already present to "claim their share" of any variance that overlaps with it. If you apply the above code to nlsw88.dta, you will see that wage's independent contribution to the variance of married is much, much greater than that of race in this data.
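
            And since there are only two predictors here, you can also put a number on the overlap itself from the same three R2 values: the shared variance is the sum of the two solo R2s minus the joint R2. A sketch, continuing in the same do-file as the code above:
            Code:
            * shared variance = R2(wage alone) + R2(race alone) - R2(both)
            display "Shared variance = " %05.3f =`r2_remove_race' + `r2_remove_wage' - `r2_both'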
