Dealing with heteroskedasticity with the regress command

Sarah Soendergaard

Join Date: Oct 2018
Posts: 4

17 Oct 2018, 05:25

Dear All

Over the last three weeks I have been researching extensively (here, on the web, and in textbooks) to come up with a sound solution to the problem described below. As a result I have become more informed, but also increasingly confused, as there seem to be many different solutions (and perspectives on how to handle the problem), and any given solution seems to be specific to a certain situation/data set. Thus, I hope you guys can help me get on the right path here ;o)

I have a cross-sectional data set (n ~200), which I would like to analyse using the regress command. However, when I check the model assumptions, heteroskedasticity appears (as a consequence of differences between genders); cf. Stata paste-in I.

Thus, I need to account for the heteroskedasticity somehow.

My research tells me that several solutions are available:

- using a model less sensitive to / taking into account heteroskedasticity

- weighting of independent variables

- transformation

I would prefer the first option, as the latter two appear to influence the data set (more or less).

In relation to the first option, I have looked into the hetregress command, as described here:

https://www.stata.com/new-in-stata/h...ar-regression/

https://www.stata.com/manuals/rhetregress.pdf

Consequently, I have tried to run the hetregress model (cf. Stata paste-in II), but I am uncertain how to check whether using this model reduces or eliminates the effect of heteroskedasticity. The Stata manual refers to the Wald test as a test of heteroskedasticity, but does not explain how to interpret it (my take is that heteroskedasticity is still present).

It would be greatly appreciated if someone could tell me how to interpret the Wald test and/or give me some hints about other (better) solutions for handling this problem (heteroskedasticity).

Thanks in advance.

Best, Sarah

Code:

Stata paste-in I

. regress Measurement gender Team

      Source |       SS           df       MS      Number of obs   =       412
-------------+----------------------------------   F(2, 409)       =    189.32
       Model |  723251.953         2  361625.977   Prob > F        =    0.0000
    Residual |  781261.969       409  1910.17596   R-squared       =    0.4807
-------------+----------------------------------   Adj R-squared   =    0.4782
       Total |  1504513.92       411  3660.61782   Root MSE        =    43.706

------------------------------------------------------------------------------
 Measurement |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |  -83.87553   4.310553   -19.46   0.000    -92.34913   -75.40193
        Team |  -.4511106   4.306488    -0.10   0.917    -8.916722    8.014501
       _cons |   312.0256   9.319073    33.48   0.000     293.7063    330.3448
------------------------------------------------------------------------------

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of Measurement

         chi2(1)      =    39.43
         Prob > chi2  =   0.0000

Code:

Stata paste-in II

. hetregress Measurement gender Team,  het(i.gender) twostep

Heteroskedastic linear regression               Number of obs     =        412
Two-step GLS estimation
                                                Wald chi2(2)      =     397.32
                                                Prob > chi2       =     0.0000

------------------------------------------------------------------------------
 Measurement |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Measurement  |
      gender |  -83.87537   4.207937   -19.93   0.000    -92.12277   -75.62797
        Team |   .3000442   3.878994     0.08   0.938    -7.302643    7.902732
       _cons |   310.9004   9.397964    33.08   0.000     292.4807    329.3201
-------------+----------------------------------------------------------------
lnsigma2     |
    2.gender |  -.9054411   .2190943    -4.13   0.000    -1.334858   -.4760242
       _cons |   7.908204    .151501    52.20   0.000     7.611267     8.20514
------------------------------------------------------------------------------
Wald test of lnsigma2=0: chi2(1) = 17.08                  Prob > chi2 = 0.0000

Last edited by Sarah Soendergaard; 17 Oct 2018, 06:00.

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

17 Oct 2018, 08:48

The use of -hetregress- does not eliminate heteroscedasticity. Rather, it fits a model that does not require homoscedasticity as an assumption. It is a different regression model altogether with a different assumption about the distribution of the residuals.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#3

17 Oct 2018, 10:01

Sarah:
welcome to this forum.
As an aside to Clyde's helpful reply, you may want to consider logging the regressand of your OLS and checking whether the model still suffers from heteroskedasticity.
Another recipe would be to include other predictors, if available, and check as above.
If these fixes do not change the situation, you can invoke -robust- standard errors.
By the way, you do not say whether -estat ovtest- has been performed and what outcome it gave: an incorrect model specification is far more serious than heteroskedasticity.
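The suggestions above could be run roughly as follows. This is only a sketch: it assumes the data set from #1 is in memory and that Measurement is strictly positive (otherwise the log is undefined), and ln_Measurement is a hypothetical variable name introduced for illustration.

```stata
* Refit the original model and run the two post-estimation checks
regress Measurement gender Team
* Breusch-Pagan / Cook-Weisberg test, H0: constant variance
estat hettest
* Ramsey RESET test, H0: model has no omitted variables
estat ovtest

* Log the regressand (assumes Measurement > 0) and check again
* ln_Measurement is a hypothetical name, not a variable from the thread
generate double ln_Measurement = ln(Measurement)
regress ln_Measurement gender Team
estat hettest

* If heteroskedasticity persists, stay on the raw scale with robust standard errors
regress Measurement gender Team, vce(robust)
```

The checks are run after a plain -regress- fit each time, since that is the model whose constant-variance assumption is being tested.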

Kind regards,
Carlo
(Stata 19.0)

Sarah Soendergaard

Join Date: Oct 2018

Posts: 4
#4

18 Oct 2018, 01:30

Dear Clyde and Carlo

Thanks for the fast replies - it is really appreciated and helpful.

Clyde - just to be entirely sure - are there any severe functional differences between the regress and hetregress commands that you think I need to be aware of (other than hetregress not requiring homoscedasticity)? In other words, is hetregress more or less a regress model without the need for homoscedasticity? For example, are there other model assumptions? I have read the information about the model provided by Stata and have not come across anything that has caught my eye (but that may come down to my somewhat limited experience with statistical linguistics).

Carlo - to be honest, I am not really sure I understand your proposal correctly. You want me to log-transform the dependent variable? If my interpretation is correct, I have to admit that I am hesitant about such an approach, as I have come across a few statisticians suggesting to refrain from transformation (last resort): you 'tamper' with the data set, and it is hard to track what the transformation does to the data set (at least as a rookie), which will reflect on the model output (i.e. how do I interpret model output from a data set that I do not have a thorough understanding of?), plus you need to back-transform the data.
(Update! it does seem to solve the issue though).

Another thing comes to mind. You seem to suggest making the regress model work although the data are heteroscedastic. Would that be preferable to using the hetregress model? And if so, why? Again, the Stata manual offers little information in that regard.

I have considered the regress robust commands - VCE(variable name), but my model seems to remain heteroscedastic after this approach. The question is then whether that is a problem or not (as the robust command corrects the standard errors accordingly)? It would be really helpful if the Stata manuals were a bit more explicit in that regard.

Thanks for the estat ovtest suggestion - it comes out negative (non-significant).

Last edited by Sarah Soendergaard; 18 Oct 2018, 01:44.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#5

18 Oct 2018, 01:50

Sarah:
- natural log transformation is only an option, not mandatory at all. It brings about some issues concerning back-transformation to the raw scale that are well covered in the literature and crop up on this forum too from time to time. Among its pros for -regress-, it is worth mentioning that the coefficients can be read as approximate percentage changes in the regressand, which is especially useful in quantitative fields such as economics.
As far as heteroskedasticity is concerned, I would stay with -regress- with -robust- standard errors, which take heteroskedasticity into account (ie, it is no longer a problem).
You cannot run -estat hettest- after invoking -robust-, because the outcome would be uninformative.
Happy to read that -estat ovtest- returned no evidence of the need for a different specification.
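To see what -robust- changes and what it leaves alone, the two fits can be put side by side. This is a minimal sketch, assuming the data set from #1 is in memory; ols and ols_robust are labels chosen here for illustration.

```stata
* Fit the same model with conventional and with robust standard errors
quietly regress Measurement gender Team
estimates store ols
quietly regress Measurement gender Team, vce(robust)
estimates store ols_robust

* Same coefficients in both columns; only the standard errors differ
estimates table ols ols_robust, b(%9.4f) se(%9.4f)
```

The point estimates are the same in both columns because -robust- only replaces the variance estimator. The heteroskedasticity itself remains in the data, but the standard errors, tests and confidence intervals no longer depend on the constant-variance assumption.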

Kind regards,
Carlo
(Stata 19.0)

Sarah Soendergaard

Join Date: Oct 2018
Posts: 4
#6

18 Oct 2018, 02:45

Thanks, Carlo.

It really makes a major difference to have confirmation from an experienced fellow like you!

Just wish I could pay you back somehow.

The results look like this - would that be the correct code?

Code:

. regress Measurement gender Team, vce(robust)

Linear regression                               Number of obs     =        412
                                                F(2, 409)         =     199.29
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4807
                                                Root MSE          =     43.706

------------------------------------------------------------------------------
             |               Robust
 Measurement |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |  -83.87553   4.227506   -19.84   0.000    -92.18588   -75.56518
        Team |  -.4511106   4.304205    -0.10   0.917    -8.912234    8.010013
       _cons |   312.0256   10.47784    29.78   0.000     291.4284    332.6227
------------------------------------------------------------------------------

Last edited by Sarah Soendergaard; 18 Oct 2018, 03:13.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#7

18 Oct 2018, 06:18

Sarah:
the code is correct.
You could also have written with the same effect:

Code:

regress Measurement gender Team, robust

As an aside, brackets are more useful for the -cluster- option, which requires specifying the variable the standard errors should be clustered on (please note that this is only for information, since it has nothing to do with your case).
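For information only, the clustered form would look something like the sketch below. Note that site_id is a hypothetical grouping variable that does not appear anywhere in this thread, so this line will not run on the data from #1 as it stands.

```stata
* Hypothetical: site_id identifies groups within which the errors may be correlated
regress Measurement gender Team, vce(cluster site_id)
```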

Kind regards,
Carlo
(Stata 19.0)

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#8

18 Oct 2018, 08:21

Re #4: The -hetregress- command has an alternative assumption of its own about residual variance. Instead of assuming that the residual variance is constant, it assumes that the residual variance is an exponential function of the variable you specify in the -het()- option. Now, in your situation, the variable you are specifying is just dichotomous, so that assumption is trivially satisfied. Were you, however, dealing with heteroscedasticity driven by a continuous variable, this would be a very strong assumption that might or might not be satisfied.
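Written out, the assumption described above looks roughly as follows. This is a sketch of the variance model documented for -hetregress-, with the estimates taken from paste-in II in #1. The symbols (beta, alpha, z_i) are notation introduced here, and the exponentiated values are rounded.

```latex
\begin{align*}
y_i &= \mathbf{x}_i\boldsymbol{\beta} + \varepsilon_i, \qquad
\operatorname{Var}(\varepsilon_i) = \sigma_i^2 = \exp(\mathbf{z}_i\boldsymbol{\alpha}) \\
\ln \sigma_i^2 &= \alpha_0 + \alpha_1 \cdot \mathbf{1}[\text{gender}_i = 2]
  && \text{(with het(i.gender))} \\
\hat\sigma^2_{\text{base}} &= \exp(7.908) \approx 2720, \qquad
\hat\sigma^2_{\text{gender}=2} = \exp(7.908 - 0.905) \approx 1100
\end{align*}
```

The reported Wald test of lnsigma2=0 tests H0: alpha_1 = 0, that is, constant variance. With chi2(1) = 17.08 and Prob > chi2 = 0.0000 that hypothesis is rejected, so the residual variance does differ between the genders (by a factor of about exp(-0.905), roughly 0.40). The model estimates that difference and uses it in the fit; it does not remove it.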

Sarah Soendergaard

Join Date: Oct 2018

Posts: 4
#9

25 Oct 2018, 02:06

Sorry for not responding to your later posts. I got caught up in proceeding with the stats.

Thanks to Carlo and Clyde - I very much appreciate your effort in helping me here!