  • Dealing with heteroskedasticity with the regress command


    Dear All

    I have been researching extensively (here, on the web and in textbooks) for the last three weeks to come up with a sound solution to the problem described below - and as a result have become more informed, but also increasingly confused, as there seem to be many different solutions (and perspectives on how to handle the problem) and any given solution seems to be specific to a certain situation/data set. Thus, I hope you guys can help me get on the right path here ;o)

    I have a cross-sectional data set (n ~200), which I would like to analyse using the regress command. However, when I check the model assumptions, heteroskedasticity appears (as a consequence of differences between genders), cf. Stata paste-in I.

    Thus, I need to account for the heteroskedasticity somehow.


    My research tells me that several solutions are available:

    - using a model less sensitive to / taking into account heteroskedasticity

    - weighting of observations (weighted least squares)

    - transformation


    I would prefer the first option, as the latter two appear to influence the data set (more or less).

    In relation to the first option, I have looked into the hetregress command, as described here:

    https://www.stata.com/new-in-stata/h...ar-regression/

    https://www.stata.com/manuals/rhetregress.pdf


    Consequently, I have tried to run that hetregress model (cf. Stata paste-in II), but I am uncertain how to check whether using this model reduces or eliminates the effect of heteroskedasticity. The Stata manual refers to the Wald test as a test for heteroskedasticity, but does not contain information on how to interpret it (my take is that heteroskedasticity is still present).


    It would be greatly appreciated if someone could tell me how to interpret the Wald test and/or give me some hints about other (better) solutions to handle this problem (heteroskedasticity).

    Thanks in advance.

    Best, Sarah



    Code:
    Stata paste-in I
    
    . regress Measurement gender Team
    
          Source |       SS           df       MS      Number of obs   =       412
    -------------+----------------------------------   F(2, 409)       =    189.32
           Model |  723251.953         2  361625.977   Prob > F        =    0.0000
        Residual |  781261.969       409  1910.17596   R-squared       =    0.4807
    -------------+----------------------------------   Adj R-squared   =    0.4782
           Total |  1504513.92       411  3660.61782   Root MSE        =    43.706
    
    ------------------------------------------------------------------------------
     Measurement |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          gender |  -83.87553   4.310553   -19.46   0.000    -92.34913   -75.40193
            Team |  -.4511106   4.306488    -0.10   0.917    -8.916722    8.014501
           _cons |   312.0256   9.319073    33.48   0.000     293.7063    330.3448
    ------------------------------------------------------------------------------
    
    . estat hettest
    
    Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
             Ho: Constant variance
             Variables: fitted values of Measurement
    
             chi2(1)      =    39.43
             Prob > chi2  =   0.0000
    Code:
    Stata paste-in II
    
    . hetregress Measurement gender Team,  het(i.gender) twostep
    
    Heteroskedastic linear regression               Number of obs     =        412
    Two-step GLS estimation
                                                    Wald chi2(2)      =     397.32
                                                    Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
     Measurement |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    Measurement  |
          gender |  -83.87537   4.207937   -19.93   0.000    -92.12277   -75.62797
            Team |   .3000442   3.878994     0.08   0.938    -7.302643    7.902732
           _cons |   310.9004   9.397964    33.08   0.000     292.4807    329.3201
    -------------+----------------------------------------------------------------
    lnsigma2     |
        2.gender |  -.9054411   .2190943    -4.13   0.000    -1.334858   -.4760242
           _cons |   7.908204    .151501    52.20   0.000     7.611267     8.20514
    ------------------------------------------------------------------------------
    Wald test of lnsigma2=0: chi2(1) = 17.08                  Prob > chi2 = 0.0000
    Last edited by Sarah Soendergaard; 17 Oct 2018, 06:00.

  • #2
    The use of -hetregress- does not eliminate heteroscedasticity. Rather, it fits a model that does not require homoscedasticity as an assumption. It is a different regression model altogether with a different assumption about the distribution of the residuals.



    • #3
      Sarah:
      welcome to this forum.
      As an aside to Clyde's helpful reply, you may want to consider logging the regressand of your OLS and checking whether the model still suffers from heteroskedasticity.
      Another recipe would be to include other predictors, if available, and check as above.
      If these fixes do not change the situation, you can invoke -robust- standard errors.
      By the way, you do not say whether -estat ovtest- has been performed and what outcome it gave you: an incorrect model specification is far more serious than heteroskedasticity.
      Kind regards,
      Carlo
      (Stata 19.0)
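
      A minimal sketch of the checks described above, assuming Measurement is strictly positive so that its log is defined; ln_Measurement is a hypothetical variable name used only for illustration:

      Code:
      * log-transform the regressand (hypothetical new variable name)
      generate double ln_Measurement = ln(Measurement)

      * refit by OLS and repeat the diagnostics
      regress ln_Measurement gender Team
      estat hettest
      estat ovtest

      * fallback if heteroskedasticity remains: robust standard errors on the raw scale
      regress Measurement gender Team, vce(robust)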



      • #4

        Dear Clyde and Carlo

        Thanks for the fast replies - it is really appreciated and helpful.

        Clyde - just to be entirely sure - are there any severe functional differences between the regress and hetregress commands that you think I need to be aware of (other than hetregress not requiring homoscedasticity)? In other words, is hetregress more or less a regress model without the need for homoscedasticity? For example, are there other model assumptions etc.? I have read the information regarding the model provided by Stata, and have not come across anything which has caught my eye (but that may come down to my somewhat limited experience with statistical terminology).

        Carlo - to be honest, I am not sure I understand your proposal correctly. Do you want me to log-transform the dependent variable? If my interpretation is correct, I have to admit that I am hesitant about such an approach. I have come across a few statisticians suggesting that transformation should be a last resort, as you 'tamper' with the data set and it is hard to track what the transformation does to the data (at least as a rookie), which will reflect on the model output (i.e. how do I interpret model output from a data set that I do not have a thorough understanding of?). You also need to back-transform the results.
        (Update! It does seem to solve the issue though).

        Another thing comes to mind. You seem to suggest making the regress model work although the data are heteroscedastic. Would that be preferable to the use of the hetregress model? And if so, why? Again, the Stata manual offers little information in that regard.

        I have considered regress with robust standard errors (the vce() option), but my model seems to remain heteroscedastic after this approach - however, the question is then whether that is a problem or not (as the robust option corrects the standard errors accordingly). It would be really helpful if the Stata manuals were a bit more explicit in that regard.

        Thanks for the estat ovtest suggestion - it comes out negative (non-significant).
        Last edited by Sarah Soendergaard; 18 Oct 2018, 01:44.



        • #5
          Sarah:
          - natural log transformation is only an option, not mandatory at all. It brings about some issues concerning back-transformation to the raw scale that are well covered in the literature and crop up on this forum too from time to time. Among its pros for -regress-, the fact that each predictor's coefficient can be read as an approximate percentage change in the regressand is worth mentioning, especially in quantitative fields such as economics.
          As far as heteroskedasticity is concerned, I would stay with -regress- with -robust- standard errors, which take heteroskedasticity into account (ie, it is no longer a problem).
          You cannot run -estat hettest- after invoking -robust-, because the outcome would be uninformative.
          Glad to read that -estat ovtest- returned no evidence for the need of a different specification.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Thanks, Carlo.

            It really makes a major difference to have confirmation from an experienced fellow like you!

            Just wish I could pay you back somehow.


            The results look like this - would that be the correct code?

            Code:
            . regress Measurement gender Team, vce(robust)
            
            Linear regression                               Number of obs     =        412
                                                            F(2, 409)         =     199.29
                                                            Prob > F          =     0.0000
                                                            R-squared         =     0.4807
                                                            Root MSE          =     43.706
            
            ------------------------------------------------------------------------------
                         |               Robust
             Measurement |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                  gender |  -83.87553   4.227506   -19.84   0.000    -92.18588   -75.56518
                    Team |  -.4511106   4.304205    -0.10   0.917    -8.912234    8.010013
                   _cons |   312.0256   10.47784    29.78   0.000     291.4284    332.6227
            ------------------------------------------------------------------------------
            Last edited by Sarah Soendergaard; 18 Oct 2018, 03:13.



            • #7
              Sarah:
              the code is correct.
              You could also have written with the same effect:
              Code:
              regress Measurement gender Team, robust
              As an aside, brackets are more useful for the -cluster- option, which requires specifying the variable the standard errors should be clustered on (please note that this is for information only, since it has nothing to do with your case).
              Kind regards,
              Carlo
              (Stata 19.0)
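
              For illustration only, a clustered version would look like the sketch below. Here site_id is a hypothetical cluster identifier; no such variable appears in the data shown in this thread:

              Code:
              * standard errors clustered on a hypothetical grouping variable
              regress Measurement gender Team, vce(cluster site_id)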



              • #8
                Re #4: The -hetregress- command has an alternative assumption of its own about residual variance. Instead of assuming that the residual variance is constant, it assumes that the residual variance is an exponential function of the variable you specify in the -het()- option. Now, in your situation, the variable you are specifying is just dichotomous, so that assumption is trivially satisfied. Were you, however, dealing with heteroscedasticity driven by a continuous variable, this would be a very strong assumption that might or might not be satisfied.
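
                A sketch of what that assumption amounts to here (the notation is illustrative, not taken from the posts above): with -het(i.gender)- the variance equation is ln(sigma_i^2) = g0 + g1*(gender_i == 2), so each gender simply gets its own residual variance. The "Wald test of lnsigma2=0" at the bottom of the output in #1 tests g1 = 0, that is, constant variance. A small p-value is therefore evidence that the variances differ between genders, which supports the heteroskedastic model rather than signalling a remaining problem. Assuming the -hetregress- results from #1 are the active estimates, the implied residual standard deviations can be displayed like this:

                Code:
                * implied residual SD for the base gender level
                display sqrt(exp(_b[lnsigma2:_cons]))

                * implied residual SD for gender == 2
                display sqrt(exp(_b[lnsigma2:_cons] + _b[lnsigma2:2.gender]))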



                • #9
                  Sorry for not responding to your later posts - I got caught up in proceeding with the stats.

                  Thanks to Carlo and Clyde - I very much appreciate your effort in helping me here!
