
  • Testing the OLS assumptions in relation to standard robust errors

    Dear all,

    I'm testing the OLS assumptions for a multiple cross-sectional regression model.

    In relation to this, I would like to ask whether one first has to test all the assumptions before running a robust-standard-errors model (if heteroscedasticity is a problem).

    So, should I first test for homoscedasticity and, if heteroscedasticity is present, use robust standard errors? Should I then test for linearity and normality on the robust-standard-errors model? Or should all OLS assumption tests be carried out on the initial model?

    Also, I would like to ask if somebody knows the Stata code for finding the 1st and 99th percentile thresholds (in relation to deleting outliers)?

    Lastly, in relation to the constant-variance (homoscedasticity) and linearity assumptions, should I apply just ordinary residuals, standardized residuals, or studentized residuals?

    Thanks in advance!

    Best,
    Anders

  • #2
    Hi Anders,

    Welcome to Statalist!

    UCLA has a chapter walking you through how to test all of the OLS assumptions in Stata here

    In particular, they state:
    This chapter will explore how you can use Stata to check on how well your data meet the assumptions of OLS regression. In particular, we will consider the following assumptions.
    • Linearity – the relationships between the predictors and the outcome variable should be linear
    • Normality – the errors should be normally distributed – technically normality is necessary only for hypothesis tests to be valid, estimation of the coefficients only requires that the errors be identically and independently distributed
    • Homogeneity of variance (homoscedasticity) – the error variance should be constant
    • Independence – the errors associated with one observation are not correlated with the errors of any other observation
    • Errors in variables – predictor variables are measured without error (we will cover this in Chapter 4)
    • Model specification – the model should be properly specified (including all relevant variables, and excluding irrelevant variables)
    Regarding standard vs. robust standard errors: do the tests, but I always assume that I will be using robust standard errors in the end. Though it's not foolproof (you always want to plot the residuals), run the model both ways and see how much the standard errors change.
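    A minimal sketch of that side-by-side comparison (y, x1, x2 are placeholder variable names; substitute your own):

    Code:
    regress y x1 x2                   // conventional standard errors
    estimates store plain
    regress y x1 x2, vce(robust)      // heteroskedasticity-robust standard errors
    estimates store robust
    estimates table plain robust, se  // coefficients are identical; compare the SEs

    If the two sets of standard errors are close, heteroskedasticity is probably not doing much damage; large differences are a warning sign.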

    Regarding finding the 1st and 99th percentile thresholds:

    Code:
    summarize var1, detail
    return list  // when you run summarize, Stata saves a number of these values into memory
    // 99th percentile is r(p99), 1st percentile is r(p1).
    
    tabstat var1, stat(n mean median p1 p99)
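    Since you mentioned deleting outliers, one hedged sketch of trimming at those thresholds (var1 is a placeholder; whether trimming is advisable at all is a separate question):

    Code:
    summarize var1, detail
    drop if var1 < r(p1) | var1 > r(p99)  // keep only observations within [p1, p99]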



    • #3
      Anders:
      welcome to this forum.
      The first check that I usually perform is -estat ovtest-, as its outcome can reveal the need for squared terms and save you time before embarking on a misspecified model (which is by far worse than a heteroskedastic residual distribution, which you can check visually and/or via -estat hettest- and then fix by applying -robust- if necessary).
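      That sequence of checks might look like this (y, x1, x2 are placeholders):

      Code:
      regress y x1 x2
      estat ovtest                  // Ramsey RESET: rejection suggests omitted powers / misspecification
      estat hettest                 // Breusch-Pagan test for heteroskedasticity
      regress y x1 x2, vce(robust)  // refit with robust SEs if hettest rejects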
      See -help pctile- and related stuff for percentiles.
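      For example, -_pctile- stores the requested percentiles directly:

      Code:
      _pctile var1, percentiles(1 99)  // var1 is a placeholder
      display r(r1)   // 1st percentile
      display r(r2)   // 99th percentile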

      PS: crossed in the cyberspace with David's excellent advice.
      Last edited by Carlo Lazzaro; 27 Nov 2018, 10:29.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Dear David,

        Thank you very much for your answer and the warm welcome. And thanks for forwarding this guide.

        If heteroscedasticity is a problem and I use robust standard errors, should I then test for linearity, normality, independence, etc. on the robust-standard-errors model? Or should these tests still be carried out on the initial model (i.e. without robust standard errors)? In my opinion, all the tests should be carried out on the initial/standard model (where robust standard errors are not applied, even if homogeneity of variance is violated).

        And what is the difference between studentized, standardized and ''just normal'' residuals, in relation to the kdensity plot and the rvfplot? When should I apply plain residuals, standardized residuals, and studentized residuals?

        The Stata code for the percentile thresholds helped me a lot! Thanks.

        Best,
        Anders



        • #5
          Dear Carlo,

          Thanks a lot for your answer too.

          That is a good point, which eventually saves me a lot of time.

          If I apply a stepwise approach where I add one more explanatory variable with each regression, should I then also run the ovtest and test the general OLS assumptions for each regression? I ask because my sample size shrinks as I add explanatory variables (from 1750 to 1730 observations).

          Best,
          Anders



          • #6
            Anders:
            another "save your time" recipe when engaged in regression is to stay away from -stepwise- procedures (see, if interested: https://www.stata.com/support/faqs/s...sion-problems/).
            The best approach is to give a fair and true view of the data generating process: the literature in your research field can help you out in this respect.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Anders Svejgaard

              If heteroscedasticity is a problem, and I do robust standard errors, should I then change so I test for linearity, normality, independence etc. on the standard robust errors model? Or should these still be carried out on the initial model (i.e. non-robust standard errors if heteroscedasticity is a problem)?
              You could run the checks with "standard" standard errors. That won't affect the linearity of the model, nor the tests for independence of the errors. Others may disagree, but I generally worry about non-independence when working with panel data or datasets with multiple observations of some group (schools, firms, families, countries, etc.). (NOTE: this can include cross-sectional data if, for instance, you have survey responses from individuals within the same family or the same school.)
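              In that grouped-data situation, a common remedy is cluster-robust standard errors; a hedged sketch (y, x1, x2 and familyid are placeholder names):

              Code:
              regress y x1 x2, vce(cluster familyid)  // SEs robust to correlation within families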


              And what is the difference between studentized, standardized and ''just normal'' residuals?
              Studentized residuals are just a form of standardized residuals (standardizing means dividing the residuals by an estimate of their standard deviation, i.e. converting them to Z-scores). Regarding your question about which to use with the kdensity plot and the rvfplot - I don't know; I never use them. I far prefer "graph matrix var1 var2 var3" to check my models. I also prefer leverage plots (lvr2plot) and DFBETA to check for influential outliers.
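              For reference, all three flavors of residuals (and the diagnostic plots mentioned above) are available after -regress- (y, x1, x2 are placeholders):

              Code:
              regress y x1 x2
              predict e, residuals      // "just normal" residuals
              predict rstd, rstandard   // standardized residuals
              predict rstu, rstudent    // studentized residuals
              rvfplot                   // residual-vs-fitted plot
              lvr2plot                  // leverage vs. squared residual
              dfbeta                    // creates _dfbeta_* influence variables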
              Last edited by David Benson; 28 Nov 2018, 13:35.

