
  • Test of linearity for multiple regression

    Hello,

    I am running STATA/IC 16.1 on macOS Monterey. Both STATA and Monterey were updated within the past few hours. I am having a separate issue where STATA will not create any graphs (no scatterplot, no histogram, nothing). I have emailed tech support about this issue but, in the meantime, I need to continue the analysis for my dissertation.

    I ran a regression model w/ controls and want to test assumptions. I found the Shapiro-Wilk test for normal data, ran an ovtest for omitted variables, a hettest for heteroscedasticity, and a linktest for specification error. So far...so good. Absent a scatterplot, I am struggling to test for linearity & outliers.

    Any suggestions that work around my graph glitch are super appreciated. I would be grateful if you could also include a brief description of how to interpret the test output.

    Thanks,
    Allison

  • #2
    Allison:
    while waiting for a reply from Stata tech support, please note:
    1) in regression, normality is a (weak) requirement for the residual distribution only;
    2) linearity relates to the coefficients (and not to the predictors). Hence, to explore whether a given predictor has a non-linear relationship with your regressand, just include its linear and squared terms on the right-hand side of your regression equation (example for the hypothetical predictor -age-):
    Code:
    c.age##c.age
    ;
    3) oftentimes outliers are simply observations that we struggle to classify according to our previous experience (if any). However, unless you're 100% sure that they are the offspring of a data-entry mistake, they may well be legitimate sample realizations of the data generating process you're investigating. Just to give you a (hopefully useful) idea, in health economics the cost distribution of a given health care programme follows a gamma distribution, which is positively skewed because there is a fraction of patients who, due to a very severe disease and/or costly adverse events, consume a great amount of healthcare resources; that said, we would never consider them outliers but simply a matter of fact.
    Kind regards,
    Carlo
    (StataNow 18.5)
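
    A minimal sketch of point 2 that needs no graphs, assuming Stata's bundled auto data stand in for the real data set (price, weight, and mpg are placeholders, not the variables from the original post):

    Code:
    ```stata
    * Assumption: the shipped auto data replace the poster's data
    sysuse auto, clear

    * Linear and squared terms of weight via factor-variable notation
    regress price c.weight##c.weight mpg

    * Wald test of the squared term; a small p-value suggests curvature
    test c.weight#c.weight

    * Predicted price over a grid of weights, shown as a table rather than a plot
    margins, at(weight=(2000(500)4500))
    ```

    If the squared term is clearly different from zero, the margins table shows how the predicted outcome bends across the range of the predictor.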



    • #3
      Carlo,
      Thanks for your help. I did know about the normality for residual only, but at this point, all reminders are appreciated (my brain is a little mushy). Your healthcare example was also extremely helpful. I'm looking at a racial-ethnic socialization measure, so it applies quite well. Here's to hoping that the rest of my issues are figured out so I can finish this section.

      Thank you again for your helpful response.

      Allison



      • #4
        Allison: I wouldn't test for most of those things. If you have a small sample size then you can't appeal to asymptotic analysis, and all normality tests assume a reasonably large sample size. If you have a large sample size then you don't need normality because the central limit theorem kicks in. The same is true of heteroskedasticity: if you have a large sample size then vce(robust) or vce(hc3) produces valid standard errors.
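
        As a rough illustration of the two variance options, assuming the bundled auto data and placeholder variables (price, weight, mpg) rather than the model from the original post:

        Code:
        ```stata
        * Assumption: the shipped auto data replace the poster's data
        sysuse auto, clear

        * Heteroskedasticity-robust (Huber/White) standard errors
        regress price weight mpg, vce(robust)

        * HC3 variant, often preferred when the sample is not large
        regress price weight mpg, vce(hc3)
        ```

        The coefficients are identical across the two runs; only the standard errors, t statistics, and confidence intervals change.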

        The ovtest is not an omitted variables test. Stata is, unfortunately, misleading in this regard. ovtest is RESET, and it's a test to find neglected nonlinearities -- so you have already carried out a "linearity" test. The only good test for omitted variables is to find a proxy for the omitted variable and include it. For example, if I'm worried about omitted "ability" in a wage equation I can include a standardized test score as a proxy. I might wind up leaving that proxy in the equation. The RESET puts in nonlinear functions of variables already in the model and therefore does not act as a good proxy. The linktest is a different test for nonlinearities in the conditional mean.

        If you ever reject using ovtest or linktest then, as Carlo said, you should try squares and interactions of key variables. Or, if y is binary, count, or has other special restrictions, try a nonlinear model.
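
        A short sketch of running and reading the two tests, again assuming the bundled auto data and placeholder variables in place of the real model:

        Code:
        ```stata
        * Assumption: the shipped auto data replace the poster's data
        sysuse auto, clear
        regress price weight mpg

        * RESET: adds powers of the fitted values and tests them jointly.
        * A small p-value points to neglected nonlinearity.
        estat ovtest

        * linktest: regresses price on the prediction (_hat) and its square (_hatsq).
        * A significant _hatsq signals misspecification of the conditional mean.
        * Run it last, because it replaces the stored estimation results.
        linktest
        ```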



        • #5
          Thank you both for your help. I ended up downloading collin and running a regcheck command, which gave me the attached output. I also ran kdensity on the residuals and am attaching that as well. The only issue that regcheck found was in the normality of the residuals, as you can see. As Carlo said, it's a pretty weak requirement, and I'm not hypothesis testing at this point, just examining data.

          I truly appreciate the support!

          Allison
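
          For reference, a minimal sketch of the residual checks mentioned here using only official commands, assuming the bundled auto data and placeholder variables (the community-contributed collin and regcheck are left out because their syntax is not shown in this thread):

          Code:
          ```stata
          * Assumption: the shipped auto data replace the poster's data
          sysuse auto, clear
          regress price weight mpg

          * Store the residuals in a new variable
          predict double resid, residuals

          * Shapiro-Wilk test; a small p-value rejects normality of the residuals
          swilk resid

          * Kernel density of the residuals with a normal density overlaid
          kdensity resid, normal
          ```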

          [Attached image: Screen Shot 2022-01-19 at 6.45.29 PM.png]
          [Attached image: Screen Shot 2022-01-19 at 6.48.23 PM.png]

          Last edited by Allison Kimble; 19 Jan 2022, 18:00. Reason: edited to remove duplicate screen shots



          • #6
            Allison:
            no worries at all, then.
            Quoting Gary Koop's textbook https://www.wiley.com/en-gb/Introduc...-9780470032701 (admittedly, I judged [and purchased] the book by its cover), page 72: "...OLS is BLUE even if errors are not normal".
            Therefore, even when the normality requirement for the distribution of epsilon is violated, OLS is still the Best Linear Unbiased Estimator (and we are still happy; trivial pun intended).
            As an aside, in your future posts please do not report screenshots (or similar), but use CODE delimiters instead. Thanks.
            Kind regards,
            Carlo
            (StataNow 18.5)



            • #7
              I would rather see a residuals versus fitted plot or an observed versus fitted plot as an overall graphical reduction of the model. Added variable plots are also sadly neglected.

              If as a matter of routine you want to look at the distribution of residuals, qnorm in my view yields a better plot.
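
              A sketch of these plots once graphs work again, assuming the bundled auto data and placeholder variables in place of the real model:

              Code:
              ```stata
              * Assumption: the shipped auto data replace the poster's data
              sysuse auto, clear
              regress price weight mpg

              * Residuals versus fitted values, with a reference line at zero
              rvfplot, yline(0)

              * Added-variable plots, one panel per predictor
              avplots

              * Observed versus fitted values
              predict double fitted, xb
              scatter price fitted

              * Normal quantile plot of the residuals
              predict double resid, residuals
              qnorm resid
              ```

              Curvature or a funnel shape in the residual-versus-fitted plot suggests nonlinearity or heteroskedasticity; points far from the line in the quantile plot show where the residuals depart from normality.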



              • #8
                Thank you both again. I apologize for my technical faux-pas and will post the code next time. It would probably have been easier as well! Nick, the reason I didn't plot the residuals was because of a technical glitch that meant I couldn't generate a single graph, but I still needed to push forward with my work. Fortunately, it's been resolved and I re-ran my code to double check.

                Truly thankful for the hive mind that is STATAlist.

                Allison



                • #9
                  Good to hear you made progress, but this really isn't STATAlist. https://www.statalist.org/forums/help#spelling
