
  • robust ANOVA for non-normal data

    Dear Statalist friends,

    I am inexperienced with most statistical methods and with Stata, and I need some help. I want to examine whether scores on a questionnaire X differ between three pre-specified age groups, and whether the effect of age group depends on weight status. I wanted to test this using a two-way ANOVA, but my data are non-normal. I then came across the possibility of performing a "robust ANOVA", but I could not find a command for that in Stata. The only command I could find was "anovalator", but to my understanding it is not an official command. Is there another command I could use? Or can I use "regress X i.Agegroup##1.Weightstatus, ice(robust)"? Would that be the equivalent? Any help is highly appreciated. As I am specifically interested in the interaction effect, I cannot use a Kruskal-Wallis test, and I am not aware of any other non-parametric test that allows for interaction effects.

    Many thanks!

  • #2
    Laura:
    as is often reported on this list, there's nothing that -regress- cannot do better than -anova-.
    Moreover, normality is a (weak) requirement for the residual distribution only.
    That said, you were probably meaning:
    Code:
    regress X i.Agegroup##i.Weightstatus, vce(robust)
    As an aside, please note that the -robust- option deals with heteroskedasticity only: be sure that this is what you want.
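    A minimal sketch of how the interaction could then be tested after that model. It assumes only the variable names given in #1 (X, Agegroup, Weightstatus) and uses standard postestimation commands; it is an illustration, not a recommended analysis plan.

    ```stata
    * Sketch only: X, Agegroup and Weightstatus are the variable names from #1
    regress X i.Agegroup##i.Weightstatus, vce(robust)

    * Joint Wald test of all interaction terms
    * (plays the role of the interaction F test in a two-way ANOVA,
    *  but uses the heteroskedasticity-robust variance estimate)
    testparm i.Agegroup#i.Weightstatus

    * Estimated mean of X in each Agegroup-by-Weightstatus cell
    margins Agegroup#Weightstatus
    ```

    Running -marginsplot- right after -margins- would plot those cell means, which usually makes an interaction easier to read than the coefficient table.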
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Hello Laura. As Carlo noted in #2, the normality assumption for OLS models (including ANOVA & regression) applies to the errors, not to the outcome variable itself. And normality of the errors is a sufficient condition, but not a necessary condition. The necessary condition is (approximate) normality of the sampling distributions of the parameter estimates (i.e., the coefficients) from the model. As some better textbooks explain, those sampling distributions approach the normal distribution as n increases, even if the error distribution is not all that close to normal. (Jeff Wooldridge's econometrics textbook explains all of this very nicely. I put together some slides summarizing his main points; you can view them here.)
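      One informal way to look at both points is sketched below, assuming the variable names from #1. The new variable name resid, the number of replications and the seed are arbitrary choices made for illustration.

      ```stata
      * Sketch only: X, Agegroup and Weightstatus are the variable names from #1
      regress X i.Agegroup##i.Weightstatus

      * The normality assumption refers to the errors, so inspect the residuals,
      * not the raw outcome ("resid" is a hypothetical new variable name)
      predict double resid, residuals
      qnorm resid
      histogram resid, normal

      * Bootstrap the coefficients to see how their sampling distributions behave
      * (reps() and seed() values are arbitrary illustrative choices)
      bootstrap _b, reps(1000) seed(12345): regress X i.Agegroup##i.Weightstatus
      estat bootstrap, percentile
      ```

      If the percentile intervals from the bootstrap are close to the normal-based intervals, that suggests the sampling distributions are near enough to normal for the usual inference.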

      With all that in mind, what is your sample size? Have you considered using -qreg- rather than -regress-? If you use the default setting, the fitted values are conditional medians (rather than the conditional means you get from OLS models).
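      A sketch of what the -qreg- version might look like, again assuming the variable names from #1:

      ```stata
      * Sketch only: X, Agegroup and Weightstatus are the variable names from #1
      * The default quantile is 0.5, so the fitted values are conditional medians
      qreg X i.Agegroup##i.Weightstatus

      * Joint test of the interaction terms, now on the median scale
      testparm i.Agegroup#i.Weightstatus
      ```

      Note that this answers a different question from the ANOVA: it compares cell medians, not cell means.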

      HTH.
      Last edited by Bruce Weaver; 08 Nov 2020, 09:36. Reason: Changed "normality of the parameter estimates" to "normality of the sampling distributions of the parameter estimates".
      --
      Bruce Weaver
      Email: [email protected]
      Version: Stata/MP 18.5 (Windows)



      • #4
        Hi both,

        thank you for the quick responses! My sample size is quite large (900+), but my supervisors wanted me to not ignore the violation of normality. I will look into -qreg- as I have not considered that before. I am massively confused by now so I might have to ask some follow-up questions tomorrow. Thanks again! Hugely appreciated.



        • #5
          Laura:
          with a 900+ sample your supervisor is overemphasizing the normality of residuals.
          I would be much more concerned about model misspecification.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Laura: Your ANOVA is equivalent to estimating different means for different cells defined by age/weight categories. Then you need to find the linear combinations that interest you. As others have noted, ANOVA is also equivalent to multiple regression.

            How many observations per cell do you have? If it’s more than, say, 50, nonnormality is probably not a big deal. And I’m not sure what you would do otherwise. ANOVA is about comparing means across cells. That’s it. If that’s what you want then use ANOVA (equivalently, regression). Anything else you do is less robust for estimating mean effects. If you use least absolute deviations then you are estimating differences in medians. If you use rrobust to get, say, a Huber robust estimator, you’re assuming symmetry of the distributions.
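            As a rough sketch, assuming the variable names from #1, the cell sizes and the cell-mean comparisons could be obtained like this:

            ```stata
            * Sketch only: X, Agegroup and Weightstatus are the variable names from #1
            * Mean, standard deviation and frequency of X in every cell
            tabulate Agegroup Weightstatus, summarize(X)

            * Saturated model: one mean per Agegroup-by-Weightstatus cell
            regress X i.Agegroup##i.Weightstatus, vce(robust)

            * Differences between age groups (each vs. the reference group),
            * estimated separately within each weight-status category
            contrast r.Agegroup@Weightstatus
            ```

            The -contrast- line is just one choice of linear combinations; which ones matter depends on the research question.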

            There can be no “model misspecification” here because the linear model is saturated with exhaustive and mutually exclusive dummies. Only if you add other control variables does it become an issue.



            • #7
              Laura, I forgot to ask you this in my first post: How are the age groups defined (i.e., what are the cut-points)? Do you have people's actual ages, or just the age groups they belong to? Thanks for clarifying.
              --
              Bruce Weaver
              Email: [email protected]
              Version: Stata/MP 18.5 (Windows)



              • #8
                Jeff is obviously right about model saturation.
                With a 900+ sample, I surmised (but did not make it explicit in my previous reply) that Laura had more stuff to include in the right-hand side of her regression equation.
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Originally posted by Laura Kudlek
                  . . . I want to examine if scores on a questionnaire X differ . . .
                  What do the scores look like?

                  If you're using a visual analog scale (VAS) or are summing the responses over a number of questionnaire items, then the scores will likely be continuous-looking.

                  On the other hand, if your scores are individual questionnaire items whose responses are ordered categorical (5 = "Strongly Agree" 4 = "Agree" 3 = "Undecided" 2 = "Disagree" 1 = "Strongly Disagree") or are sumscores across only a few such items with even fewer response options, then you might have so-called floor effects or ceiling effects (truncation & skew) and other issues to deal with.
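                  A quick way to see which situation applies, sketched with the variable names from #1 (nothing here is specific to the actual questionnaire):

                  ```stata
                  * Sketch only: X, Agegroup and Weightstatus are the variable names from #1
                  * Skewness, kurtosis and percentiles of the scores
                  summarize X, detail

                  * Overall distribution: pile-ups at the lowest or highest score
                  * point to floor or ceiling effects
                  histogram X, percent

                  * The same picture within each Agegroup-by-Weightstatus cell
                  histogram X, percent by(Agegroup Weightstatus)
                  ```

                  If X takes only a handful of distinct values, -tabulate X- shows the same information as a frequency table.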

                  Is that what your supervisors are concerned about?
