  • Likert scale - highly skewed dependent variable

    Dear Statisticians,

    My dependent variable is a Likert scale about mental health (0 to 27). The variable is highly right-skewed. I am considering a Poisson regression, but I am not sure if I should use a count-variable model in this case.
    Another option is to transform my dependent variable into 5 categories and use an ordered logistic model. My concern here is the skewed data.

    Any suggestions on which model would be a better choice?


    Thank you in advance,

  • #2
    This is a common problem. The conservative approach is: just use linear regression (or a variant like GEE or hierarchical linear regression if you have repeated observations). If you ever do a systematic literature review, all or almost all of the articles you review that include symptom scores as an outcome will do this. It's not the end of the world. Remember that the outcome doesn't have to be normally distributed for linear regression to work acceptably. You only need the residuals to be normally distributed, and if they aren't, you merely have the problem that the model isn't efficient, not that it's biased.
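
    A minimal simulation sketch of that last point, not taken from any real data: the errors below are drawn from a centred exponential distribution (strongly right-skewed), and the least-squares slope is averaged over many replications to see whether it still centres on the true value. The true slope, sample size, number of replications, and seed are all made-up assumptions, and Python is used only for illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple least-squares fit of y on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_slopes(true_slope=0.5, n=200, reps=500, seed=12345):
    """Fit OLS repeatedly on data whose errors are right-skewed."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(0, 10) for _ in range(n)]
        # Exponential(1) minus 1 has mean zero but a long right tail.
        y = [2.0 + true_slope * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


slopes = simulate_slopes()
mean_slope = statistics.fmean(slopes)
print("true slope: 0.5, mean estimated slope:", round(mean_slope, 3))
```

    The average estimate should land close to the true slope even though the errors are far from normal. This says nothing about the standard errors or about a bounded 0-27 outcome specifically; it only illustrates the bias claim.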

    For a count model specifically, in principle that could be OK. Symptom scales may also have pre-defined severity bands. For example, in the PHQ-9 depression scale (which is scored 0-27), 0-4 points represents no to minimal depressive symptoms, 5-9 represents mild, 10-14 represents moderate (and 10 points is the typical cutoff for a current major depressive episode), 15-19 represents moderately severe, and 20-27 represents severe. You could treat those bands as ordered categories. You might want to think about what happens if you don't meet the proportional odds assumption (usually you'd try a generalized ordered logit model if so). I'm not sure how many people will object if you violate the proportional odds assumption, or how strongly they would object.
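
    The banding described above can be sketched as follows, assuming the outcome really is a PHQ-9 total score (the original question does not name the instrument). The function name and labels are hypothetical; the cut points are the ones listed in this post.

```python
from bisect import bisect_right

# Lower bounds of the second through fifth bands: 5, 10, 15, 20.
BAND_LOWER_BOUNDS = [5, 10, 15, 20]
BAND_LABELS = [
    "none to minimal",    # 0-4
    "mild",               # 5-9
    "moderate",           # 10-14
    "moderately severe",  # 15-19
    "severe",             # 20-27
]


def phq9_band(score):
    """Return the ordered band index (0-4) for a total score of 0-27."""
    if not isinstance(score, int) or not 0 <= score <= 27:
        raise ValueError("score must be an integer from 0 to 27")
    # bisect_right counts how many lower bounds are <= score.
    return bisect_right(BAND_LOWER_BOUNDS, score)


for s in (0, 4, 5, 10, 19, 27):
    print(s, BAND_LABELS[phq9_band(s)])
```

    The resulting 0-4 index is what would go into an ordered logit as the outcome.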

    The more advanced approach would be to learn and apply item response theory. The thing is, that can be a bit tricky to learn. Also, most properly, you need to apply either explanatory IRT (which is a type of generalized structural equation model) or you need to draw and use plausible values for the dependent variable. In my view, this has a much higher skill floor (term borrowed from gaming, means the minimum skill required to use a statistical technique or game character effectively) than just linear regression, and I also don't see it as obligatory (unless we want to go back and throw out all the thousands of randomized trials ever done).
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

    • #3
      Weiwen Ng gives excellent advice. I would add the possibility of ordinal modelling with the data as they arrive, that is, without first collapsing the scores into categories. What will work well depends partly on whether all possible Likert items occur and on the dataset size, neither of which you tell us.
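
      A quick way to check both points is to tabulate the outcome and list any scores in the 0-27 range that never occur. The sketch below is in Python for illustration only; the scores are made up, and the function name is hypothetical.

```python
from collections import Counter


def score_coverage(scores, low=0, high=27):
    """Count each observed score and list the values in range never seen."""
    counts = Counter(scores)
    missing = [v for v in range(low, high + 1) if counts[v] == 0]
    return counts, missing


# Made-up, right-skewed scores standing in for the real variable.
scores = [0, 0, 0, 1, 1, 2, 2, 3, 5, 5, 8, 13, 21]
counts, missing = score_coverage(scores)

print("n =", len(scores))
print("distinct observed scores:", sorted(counts))
print("unobserved scores:", missing)
```

      Many unobserved or very sparse values, together with a small sample, would suggest that an ordinal model on the raw scores has too many thresholds to estimate well.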

      • #4
        To add to that, if you are interested in ologit and generalized ologit, there is some practical advice available for Stata; see:

        https://www.stata.com/meeting/4nasug/Williams_NASUG.pdf
        https://www.stata.com/meeting/4nasug/gologit2.pdf
        Best wishes

        (Stata 16.1 MP)
