
  • Help needed with transforming Likert data or using non-parametric tests?

    Dear all,

    I'm researching whether several chatbot-aided online shopping scenarios differ in their effects on customer experience and customer satisfaction.
    All data were collected through a survey attached to my experiment. Customer experience is measured using four variables, all on a 7-point Likert scale. Customer satisfaction is measured on a 7-point Likert scale as well.

    The problem is that these data are negatively skewed, so much so that none of my variables are normally distributed. I tried a reverse score (reflect-then-log) transformation, log(highest score + 1 - score). Although the data end up less skewed, they are still far from normal.
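To make the reverse score transformation concrete, here is a minimal sketch in plain Python. Python, the function names and the ten scores are illustrative assumptions, not the survey data or anyone's actual code. The top score of a 7-point scale maps to log(1) = 0, and the sketch compares moment-based skewness before and after. Note that reflection reverses the direction of the scale, so a higher transformed value means a lower original score.

```python
import math


def reflect_log(score, highest=7):
    """Reverse-score (reflect) a Likert response, then take the natural log.

    The top score maps to log(1) = 0 and the bottom score (1) to log(highest),
    so the direction of the scale is reversed.
    """
    return math.log(highest + 1 - score)


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical, made-up responses bunched near the top of a 7-point scale.
scores = [7, 7, 7, 6, 6, 6, 5, 5, 4, 2]
transformed = [reflect_log(s) for s in scores]
print(f"skewness before: {skewness(scores):.3f}, after: {skewness(transformed):.3f}")
```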

    My questions to you:
    1. Are there any common measures (other than a reverse score transformation) that can be taken to try to normalize data in this context?
    2. Even if it were possible to normalize the data, is it recommended, or does it decrease the reliability of test results?
    3. Following from question 2, I find myself in a trade-off between trying to normalize the data and moving on to analyze them with non-parametric tests. Which would you recommend, and why?
    4. If you recommend non-parametric tests, is there a substitute for Dunnett's post-hoc test? This test would be extremely useful for several of my hypotheses.

    Thanks in advance!

    Monique

  • #2
    There are several possible answers to your questions. One, if you are comfortable with regression, is that you don't even require the data to be normally distributed, so why try to transform them? Only the error term needs to be normally distributed for the least squares estimator to have optimal performance, and if the errors aren't normal, the estimator is merely less efficient. Also, you could use the robust estimator for the variance, which would ameliorate those problems. Furthermore, if your analysis looks at each question individually, you are arguably better off using something like ordered logistic regression than linear regression.
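To illustrate what the robust variance estimator does, here is a minimal plain-Python sketch for a single 0/1 group indicator (the function name and data are hypothetical). With a 0/1 predictor, the slope is simply the difference between the two group means. The classical standard error pools one residual variance across all observations, while the HC1 sandwich estimator uses each observation's own squared residual. I believe HC1 is the small-sample scaling Stata's -vce(robust)- applies after -regress-, but treat that as an assumption to check against the manual.

```python
import math


def slope_with_robust_se(x, y):
    """Least-squares fit of y = a + b*x with a single predictor.

    Returns (a, b, classical SE of b, HC1 robust SE of b).
    """
    n, k = len(x), 2                      # k = number of estimated coefficients
    x_bar, y_bar = sum(x) / n, sum(y) / n
    xc = [xi - x_bar for xi in x]         # centred predictor
    sxx = sum(v * v for v in xc)
    b = sum(c * (yi - y_bar) for c, yi in zip(xc, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

    # Classical: one pooled residual variance for every observation.
    s2 = sum(e * e for e in resid) / (n - k)
    var_classical = s2 / sxx

    # Sandwich (HC0): each observation contributes its own squared residual;
    # HC1 rescales HC0 by n / (n - k).
    var_hc0 = sum((c * e) ** 2 for c, e in zip(xc, resid)) / sxx ** 2
    var_hc1 = var_hc0 * n / (n - k)
    return a, b, math.sqrt(var_classical), math.sqrt(var_hc1)


# Hypothetical data: x is a 0/1 group indicator, y a scale score.
x = [0, 0, 1, 1, 1]
y = [6, 6, 3, 5, 7]
a, b, se_classical, se_robust = slope_with_robust_se(x, y)
print(f"b = {b:.3f}, classical SE = {se_classical:.3f}, robust SE = {se_robust:.3f}")
```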

    Another possible answer, if you need to use an ANOVA-like framework, is that non-parametric estimators can also be helpful. They are less efficient than parametric ones, but depending on your sample size, this may not matter.

    I have a BA in psychology, but I'm in a PhD program in health services research, so regression-based approaches are the norm for me and probably for most posters here. That background shapes my answers.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Thank you, Weiwen!

      Regarding my analyses, I need to run several one-way ANOVAs as well as a two-way ANOVA (both with independent groups). From what I've read, there is no non-parametric equivalent of the latter, which means there is no way to estimate the interaction effect of, in my case, two independent variables. Unfortunately my experimental design does not require regression.

      Regarding the sample size, my sample contains 475 respondents, distributed over five groups with sample sizes of 75, 93, 91, 115 and 101, respectively. I need to compare these groups' means on Likert-scaled variables (each variable's score is the average of the Likert scores of all items belonging to that variable).

      I hope this information adds to my problem description; feel free to ask about anything else I left out.



      • #4
        Unfortunately my experimental design does not require regression.
        But two-way ANOVA is just regression on indicator ("dummy") variables and their interaction. So I think Weiwen's advice here is excellent.
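The claim that a two-way ANOVA is a regression on indicators and their interaction can be checked numerically. The sketch below is plain Python under stated assumptions: two hypothetical factors A (levels 0, 1) and B (levels x, y, z), made-up outcomes, the first level of each factor as the base category, and a small hand-written least-squares solver. Because the model includes every main-effect and interaction indicator, its fitted values reproduce the six cell means, which are what the two-way ANOVA decomposes. In Stata, factor-variable notation such as -regress y i.A##i.B- fits this kind of model (A, B and y are hypothetical names).

```python
def dummy_row(a, b, a_levels, b_levels):
    """One design-matrix row: intercept, indicators for the non-base levels
    of A and B, and the products of those indicators (the interaction)."""
    a_d = [1.0 if a == lvl else 0.0 for lvl in a_levels[1:]]
    b_d = [1.0 if b == lvl else 0.0 for lvl in b_levels[1:]]
    return [1.0] + a_d + b_d + [ai * bj for ai in a_d for bj in b_d]


def solve(m, v):
    """Solve the square system m x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * w for u, w in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data: (level of A, level of B, outcome), two observations per cell.
data = [(0, "x", 5), (0, "x", 6), (0, "y", 4), (0, "y", 6), (0, "z", 7), (0, "z", 6),
        (1, "x", 3), (1, "x", 4), (1, "y", 6), (1, "y", 7), (1, "z", 5), (1, "z", 5)]
a_levels, b_levels = [0, 1], ["x", "y", "z"]

X = [dummy_row(a, b, a_levels, b_levels) for a, b, _ in data]
y = [float(v) for _, _, v in data]
beta = ols(X, y)
fitted = [sum(c * xv for c, xv in zip(beta, row)) for row in X]

# Cell means computed directly, for comparison with the regression's fitted values.
cells = {}
for a, b, v in data:
    cells.setdefault((a, b), []).append(v)
cell_means = {key: sum(vs) / len(vs) for key, vs in cells.items()}

for (a, b, _), f in zip(data, fitted):
    print(a, b, round(f, 6), cell_means[(a, b)])
```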

        With a sample size of 475, distributed over the groups in the way you show, I think you have enough observations that non-normality, unless it is very extreme, will not matter much. And remember, the issue of normality, to the extent it is an issue at all, applies to the residuals, not the outcome variables. You may find that the residuals are more normal than the dependent variables themselves. Run the regression that is equivalent to the ANOVA you are planning, and then use -qnorm- to check the residuals: unless there is a really extreme departure from normality, I wouldn't worry about it. And, to boot, you can also use -vce(robust)- for added protection against non-normality.
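Here is a minimal plain-Python sketch of that residual check, with hypothetical data and function names. In Stata you would simply run -qnorm- on the predicted residuals. For a regression on group indicators, the fitted values are the group means, so the residuals are each score minus its group mean. The sketch then pairs the sorted residuals with normal quantiles at plotting positions i/(n+1), using the residuals' own mean and standard deviation. I believe these are the positions -qnorm- uses, but that is an assumption. Pairs lying close to a straight line indicate roughly normal residuals.

```python
from statistics import NormalDist, mean, stdev


def group_residuals(groups):
    """Residuals from regressing y on group indicators: y minus its group mean."""
    resid = []
    for values in groups.values():
        m = mean(values)
        resid.extend(v - m for v in values)
    return resid


def qq_pairs(values):
    """(theoretical, observed) pairs for a normal Q-Q plot.

    Uses plotting positions i/(n+1) and a normal distribution with the
    sample mean and standard deviation of the values.
    """
    n = len(values)
    dist = NormalDist(mean(values), stdev(values))
    ordered = sorted(values)
    return [(dist.inv_cdf(i / (n + 1)), obs) for i, obs in enumerate(ordered, start=1)]


# Hypothetical scale scores for three groups.
groups = {"g1": [5.0, 6.25, 6.5, 7.0], "g2": [4.5, 5.75, 6.0, 6.75], "g3": [3.0, 5.5, 6.0, 6.5]}
resid = group_residuals(groups)
for theo, obs in qq_pairs(resid):
    print(f"{theo:7.3f} {obs:7.3f}")
```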

        I think this is a molehill, or maybe even just an anthill, not a mountain.



        • #5
          Originally posted by Clyde Schechter
          But two-way ANOVA is just regression on indicator ("dummy") variables and their interaction.

          One of Clyde's previous posts, which includes some discussion from other members, shows how you can take a regression model with a categorical independent variable and perform a test equivalent to an ANOVA's F-test.
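The equivalence with the ANOVA F-test can be seen through the regression's R-squared. In this plain-Python sketch (hypothetical groups and function names), the regression on k - 1 group dummies has the group means as fitted values, so its overall F = (R-squared / (k - 1)) / ((1 - R-squared) / (N - k)) matches the classic between/within ANOVA F. In Stata, -testparm i.group- after -regress y i.group- should report this same F (group and y are hypothetical names).

```python
from statistics import mean


def anova_f(groups):
    """Classic one-way ANOVA F: between-group over within-group mean squares."""
    all_y = [v for g in groups for v in g]
    grand = mean(all_y)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ssw = sum((v - mean(g)) ** 2 for g in groups for v in g)
    k, n = len(groups), len(all_y)
    return (ssb / (k - 1)) / (ssw / (n - k))


def regression_f(groups):
    """Overall F of a regression on k - 1 group dummies, computed from R-squared.

    The dummy regression's fitted values are the group means, so its
    residuals are y minus the group mean.
    """
    all_y = [v for g in groups for v in g]
    grand = mean(all_y)
    sst = sum((v - grand) ** 2 for v in all_y)
    ss_resid = sum((v - mean(g)) ** 2 for g in groups for v in g)
    r2 = 1 - ss_resid / sst
    k, n = len(groups), len(all_y)
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


# Hypothetical scale scores for three independent groups.
groups = [[5.0, 6.0, 6.5, 7.0], [4.0, 5.5, 6.0], [3.5, 5.0, 5.5, 6.5, 6.0]]
print(f"ANOVA F = {anova_f(groups):.4f}, regression F = {regression_f(groups):.4f}")
```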

          Parenthetically, if all you want to do is an ANOVA-like non-parametric simultaneous test for the equality of the medians of the 5 groups, then you can use the Kruskal-Wallis test. There's no regression equivalent, I think.


          • #6
            Weiwen Ng, first, note that K-W is a test of the distributions, not the medians; as with Mann-Whitney, it is possible to have identical medians but have K-W show the distributions to be different. Second, I'm not sure what you mean when you say, "There's no regression equivalent, I think"; rank regression (i.e., estimating a regression on the ranks of the data) will be an "equivalent".
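The rank-regression point can be made precise. When both use average ranks for ties, the tie-corrected Kruskal-Wallis H equals (N - 1) times the R-squared from regressing the pooled ranks on group indicators. The plain-Python sketch below (hypothetical Likert-style data and function names) computes H with the textbook formula and again through the rank regression, so the two can be compared. In either form, H is referred to a chi-squared distribution with k - 1 degrees of freedom, as Stata's -kwallis- does.

```python
def midranks(values):
    """Ranks 1..N, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1              # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H from the textbook rank-sum formula."""
    flat = [v for g in groups for v in g]
    n = len(flat)
    r = midranks(flat)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(r[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in flat:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))   # divide by the tie correction factor


def rank_regression_h(groups):
    """(N - 1) * R-squared from regressing the pooled ranks on group indicators."""
    flat = [v for g in groups for v in g]
    n = len(flat)
    r = midranks(flat)
    grand = (n + 1) / 2                    # mean of ranks 1..N, with or without ties
    sst = sum((x - grand) ** 2 for x in r)
    ssb, start = 0.0, 0
    for g in groups:
        seg = r[start:start + len(g)]      # fitted value = group mean rank
        ssb += len(g) * (sum(seg) / len(g) - grand) ** 2
        start += len(g)
    return (n - 1) * ssb / sst


# Hypothetical 7-point responses for three groups (many ties, as with Likert data).
groups = [[7, 6, 6, 5, 7], [4, 5, 6, 3], [7, 7, 6, 6, 7, 5]]
print(round(kruskal_wallis_h(groups), 6), round(rank_regression_h(groups), 6))
```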



            • #7
              Originally posted by Rich Goldstein
              Weiwen Ng, first, note that K-W is a test of the distributions, not the medians; as with Mann-Whitney, it is possible to have identical medians but have K-W show the distributions to be different. Second, I'm not sure what you mean when you say, "There's no regression equivalent, I think"; rank regression (i.e., estimating a regression on the ranks of the data) will be an "equivalent".

              Yep, I misspoke. Thanks!