  • Running a t-test/ANOVA/Regression on percentile rank scores

    Cross-posted from: http://stats.stackexchange.com/quest...le-rank-scores

    This might sound trivial, but for a single variable I have percentile rank scores for 100 observations. My sample is divided into 2 groups: 20 observations are in Group 1 and 80 in Group 2.
    Can I run a t-test to say whether, on average, Group 1 has higher percentile scores than Group 2?
    In addition, could I use the percentile rank scores in a regression model (or ANOVA or logit) to predict another outcome variable?

    In a case study I am analyzing, working with percentile rank scores leads to much more significant results than using the raw values. However, my fear is that using percentile rank scores in regression models or in t-tests might somehow be wrong.
    In addition, is there a way I could support my choice? (Apart from the fact that I get better results)

    Thanks a lot for your help!

  • #2
    Hello Andrea,

    This is quite a tricky question to me, and I hope you get better advice. That said, I believe what you really have is a ranked score. Therefore, a nonparametric test would be among the first options. Transforming a rank into a percentile rank, IMHO, may not be the right way to make the data suitable for parametric tests. On account of that, I guess you should stick with the "raw data", I mean the ranks, even if they don't provide "better" results.

    Best,

    Marcos



    • #3
      Percentile rank scores are presumably distributed uniformly in general but not necessarily in each group. An alternative to the t-test might be to bootstrap to get a confidence interval for the difference in means. My hunch is that the t-test should work quite well in this case, but we can see no data here to check for ourselves.

      As for using such scores as a predictor in regression, I see no issue. There is no assumption in regression that predictors have any particular marginal distribution. It seems unlikely that percentile rank scores will themselves be outlier-prone, although the usual checks for nonlinearity, heteroscedasticity, outliers, etc. in terms of rank score and other variables are still germane.

      Naturally if the groups are themselves chosen on the basis of rank scores, the test is meaningless.
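The bootstrap idea in #3 can be sketched roughly as follows. This is a minimal Python illustration using only the standard library, not the Stata -bootstrap- workflow itself. The function name `bootstrap_diff_ci`, the number of replications, the seed, and the two groups of scores are all made up for illustration. It uses a simple percentile interval, resampling within each group.

```python
import random
from statistics import mean

def bootstrap_diff_ci(group1, group2, reps=2000, level=0.95, seed=12345):
    """Percentile bootstrap interval for mean(group1) - mean(group2).

    Each replication resamples with replacement within each group,
    keeping the original group sizes.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    diffs = []
    for _ in range(reps):
        s1 = rng.choices(group1, k=len(group1))
        s2 = rng.choices(group2, k=len(group2))
        diffs.append(mean(s1) - mean(s2))
    diffs.sort()
    alpha = 1.0 - level
    lo = diffs[int((alpha / 2) * reps)]
    hi = diffs[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

# Hypothetical percentile rank scores: 20 observations in Group 1, 80 in Group 2.
g1 = [55.0 + 2.0 * i for i in range(20)]
g2 = [1.0 + 1.2 * i for i in range(80)]

lo, hi = bootstrap_diff_ci(g1, g2)
print(f"observed difference: {mean(g1) - mean(g2):.2f}")
print(f"95% bootstrap interval: [{lo:.2f}, {hi:.2f}]")
```

If the interval excludes zero, that points the same way as a significant t-test. Other interval types (for example bias-corrected ones) exist and may behave better with only 20 observations in one group.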



      • #4
        Thanks Nick and Marcos!

        My "raw data" is a measure like the weight in kg of a person. I transformed those data into percentile rank scores (considering all 100 observations). The two groups could then be Male/Female. Is a t-test appropriate in this case to say whether males, on average, rank significantly higher with respect to their weight?
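For readers unsure what the transformation looks like, here is a minimal Python sketch. Several definitions of percentile rank are in use; this one, 100 * (B + 0.5 * E) / n with B the count of values strictly below and E the count of equal values, is an assumption and not necessarily the one used in this thread. The function name `percentile_ranks` and the weights are hypothetical.

```python
def percentile_ranks(values):
    """Percentile rank of each value within the pooled sample.

    Uses 100 * (B + 0.5 * E) / n, where B counts values strictly below
    the value and E counts values equal to it (itself included).
    """
    n = len(values)
    out = []
    for v in values:
        below = sum(1 for w in values if w < v)
        equal = sum(1 for w in values if w == v)
        out.append(100.0 * (below + 0.5 * equal) / n)
    return out

# Hypothetical weights in kg, pooled over both groups.
weights_kg = [58.0, 91.5, 72.0, 64.5, 80.0]
pr = percentile_ranks(weights_kg)
print(pr)  # [10.0, 90.0, 50.0, 30.0, 70.0]
```

Note that the scores depend only on the ordering of the pooled sample, so the distances between the raw weights are discarded.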

        Another possibility is when you do the Big 5 test of personality. The scores that you get there are expressed as percentile scores.





        • #5
          Andrea:
          Surely I'm missing something about your research goal, but I would consider -ttest weight, unequal by(gender)- as the way to go.
          If, as Nick suggested, the requirements of -ttest- were shaky in your case, you could run a -bootstrap- ttest as a sensitivity analysis of your base case findings.
          Kind regards,
          Carlo
          (Stata 19.0)
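For reference, the unequal-variances t-test mentioned in #5 rests on the following arithmetic. This is a Python sketch of the statistic with Satterthwaite's degrees-of-freedom approximation, not Stata's implementation. The function name `welch_t` and the weights are hypothetical, and no p-value is computed because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Two-sample t statistic without assuming equal variances.

    Returns (t, df), with df from Satterthwaite's approximation.
    """
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1  # squared standard error of mean(x)
    v2 = variance(y) / n2  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical weights in kg for two groups of unequal size.
males = [82.0, 75.5, 91.0, 68.0, 88.5]
females = [61.0, 58.5, 70.0, 66.5, 72.0, 55.0, 63.5]

t, df = welch_t(males, females)
print(f"t = {t:.3f}, approximate df = {df:.2f}")
```

The same function can be applied to percentile rank scores instead of raw weights, which is the comparison the original question is about.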



          • #6
            Thanks Carlo, the way you suggest is my usual approach. However, just in this case I was wondering whether it's correct to use percentile rank scores.

            The measure "weight" is just an example. For the real measure that I have, the rank is more important than the raw score. To get the ranks I thought of using the percentile rank scores.
            But perhaps using a t-test on rank scores is wrong?



            • #7
              Andrea:
              I'm simply not experienced in -ttest- on rank scores.
              If the literature in your research field allows that approach, follow that way.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                As Carlo I think implies, we can't impute your research goals. Personally, I doubt that percentile rank scores have more predictive value than what they transform. The effects of my weight aren't affected by how far everyone is heavier or lighter.

                To summarize this and #3 together: Technically there is no obvious problem, but watch out. Substantively, you can probably get a better model with the original data.
                Last edited by Nick Cox; 08 Nov 2016, 04:52.



                • #9
                  Nick:
                  yes, I meant that.
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    You might want to look at a text on non-parametric statistics. As I remember, some of those techniques do use ranks the way you want to.



                    • #11
                      Reading #4, I now realize your "raw data" is actually a continuous variable. Therefore, and maybe I didn't get your point, you could just stick to this sort of data and perform, say, a parametric test such as Student's. Should you have unequal variances, as Carlo remarked, you just need to add the "unequal" option.

                      That said, should you decide to apply a nonparametric test, you could do it directly with the raw data. After all, under this strategy, "good" information won't be partially lost...

                      Actually, the rank-sum statistic is computed after ranking the data. I fear there may be much ado about nothing over whether to use the ranks or the raw data: with the Wilcoxon rank-sum test, the results will be the same either way.
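The invariance described here can be checked with a small Python sketch. The helper names `midranks` and `rank_sum`, the data, and the group labels are hypothetical. The sketch computes only the rank-sum statistic W for Group 1 (no p-value), with tied values given average ranks. It also includes pooled percentile ranks, assumed here to be 100 * (rank - 0.5) / n, which as a monotone function of the ranks give the same W.

```python
def midranks(values):
    """Ranks 1..n of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum(values, in_group1):
    """Sum of the pooled-sample ranks belonging to Group 1."""
    return sum(r for r, g in zip(midranks(values), in_group1) if g)

# Hypothetical raw scores and group membership (True = Group 1).
raw = [58.0, 91.5, 72.0, 64.5, 80.0, 72.0, 101.0, 66.0]
group1 = [True, False, True, False, False, True, False, False]

ranks = midranks(raw)
pct = [100.0 * (r - 0.5) / len(raw) for r in ranks]  # assumed definition

w_raw = rank_sum(raw, group1)
w_ranks = rank_sum(ranks, group1)
w_pct = rank_sum(pct, group1)
print(w_raw, w_ranks, w_pct)  # 10.0 10.0 10.0
```

The t-test has no such invariance: it uses the actual values, so raw data, ranks, and percentile ranks can give different results.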

                      Nota bene: I meant the ranks, not the percentile ranks. About percentile ranks, Nick's remarks in #3 and #8 shed much light on the matter.

                      Besides, the raw data point to a direction. If transformed data point to the opposite direction, IMHO, I'd prefer to "believe" mostly in the raw data.


                      Best,

                      Marcos
