  • How to analyze a subgroup with Wilcoxon signed rank test

    Dear all,

    I'm struggling with how to use the Wilcoxon signed rank test. I have a baseline dataset which contains about 50 persons. The variables height, weight and age were measured, and none of them turned out to be normally distributed, so I need non-parametric tests. As I need a subgroup of the baseline dataset, I would like to check that the variables (height, weight and so on) of the small subgroup do not differ statistically from those of the baseline dataset of 50 persons.

    I figured I had to make a new variable to distinguish persons belonging to the subgroup from the other persons in the baseline dataset. The new variable was called 'subgroup' and I used the command
    Code:
    replace subgroup =1 if patientnumber == "xxx"
    to place the right persons in the subgroup.

    After finishing this selection, I tried using the command
    Code:
    signrank age=(age if subgroup ==1)
    , but this code does not result in a reliable Prob > |z| value. I tried placing parentheses and quotation marks at different positions, but this did not help either.

    Who can give me some advice on how to compare a subgroup to the total group when using a non-parametric test such as the Wilcoxon test?

    Thanks so much,
    Mariska

  • #2
    signrank is for matched data; you have two different groups. So the first thing you need to do is
    Code:
     replace subgroup=2 if subgroup !=1
    or something like that. You don't show us the original command that generated the "subgroup" variable, so perhaps this step is not necessary.

    Then compare the two subgroups:
    Code:
    ranksum age, by(subgroup)
    The fact that a statistical test doesn't reject the null hypothesis isn't necessarily an indication that the groups are similar. If the subgroup is small, as I expect, most tests will have low power. Confidence intervals would more likely indicate how dissimilar the groups are. But what is the need for this analysis? Tell us more about your study.
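    The steps above can be sketched as follows. This is only an illustration under stated assumptions: patientnumber is taken to be a string variable, the IDs "xxx" and "yyy" are placeholders as in the original post, and insub is a hypothetical new variable name chosen so that it does not clash with an existing subgroup variable. centile reports each group's median with its own confidence interval (not an interval for the difference), and the porder option of ranksum adds an estimate of the probability that a value from the first group exceeds one from the second.
    Code:
    * 1 = in the subgroup, 0 = everyone else (placeholder IDs)
    generate byte insub = inlist(patientnumber, "xxx", "yyy")
    tabulate insub
    * Wilcoxon rank-sum (Mann-Whitney) test for two independent groups
    ranksum age, by(insub) porder
    * median of age with a confidence interval, separately for each group
    bysort insub: centile age, centile(50)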
    Last edited by Steve Samuels; 29 Jul 2015, 17:41.
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2


    • #3
      I think I need some clarification, as I am reading the original post somewhat differently from Steve. As I read it, you have a data set of 50 observations; you then take a subset (subgroup) of these 50 (it is not clear how), and you want to compare this subgroup to the original 50, of which the subgroup is a part. If my reading is accurate, I don't agree with Steve's recommendation; worse, I don't see the point or what you are trying to do. If Steve's reading is accurate (i.e., you want to compare the chosen subgroup to the non-chosen subgroup), then his suggestion is fine.



      • #4
        Rich, the null and alternative hypotheses for comparing part-to-whole means and part-to-other-part means are equivalent, but the only way to test the nulls is a part-to-part test. The equivalence doesn't hold for medians or for other implicit parameters of rank tests, but I can't think of any way of testing part-to-whole, except by testing part-to-part.
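        The equivalence for means can be written out in one line. The notation here is introduced only for illustration: $n_1$ and $\mu_1$ are the size and mean of the subgroup, $n_2$ and $\mu_2$ those of the remaining observations, and $N = n_1 + n_2$ with overall mean $\mu$.

        \[
        \mu = \frac{n_1\mu_1 + n_2\mu_2}{N}
        \quad\Longrightarrow\quad
        \mu_1 - \mu = \frac{n_2}{N}\,(\mu_1 - \mu_2)
        \]

        So $\mu_1 = \mu$ exactly when $\mu_1 = \mu_2$, provided $n_2 > 0$. With 15 of 50 observations in the subgroup, the part-to-whole difference is 35/50 = 0.7 times the part-to-part difference.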
        Last edited by Steve Samuels; 29 Jul 2015, 19:11.


        • #5
          Dear Steve and Rich,

          Thanks for your advice so far. I will explain myself a bit further.

          My baseline data set contains 50 observations. My baseline characteristics table for these 50 observations contains elements like weight, height, et cetera. Within these 50 observations, approximately 15 persons (the subgroup) have had an extra blood test. I want to use the subgroup of these 15 patients to look at the relationship between the blood test and height, weight and so on. However, I do not want to display a separate baseline characteristics table for these 15 persons (the subgroup). So I figured that I should use a test like the Wilcoxon (as none of the variables are normally distributed) to see whether the baseline variables (weight, height, etc.) differ statistically between the total group of 50 persons and the subgroup of 15 persons. And of course, if there is a statistically significant difference, then I will describe that difference in my results.

          I hope that this clarifies things, and I would be happy to learn more about your point of view regarding the group vs. subgroup analysis!



          • #6
            Steve, thanks for the explanation



            • #7
              My point is that you can't test a difference between the 15 and the 50 directly.


              You are going to report a separate analysis of the 15. So, even though you don't want to display their descriptive statistics, I think you should do so or describe them in the text. Do not quote p-values: the small sample sizes guarantee low power. To report that differences are or are not "statistically significant" would be uninformative, misleading, and possibly irrelevant. The more pertinent and interesting question for you to address in your manuscript: what determined who got the test?

              You have a second sample size problem. The rule of thumb for linear regression is that you need 10-15 or more observations per covariate (e.g., Green, 1991; Babyak, 2004). Thus, with 15 observations, at best you could fit one predictor at a time. I'd recommend rank correlations and two-way plots (graph matrix), which will have the advantage of showing non-linear association.
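
              As a rough sketch of that recommendation, assuming the blood test result is stored in a numeric variable called bloodtest (a hypothetical name) and that subgroup == 1 marks the 15 tested patients:
              Code:
              * Spearman rank correlations among the tested patients only
              spearman bloodtest height weight age if subgroup == 1, stats(rho obs p)
              * scatterplot matrix; half draws only the lower triangle
              graph matrix bloodtest height weight age if subgroup == 1, half
              With so few observations the plots are likely to be more informative than the p-values that spearman prints.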

              References:

              Babyak MA. What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom Med 2004; 66(3): 411–421.

              Green SB. How many subjects does it take to do a regression analysis? Multivar Behav Res 1991; 26: 499–510.
              http://www.ncbi.nlm.nih.gov/pubmed/1...?dopt=Citation
              Steve Samuels


              • #8
                Dear Steve,

                Thank you for your very helpful and relevant information regarding this topic. I studied the graph matrix command and think this will help me out a lot as well!

                Thanks again!
                Mariska
