
  • Compare the distributions of two variables

    I would like to compare the distributions of two variables: guess_attain_own_treatment and guess_attain_other_treatment. Both variables are available for all individuals in my data set.

    I thought I would use a ksmirnov test, but it seems this only accepts one variable (and you define a variable that separates observations into groups using the by() option). Is there a way to conduct the test I have in mind? This is what I would like to do:

    Code:
     ksmirnov guess_attain_own_treatment guess_attain_other_treatment, exact
    But I get r(103) Too many variables specified.

  • #2
    Curiously enough, -ksmirnov- does not accept two variables. There is no reason for that, but there it is.

    You need to reshape your data to long, and then use the syntax with the -by- option.

    If you do not manage to do it yourself, please provide a data sample using -dataex-.
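
    A minimal sketch of the reshape route described above, assuming the two variables sit side by side with one row per respondent. The names obs_id, guess, and which are hypothetical helpers introduced for illustration; the exact option is carried over from post #1.

    ```stata
    * Sketch only: obs_id, guess and which are hypothetical names
    preserve
    keep guess_attain_own_treatment guess_attain_other_treatment
    gen long obs_id = _n                          // row identifier needed by reshape
    rename guess_attain_own_treatment   guess1    // 1 = own
    rename guess_attain_other_treatment guess2    // 2 = other
    reshape long guess, i(obs_id) j(which)        // stacks guess1 and guess2 into guess
    label define which 1 "own" 2 "other"
    label values which which
    ksmirnov guess, by(which) exact               // two-sample test on the long data
    restore
    ```

    Wrapping the steps in preserve and restore leaves the original wide data untouched. Note that ksmirnov treats the two groups as independent samples, whereas here both guesses come from the same individuals, so the result should be read with that in mind.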



    • #3
      Jess: You can also use stack instead of reshape. I'm not sure which is more helpful for your particular context.
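
      The stack alternative might look like the sketch below. The new variable name guess is a hypothetical choice; _stack is the group indicator that stack creates itself.

      ```stata
      * Sketch only: guess is a hypothetical name for the stacked variable
      preserve
      stack guess_attain_own_treatment guess_attain_other_treatment, into(guess) clear
      * _stack is 1 for values from the first variable, 2 for the second
      ksmirnov guess, by(_stack) exact
      restore
      ```

      stack keeps only the stacked variable and _stack, so it needs no identifier variable, while reshape retains the link back to each original observation.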



      • #4
        Thank you Joro Kolev and John Mullahy! I tried using both stack and reshape and both worked! I was able to compare the two variables.

        I actually now want to try something more complicated, though it is still in the same vein: I want to compare the distribution of one variable in one group with the distribution of a different variable in another group.

        I've provided a data sample below. An observation is identified by bibnumber and race. I want to compare the distribution of guess_attain_own_treatment for observations in treatment M with the distribution of guess_attain_other_treatment for observations in treatment C.

        Would stack or reshape work for this?

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long bibnumber byte(treatment guess_attain_own_treatment guess_attain_other_treatment) long race
            . .   .   . 2
         1294 3  60  50 1
         6632 3  70  60 3
         4910 2   0  10 1
         2219 .   .   . 2
         6214 4  60  50 3
         2541 .   .   . 1
        14792 3  50  30 4
         1263 1  60  50 1
        14925 2  70  70 4
         1239 2  70  70 1
         8648 2  60  60 4
        21233 4  70  60 4
          833 1  40  10 2
         2886 4  70  70 4
        11441 2  60  40 1
         1698 1  60  40 1
         1746 2  70  60 1
         9249 .   .   . 1
         9201 .   .   . 4
        18295 1  40  30 4
         5003 .   .   . 1
         2811 2  70  80 2
        15565 4  80  70 4
         7767 3  60  50 2
         3084 1  70  70 1
        10632 .   .   . 3
         1099 1  70  70 4
        10504 .   .   . 1
         2344 2  30  30 1
         2051 3  70  50 1
        16547 .   .   . 4
        17760 1  90  80 4
         7101 1  90  80 4
         6804 4  70  60 4
        15483 .   .   . 3
         7498 1  70  60 1
         9258 2  90  90 1
         5970 1  70  60 4
         5725 2  60  70 4
         1276 1  70  30 2
        10511 2  60  60 4
        13085 .   .   . 4
        11575 .   .   . 4
         8318 3  80  50 4
         4138 3  60  50 4
         8133 .   .   . 1
        10633 3  80  50 4
          560 2 100 100 1
         9778 2  60  60 4
        18501 .   .   . 4
         4205 1  80  70 4
         1122 .   .   . 4
         9020 3  50  40 4
        12764 3  50  70 4
         3969 3   .   . 4
         3650 4  50  50 1
        10475 .   .   . 3
        10839 1  80  30 4
         6054 1  70  50 1
         9623 2  60  60 3
          586 4   .   . 1
         1857 3  70  70 3
         2402 2  80  50 3
         2384 2  60  60 1
         4492 .   .   . 3
         2067 4  90  80 3
         1054 4  30  30 4
         4037 1  70  70 3
        10209 1  60  30 3
         9056 4  80  80 3
         4038 1  50  30 1
         4137 3  80  80 3
         5562 2 100  50 3
          470 4  50  40 2
        14172 1  50  50 4
         2754 .   .   . 4
         7108 3  70  50 1
         6868 2  90  90 1
         9662 .   .   . 4
         9881 4  80  50 4
          752 3  80  60 1
         7570 .   .   . 1
         7018 .   .   . 4
         1256 .   .   . 2
         7116 3  80  80 1
         1360 3  70  70 1
         6603 .   .   . 3
        15239 2  40  40 4
        19355 .   .   . 4
         2379 3  70  70 1
         4833 1  20  30 1
        13108 .   .   . 4
         6005 4  20  50 2
         4807 .   .   . 3
         2506 1  60  80 3
         9139 2  40  40 3
        16373 .   .   . 4
        17795 .   .   . 4
        16095 2   .   . 4
        end
        label values treatment treatments
        label def treatments 1 "C", modify
        label def treatments 2 "P", modify
        label def treatments 3 "M", modify
        label def treatments 4 "PM", modify



        • #5
          Thanks for the data example. Here is a plot. The ttest command follows for your interest, but it's unconvincing. A plot is needed to make emphatic the granularity of the data (multiples of 10). The plot combines a quantile plot of the data, diamonds for medians, spikes connecting extremes and quartiles and horizontal reference lines for means.

          Code:
          gen wanted = cond(treat == 1,  guess_attain_own_treatment , cond(treat == 3, guess_attain_other_treatment, .))
          egen median = median(wanted), by(treat)
          gen where = treat - 0.1
          label var wanted "own for C, other for M"
          * stripplot is from SSC 
          stripplot wanted, over(treat) ms(Sh) cumul cumprob box(barw(0)) pctile(0) vertical addplot(scatter median where, ms(Dh) mc(black)) boffset(-0.1) refline scheme(s1color) yla(, ang(h))
          
          ttest wanted, by(treat)
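
          The same single-variable construction also feeds ksmirnov, which was the test originally asked about. The sketch below follows the pairing as worded in #4 (own guess for M, other guess for C); the code above builds the reverse pairing (own for C, other for M). The name wanted_km is a hypothetical helper, and the exact option is carried over from post #1.

          ```stata
          * Sketch only: wanted_km is a hypothetical name
          * treatment codes from the data example: 1 = C, 3 = M
          gen wanted_km = cond(treatment == 3, guess_attain_own_treatment, ///
                          cond(treatment == 1, guess_attain_other_treatment, .))
          label var wanted_km "own for M, other for C"
          * restrict to the two treatments so by() sees exactly two groups
          ksmirnov wanted_km if inlist(treatment, 1, 3), by(treatment) exact
          ```

          Because the values are multiples of 10, ties are heavy, so the Kolmogorov-Smirnov result is best read alongside the plot rather than on its own.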
          [Image: guess_treat.png]

          Last edited by Nick Cox; 23 Oct 2020, 15:52.



          • #6
            Thank you very much for this, Nick Cox! Both the approach to structuring the data to enable the comparison and the plot suggestion are extremely helpful.
