  • Testing means with missing values

    Hello everyone,

    I have a small problem running what should be a simple t-test, and I hope you can help me!

    So let's say I have two variables x and y and the data set looks like the following:
    x y
    ------------------------------
    1 5
    2 .
    3 .
    . 4
    4 .
    . 3

    Now I want to test whether the mean of x differs from the mean of y (so H0: mean(x) = mean(y), where the sample means are 2.5 and 4).
    Is there any smart way to do this?
    If I just run ttest x=y, I obviously get "no observation", and when I create new variables holding the respective means, I get a test in which 6 observations are compared with each other, whereas I want to test 4 obs. vs. 3 obs.

    Thank you in advance!

  • #2




    You can restructure the data so that you have two groups. If you have some pairing (as in your example), you are thereby ignoring information in the data.

    Code:
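    * example data from #1
    clear 
    input x y
    1 5
    2 .
    3 .
    . 4
    4 .
    . 3
    end 
    
    * stack x on top of y in one variable; _stack is 1 for x, 2 for y
    stack x y, into(xy) clear
    
    list 
    
    * two-sample t test by group; missing values of xy are ignored
    ttest xy, by(_stack)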



    • #3
      Nick's recommendation is the most principled, but you can just use the -unpaired- option of -ttest- if you want to be quick-and-dirty.

      . version 16.0

      . clear *

      . input byte (x y)

                  x         y
        1. 1 5
        2. 2 .
        3. 3 .
        4. . 4
        5. 4 .
        6. . 3
        7. end

      . ttest x = y, unpaired

      Two-sample t test with equal variances
      ------------------------------------------------------------------------------
      Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
      ---------+--------------------------------------------------------------------
             x |       4         2.5    .6454972    1.290994    .4457397     4.55426
             y |       3           4    .5773503           1    1.515862    6.484138
      ---------+--------------------------------------------------------------------
      combined |       7    3.142857    .5084323    1.345185    1.898768    4.386946
      ---------+--------------------------------------------------------------------
          diff |                -1.5    .9036961               -3.823025    .8230248
      ------------------------------------------------------------------------------
          diff = mean(x) - mean(y)                                      t =  -1.6599
      Ho: diff = 0                                     degrees of freedom =        5

          Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
       Pr(T < t) = 0.0789         Pr(|T| > |t|) = 0.1578          Pr(T > t) = 0.9211

      . exit

      end of do-file
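
      As a hand check on the t statistic in the log, here is a worked computation. It is a sketch that assumes the standard pooled-variance formulas for the equal-variance two-sample t test; the symbols s_p, n_x, and n_y are notation introduced here for illustration and do not appear in Stata's output.

      ```latex
      \begin{align*}
      s_p^2 &= \frac{(n_x - 1)\,s_x^2 + (n_y - 1)\,s_y^2}{n_x + n_y - 2}
             = \frac{3\,(1.290994)^2 + 2\,(1)^2}{4 + 3 - 2}
             = \frac{5 + 2}{5} = 1.4 \\
      \mathrm{SE}(\bar{x} - \bar{y}) &= \sqrt{s_p^2 \left(\frac{1}{n_x} + \frac{1}{n_y}\right)}
             = \sqrt{1.4 \left(\frac{1}{4} + \frac{1}{3}\right)} \approx 0.9037 \\
      t &= \frac{\bar{x} - \bar{y}}{\mathrm{SE}(\bar{x} - \bar{y})}
         = \frac{2.5 - 4}{0.9037} \approx -1.66,
         \qquad \text{df} = n_x + n_y - 2 = 5
      \end{align*}
      ```

      These values agree with the Std. Err. of diff (.9036961), t (-1.6599), and degrees of freedom (5) reported in the log.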



      • #4
        Thank you so much, both of you!
        You really saved my master's thesis! :D



        • #5
          Joseph's approach handles the unpaired t test with the data laid out as in #1.

          But sometimes we need to perform a Wilcoxon rank-sum test instead.

          Nick's approach in #2 copes with that as well: after stacking the data from #1, -ranksum- can be run by group.
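
          A minimal sketch of that, assuming the data from #1 and relying on -stack- creating the group indicator _stack as in #2. It has not been run here, so treat it as an illustration rather than a tested do-file.

          ```stata
          * example data from #1
          clear
          input x y
          1 5
          2 .
          3 .
          . 4
          4 .
          . 3
          end

          * stack x on top of y; _stack is 1 for x values and 2 for y values
          stack x y, into(xy) clear

          * Wilcoxon rank-sum test by group; missing values of xy are ignored
          ranksum xy, by(_stack)
          ```

          The only change from the code in #2 is swapping -ttest- for -ranksum-; both accept by() with a two-group variable.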

          I have come across this scenario several times and had to rebuild the dataset "by hand", so I'm glad to have now learned how to use the -stack- command!
          Best regards,

          Marcos
