
  • Interpreting the Welch test

    Hi everyone, I need a little help here with the interpretation of the Welch test. I used this command:
    Code:
    ttest healthy, by(treatment) welch
    I got the following results:

    Code:
     ttest healthy, by(treatment) welch
    
    Two-sample t test with unequal variances
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
    ---------+--------------------------------------------------------------------
           0 |      84    .5119048    .0548666    .5028604    .4027774    .6210322
           1 |   2,607    .7092443    .0088956    .4541981    .6918012    .7266875
    ---------+--------------------------------------------------------------------
    Combined |   2,691    .7030844    .0088094     .456984    .6858106    .7203581
    ---------+--------------------------------------------------------------------
        diff |           -.1973396     .055583               -.3078075   -.0868717
    ------------------------------------------------------------------------------
        diff = mean(0) - mean(1)                                      t =  -3.5504
    H0: diff = 0                             Welch's degrees of freedom =  87.5254
    
        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.0003         Pr(|T| > |t|) = 0.0006          Pr(T > t) = 0.9997


    Now, as we are supposed to look at Ha: diff != 0 and its p-value, we can see that it is 0.0006. If we convert it to a percentage, that becomes 6% or 0.06, implying that it is greater than 5% or 0.05. Can we now conclude that there is no significant difference between the means of the control and treatment groups?
    Please help me interpret the results.
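
As a cross-check of the output above, the Welch statistic can be reproduced directly from the reported summary statistics. This is a Python sketch using scipy (not part of the original thread's Stata workflow); the numbers are copied from the Stata table.

```python
# Reproduce the Welch t test from the summary statistics in the
# Stata output (a cross-check; scipy uses the Satterthwaite df,
# which is very close to Stata's Welch df here).
from scipy import stats

res = stats.ttest_ind_from_stats(
    mean1=0.5119048, std1=0.5028604, nobs1=84,    # group 0 (control)
    mean2=0.7092443, std2=0.4541981, nobs2=2607,  # group 1 (treatment)
    equal_var=False,  # unequal variances, matching -ttest ..., welch-
)
print(round(res.statistic, 4))  # ≈ -3.5504, as in the Stata output
print(round(res.pvalue, 4))     # ≈ 0.0006, i.e. 0.06% -- well below 5%
```

Note that 0.0006 is 0.06%, not 6%, which is the point the replies below turn on.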

  • #2
    David Radwin, please help.

    Comment


    • #3
      Moomal:
      I fear you're on the wrong track.
      Your -ttest- results tell you that the difference in -healthy- (I assume that this is a continuous variable; otherwise the use of -ttest- would be hard to justify) between the two groups (which you assume to have equal variance, which is questionable on an a priori basis; see the -unequal- option in -ttest-) does reach statistical significance (0.0006 < 0.05).
      You can also see this from the limits of the 95% CI of the difference of the means, as both carry a minus sign.
      That said, my concern rests on the sample size(s): does it make any sense to compare N = 84 vs N = 2,607 and consider the inferential results informative?
      Kind regards,
      Carlo
      (Stata 19.0)

      Comment


      • #4
        Carlo Lazzaro thank you for the response. "healthy" is a binary variable, where 0 is an unhealthy child with low birth weight and 1 is a healthy child with normal birth weight. Are you telling me that I can't use this test if -healthy- is a binary variable and not a continuous one?

        Comment


        • #5
          Moomal:
          this is exactly what I meant.
          You should consider something like:
          Code:
          logit healthy i.treatment
          or:
          Code:
          . prtesti 84 0.51 2607 .71
          
          Two-sample test of proportions                     x: Number of obs =       84
                                                             y: Number of obs =     2607
          ------------------------------------------------------------------------------
                       |       Mean   Std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                     x |        .51   .0545436                      .4030966    .6169034
                     y |        .71   .0088871                      .6925817    .7274183
          -------------+----------------------------------------------------------------
                  diff |        -.2   .0552628                     -.3083131   -.0916869
                       |  under H0:   .0506153    -3.95   0.000
          ------------------------------------------------------------------------------
                  diff = prop(x) - prop(y)                                  z =  -3.9514
              H0: diff = 0
          
              Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
           Pr(Z < z) = 0.0000         Pr(|Z| > |z|) = 0.0001          Pr(Z > z) = 1.0000
          
          .
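
For readers outside Stata, the test that -prtesti- runs here is the standard large-sample two-proportion z test with a pooled standard error under H0. A Python sketch reproducing it by hand:

```python
# Reproduce -prtesti 84 0.51 2607 .71- by hand: the pooled
# two-sample proportion z test (a cross-check, not the thread's
# original workflow).
import math
from scipy.stats import norm

n1, p1 = 84, 0.51
n2, p2 = 2607, 0.71
p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled proportion under H0
se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se0
p_two_sided = 2 * norm.sf(abs(z))
print(round(z, 4))            # ≈ -3.9514, as in the Stata output
print(round(p_two_sided, 4))  # ≈ 0.0001
```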
          Kind regards,
          Carlo
          (Stata 19.0)

          Comment


          • #6
            Okay, what is this test for and how do we interpret it?
            Basically, what I wanted to do was this: I have 84 observations in my control group and 2,607 observations in my treatment group. My concern is that having so few observations in the control group compared to the treatment group might be problematic. So I did a little research to see how I could justify the small number of observations and which test could help me in this regard. Following this article (https://www.statology.org/welchs-t-test-stata/), I came across the Welch test and applied it.
            Now I am a little confused about what I should do here. Could you please guide me a little more?
            P.S. I am doing propensity score matching, and with this set of observations I am getting significant results.

            Comment


            • #7
              Moomal:
              the link you kindly shared points exactly to a comparison between the means of a continuous variable (-ttest-; the -welch- option, as I did not notice before, wisely assumes that the two populations from which the samples were drawn have different variances).
              In addition, the -welch- option does not address the sizeable difference in sample sizes between your two groups.
              That said, I still don't see how -ttest- can be applied to the difference between two proportions, unless you trust the normal approximation to the binomial distribution that much.
              This trust can well let you down, as in the following toy example, where the upper bound of the 95% CI straddles the upper bound of a probability (that is, 1):
              Code:
              . prtesti 84 0.99 2607 .98
              
              Two-sample test of proportions                     x: Number of obs =       84
                                                                 y: Number of obs =     2607
              ------------------------------------------------------------------------------
                           |       Mean   Std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                         x |        .99   .0108562                      .9687222    1.011278
                         y |        .98   .0027419                      .9746259    .9853741
              -------------+----------------------------------------------------------------
                      diff |        .01   .0111971                     -.0119459    .0319459
                           |  under H0:   .0154003     0.65   0.516
              ------------------------------------------------------------------------------
                      diff = prop(x) - prop(y)                                  z =   0.6493
                  H0: diff = 0
              
                  Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
               Pr(Z < z) = 0.7419         Pr(|Z| > |z|) = 0.5161          Pr(Z > z) = 0.2581
              
              .
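
The failure mode in the log above, reproduced outside Stata: the Wald (normal-approximation) 95% CI for a proportion near 1 can exceed the logical bound of 1, while an exact interval cannot. A Python sketch (83/84 ≈ 0.99 stands in for the -prtesti- toy example):

```python
# Show the normal approximation breaking down for a proportion
# near 1, and an exact (Clopper-Pearson) interval that does not.
import math
from scipy.stats import binomtest

n, p = 84, 0.99
se = math.sqrt(p * (1 - p) / n)
upper = p + 1.96 * se
print(round(upper, 4))  # ≈ 1.0113 -- an impossible proportion

# The exact interval stays inside [0, 1]:
ci = binomtest(k=83, n=84).proportion_ci(confidence_level=0.95)
print(ci.high <= 1)     # True
```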
              With a bit of guesswork, I could hypothesize that you are doing PSM to select, among the 2,607 observations, those "similar" (controls?) to the 84 included in the other group.
              Kind regards,
              Carlo
              (Stata 19.0)

              Comment


              • #8
                Carlo Lazzaro here is the full scenario.
                I am running a quasi-experimental research technique, PSM. My outcome variable is child birth weight and my treatment variable is prenatal care. My covariates are mother's age, mother's education, child birth order, etc. I have categorized the outcome variable into healthy and unhealthy: healthy = 1 and unhealthy = 0. My treatment variable is also binary: it is 0 for mothers who did not receive prenatal care and 1 for those who did. Now my point of concern is that I have about 2,000 observations in the treatment group and only about 80 in the control group. I think the lopsided sample might be an issue. However, I am getting significant ATT, ATE, and ATC estimates for this sample. Still, I want to perform a sensitivity test; I was advised to bootstrap the treatment variable and see if I get similar results. Do you think this is a good idea?

                Comment


                • #9
                  Moomal:
                  do you really mean bootstrapping a two-level categorical predictor, or do you mean a random redistribution of 1/0 within the sample?
                  Kind regards,
                  Carlo
                  (Stata 19.0)

                  Comment


                  • #10
                    I am not sure which would be more accurate in my case; what do you suggest?

                    Comment


                    • #11
                      I have another query. Considering the scenario I mentioned above, do I need to resample my treatment group or my control group? I have about 2,000 obs in my treatment group and 84 in my control group.

                      Comment


                      • #12
                        Moomal:
                        1) & 2) I'd use runiform() to shuffle your main predictor and perform a sensitivity analysis to check the robustness of your baseline findings.
                        In the following toy example, -shuffle- replaces -foreign- in the sensitivity analysis:
                        Code:
                        . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
                        (1978 automobile data)
                        
                        . regress price i.foreign
                        
                              Source |       SS           df       MS      Number of obs   =        74
                        -------------+----------------------------------   F(1, 72)        =      0.17
                               Model |  1507382.66         1  1507382.66   Prob > F        =    0.6802
                            Residual |   633558013        72  8799416.85   R-squared       =    0.0024
                        -------------+----------------------------------   Adj R-squared   =   -0.0115
                               Total |   635065396        73  8699525.97   Root MSE        =    2966.4
                        
                        ------------------------------------------------------------------------------
                               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                             foreign |
                            Foreign  |   312.2587   754.4488     0.41   0.680    -1191.708    1816.225
                               _cons |   6072.423    411.363    14.76   0.000     5252.386     6892.46
                        ------------------------------------------------------------------------------
                        
                        . g shuffle=runiform()
                        
                        . replace shuffle=0 if shuffle<0.5
                        
                        . replace shuffle=1 if shuffle>=0.5
                        
                        . regress price i.shuffle
                        
                              Source |       SS           df       MS      Number of obs   =        74
                        -------------+----------------------------------   F(1, 72)        =      0.12
                               Model |  1056431.73         1  1056431.73   Prob > F        =    0.7301
                            Residual |   634008964        72  8805680.06   R-squared       =    0.0017
                        -------------+----------------------------------   Adj R-squared   =   -0.0122
                               Total |   635065396        73  8699525.97   Root MSE        =    2967.4
                        
                        ------------------------------------------------------------------------------
                               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                           1.shuffle |  -239.0526   690.1671    -0.35   0.730    -1614.876     1136.77
                               _cons |   6281.553   481.3818    13.05   0.000     5321.936     7241.17
                        ------------------------------------------------------------------------------
                        
                        .
                        The above also holds if you plan to perform a logistic regression.
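
The shuffle check above can be sketched outside Stata as well. This is a Python illustration on simulated toy data (the variable names and the simulated dataset are assumptions for the example, not the thread's data): a real binary predictor shows its effect, while a randomly shuffled one should not.

```python
# Shuffle-based sensitivity check on simulated toy data: relabel
# the binary predictor at random and re-test; the shuffled labels
# should carry no signal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
n = 500
treatment = rng.integers(0, 2, size=n)               # real binary predictor
outcome = 10 + 2 * treatment + rng.normal(0, 3, n)   # true effect = 2

# Real assignment: the group difference is genuine and should show up.
real = stats.ttest_ind(outcome[treatment == 1], outcome[treatment == 0])

# Shuffled assignment, mirroring the runiform() step in the Stata log:
shuffle = (rng.random(n) >= 0.5).astype(int)
fake = stats.ttest_ind(outcome[shuffle == 1], outcome[shuffle == 0])

print(real.pvalue < 0.05)  # True: the real effect is detected
print(fake.pvalue)         # typically large: shuffled labels carry no signal
```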

                        Kind regards,
                        Carlo
                        (Stata 19.0)

                        Comment
