  • Compare permute to ttest

    Dear all,
    maybe this is a silly question, but I cannot spot the error. Basically, I can compare two samples using either a t-test or a permutation test, and ideally both results converge. I wonder about the results of the one- and two-sided tests and their p-values. See this basic example.

    Code:
    . clear all
    
    . version 16.1
    
    . sysuse auto
    (1978 Automobile Data)
    
    .
    .
    .
    . ttest headroom, by(foreign)
    
    Two-sample t test with equal variances
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
    Domestic |      52    3.153846    .1269928    .9157578    2.898898    3.408795
     Foreign |      22    2.613636     .103676    .4862837     2.39803    2.829242
    ---------+--------------------------------------------------------------------
    combined |      74    2.993243    .0983449    .8459948    2.797242    3.189244
    ---------+--------------------------------------------------------------------
        diff |            .5402098    .2070884                .1273867    .9530329
    ------------------------------------------------------------------------------
        diff = mean(Domestic) - mean(Foreign)                         t =   2.6086
    Ho: diff = 0                                     degrees of freedom =       72
    
        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.9945         Pr(|T| > |t|) = 0.0110          Pr(T > t) = 0.0055
    
    . permute foreign r(mean) if foreign, reps(25000) seed(346) nodots nodrop nowarn: summarize headroom, meanonly
    
    Monte Carlo permutation results                 Number of obs     =         74
    
          command:  summarize headroom if foreign, meanonly
            _pm_1:  r(mean)
      permute var:  foreign
    
    ------------------------------------------------------------------------------
    T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _pm_1 |   2.613636   24877   25000  0.9951  0.0004  .9941325   .9959094
    ------------------------------------------------------------------------------
    Note: Confidence interval is with respect to p=c/n.
    Note: c = #{|T| >= |T(obs)|}
    
    . permute foreign r(mean) if foreign, right reps(25000) seed(346) nodots nodrop nowarn: summarize headroom, meanonly
    
    Monte Carlo permutation results                 Number of obs     =         74
    
          command:  summarize headroom if foreign, meanonly
            _pm_1:  r(mean)
      permute var:  foreign
    
    ------------------------------------------------------------------------------
    T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _pm_1 |   2.613636   24877   25000  0.9951  0.0004  .9941325   .9959094
    ------------------------------------------------------------------------------
    Note: Confidence interval is with respect to p=c/n.
    Note: c = #{T >= T(obs)}
    
    . permute foreign r(mean) if foreign, left reps(25000) seed(346) nodots nodrop nowarn: summarize headroom, meanonly
    
    Monte Carlo permutation results                 Number of obs     =         74
    
          command:  summarize headroom if foreign, meanonly
            _pm_1:  r(mean)
      permute var:  foreign
    
    ------------------------------------------------------------------------------
    T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _pm_1 |   2.613636     181   25000  0.0072  0.0005  .0062267   .0083703
    ------------------------------------------------------------------------------
    Note: Confidence interval is with respect to p=c/n.
    Note: c = #{T <= T(obs)}
    My question is: why is the result from the two-sided t-test (p = 0.0110) so different from the two-sided result of the permutation test (p = 0.9951)? The results for the other p-values are indeed very similar.
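
    For concreteness, the counting rule in the notes above (c = #{|T| >= |T(obs)|}) can be sketched in a few lines of Python with made-up data (illustrative only, not Stata's implementation). When the permuted statistic is a group mean, the permuted values cluster around the grand mean rather than around zero, so nearly every permutation satisfies |T| >= |T(obs)|:

```python
import random

rng = random.Random(346)

# made-up outcome roughly like headroom: "domestic" values higher than "foreign"
y = [3.2, 3.0, 3.4, 2.9, 3.1, 3.3, 2.6, 2.5, 2.7, 2.4]
g = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]   # 1 = "foreign"

def group_mean(labels):
    vals = [v for v, k in zip(y, labels) if k == 1]
    return sum(vals) / len(vals)

t_obs = group_mean(g)                 # observed mean of the "foreign" group
labels = list(g)
c = 0
for _ in range(5000):
    rng.shuffle(labels)               # permute which observations are "foreign"
    # two-sided counting rule: c = #{|T| >= |T(obs)|}
    if abs(group_mean(labels)) >= abs(t_obs):
        c += 1
p = c / 5000
print(p)  # close to 1: permuted group means sit at or above the observed one
```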


    Best wishes

    (Stata 16.1 MP)

  • #2
    You are not testing the same thing at all: your -permute- command appears to test whether the mean for foreign is statistically significantly different from 0, but that is not what the t-test is doing. I think that this is closer:
    Code:
    . permute foreign p=r(p), rep(10000) nodots: ttest headroom, by(foreign) 
    
    warning: because ttest is not an estimation command or does not set e(sample), permute has no way to
             determine which observations are used in calculating the statistics and so assumes that all
             observations are used. This means that no observations will be excluded from the resampling
             because of missing values or other reasons.
    
             If the assumption is not true, press Break, save the data, and drop the observations that are to
             be excluded. Be sure that the dataset in memory contains only the relevant data.
    
    Monte Carlo permutation results           Number of observations  =          74
                                              Number of permutations  =      10,000
    
          command:  ttest headroom, by(foreign)
                p:  r(p)
      permute var:  foreign
    
    -------------------------------------------------------------------------------
                 |                                               Monte Carlo error
                 |                                              -------------------
               T |    T(obs)       Test       c       n      p  SE(p)   [95% CI(p)]
    -------------+-----------------------------------------------------------------
               p |  .0110486      lower     116   10000  .0116  .0011  .0096  .0139
                 |                upper    9907   10000  .9907  .0010  .9886  .9925
                 |            two-sided                  .0232  .0015  .0202  .0262
    -------------------------------------------------------------------------------
    Note: For lower one-sided test, c = #{T <= T(obs)} and p = p_lower = c/n.
    Note: For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n.
    Note: For two-sided test, p = 2*min(p_lower, p_upper); SE and CI approximate.
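
    In plain terms, -permute- here shuffles which observations carry the foreign label, re-runs the command, and counts how often the permuted statistic is at least as extreme as the observed one. A minimal Python sketch of that counting logic, with a difference in group means as the statistic and made-up data (illustrative only, not Stata's implementation):

```python
import random

def perm_test(y, g, reps=10000, seed=346):
    """Two-sided permutation test for a difference in group means.
    y: outcome values; g: 0/1 group labels."""
    rng = random.Random(seed)

    def diff(labels):
        a = [v for v, k in zip(y, labels) if k == 1]
        b = [v for v, k in zip(y, labels) if k == 0]
        return sum(a) / len(a) - sum(b) / len(b)

    t_obs = diff(g)
    labels = list(g)
    c = 0
    for _ in range(reps):
        rng.shuffle(labels)                  # permute the group labels
        if abs(diff(labels)) >= abs(t_obs):  # c = #{|T| >= |T(obs)|}
            c += 1
    return c / reps                          # p = c/n

# toy data: group 1 clearly shifted upward
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
g = [0, 0, 0, 0, 1, 1, 1, 1]
print(perm_test(y, g, reps=2000))
```

    Unlike the group mean in post #1, this statistic is centered near zero under the permutation null, so the two-sided count behaves as expected.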

    Comment


    • #3
      Felix:
      as an aside to Rich's excellent reply, why not consider a comparison between a classic -ttest- and its -bootstrap- counterpart?
      Kind regards,
      Carlo
      (Stata 19.0)

      Comment


      • #4
        @Carlo: Actually, I am finally trying to learn how the -permute- command works in Stata... I found this resource here, but somehow the procedure is somewhat complicated, and I guess that is the reason for my misunderstanding. In a real application, bootstrapping would indeed be a very nice alternative approach!

        @Rich: thanks for this clear example. I better understand now how the command works and why my first approach was not what I actually wanted to do.
        Best wishes

        (Stata 16.1 MP)

        Comment


        • #5
          Felix:
          I find -bootstrap- easier to run and explain than -permute- in research reports and technical articles.
          You might be interested to take a look at: https://pubmed.ncbi.nlm.nih.gov/11113956/.
          Kind regards,
          Carlo
          (Stata 19.0)

          Comment


          • #6
            Rich Goldstein shows how this is done correctly, but using the p-value as a test statistic might cause a bit of a headache when we later try to interpret our results.

            Here is how you can achieve the same thing with regression analysis, using a more standard pivot (the t-statistic):

            Code:
            . permute foreign t=_b[foreign]/_se[foreign], rep(10000) nodots: reg headroom foreign
            
            Monte Carlo permutation results                 Number of obs     =         74
            
                  command:  regress headroom foreign
                        t:  _b[foreign]/_se[foreign]
              permute var:  foreign
            
            ------------------------------------------------------------------------------
            T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                       t |  -2.608596     119   10000  0.0119  0.0011  .0098678   .0142234
            ------------------------------------------------------------------------------
            Note: Confidence interval is with respect to p=c/n.
            Note: c = #{|T| >= |T(obs)|}
            which gives pretty much the same results as Rich's approach, but using the t-statistic as a pivot:

            Code:
            . permute foreign t=r(t), rep(10000) nodots: ttest headroom, by(foreign)
            
            Warning:  Because ttest is not an estimation command or does not set e(sample), permute has no way
                      to determine which observations are used in calculating the statistics and so assumes
                      that all observations are used.  This means that no observations will be excluded from
                      the resampling because of missing values or other reasons.
            
                      If the assumption is not true, press Break, save the data, and drop the observations
                      that are to be excluded.  Be sure that the dataset in memory contains only the relevant
                      data.
            
            Monte Carlo permutation results                 Number of obs     =         74
            
                  command:  ttest headroom, by(foreign)
                        t:  r(t)
              permute var:  foreign
            
            ------------------------------------------------------------------------------
            T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                       t |   2.608596     124   10000  0.0124  0.0011   .010324   .0147668
            ------------------------------------------------------------------------------
            Note: Confidence interval is with respect to p=c/n.
            Note: c = #{|T| >= |T(obs)|}
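
            If it helps intuition: with the group sizes fixed by the permutation, the pooled t statistic and the raw difference in means are nearly (though not exactly) monotone transforms of each other, so their two-sided permutation counts come out close. A toy Python check with made-up data (illustrative only, not Stata code):

```python
import math
import random

rng = random.Random(69)

y = [3.2, 3.0, 3.4, 2.9, 3.1, 3.3, 2.8, 2.6, 2.5, 2.7, 2.4, 2.6]
g = [0] * 7 + [1] * 5

def stats(labels):
    """Return (difference in means, pooled two-sample t) for a labeling."""
    a = [v for v, k in zip(y, labels) if k == 0]
    b = [v for v, k in zip(y, labels) if k == 1]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    diff = ma - mb
    ssa = sum((v - ma) ** 2 for v in a)
    ssb = sum((v - mb) ** 2 for v in b)
    sp2 = (ssa + ssb) / (len(a) + len(b) - 2)   # pooled variance
    t = diff / math.sqrt(sp2 * (1 / len(a) + 1 / len(b)))
    return diff, t

d_obs, t_obs = stats(g)
labels = list(g)
reps, cd, ct = 4000, 0, 0
for _ in range(reps):
    rng.shuffle(labels)
    d, t = stats(labels)
    cd += abs(d) >= abs(d_obs)    # count by mean difference
    ct += abs(t) >= abs(t_obs)    # count by t statistic
pd, pt = cd / reps, ct / reps
print(pd, pt)  # the two p-values track each other closely
```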

            Comment


            • #7
              And by the way, I forgot to set the seed; this is why the results were slightly different. (This is a procedure involving randomness.)

              If I set the seed, the results are numerically the same:

              Code:
              . permute foreign t=_b[foreign]/_se[foreign], rep(10000) nodots seed(69): reg headroom foreign
              
              Monte Carlo permutation results                 Number of obs     =         74
              
                    command:  regress headroom foreign
                          t:  _b[foreign]/_se[foreign]
                permute var:  foreign
              
              ------------------------------------------------------------------------------
              T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                         t |  -2.608596     119   10000  0.0119  0.0011  .0098678   .0142234
              ------------------------------------------------------------------------------
              Note: Confidence interval is with respect to p=c/n.
              Note: c = #{|T| >= |T(obs)|}
              
              . permute foreign t=r(t), rep(10000) nodots seed(69): ttest headroom, by(foreign)
              
              Warning:  Because ttest is not an estimation command or does not set e(sample), permute has no way
                        to determine which observations are used in calculating the statistics and so assumes
                        that all observations are used.  This means that no observations will be excluded from
                        the resampling because of missing values or other reasons.
              
                        If the assumption is not true, press Break, save the data, and drop the observations
                        that are to be excluded.  Be sure that the dataset in memory contains only the relevant
                        data.
              
              Monte Carlo permutation results                 Number of obs     =         74
              
                    command:  ttest headroom, by(foreign)
                          t:  r(t)
                permute var:  foreign
              
              ------------------------------------------------------------------------------
              T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                         t |   2.608596     119   10000  0.0119  0.0011  .0098678   .0142234
              ------------------------------------------------------------------------------
              Note: Confidence interval is with respect to p=c/n.
              Note: c = #{|T| >= |T(obs)|}

              Comment


              • #8
                Going to the issue of bootstrap vs. permutation approaches: note that while both simulate the sampling distribution, they do so under somewhat different assumptions: -permute- simulates under the null hypothesis, which it enforces, while -bootstrap- simulates under a particular alternative hypothesis. (My impression is that this often gets overlooked.) The resulting sampling distributions are not in general the same, and the two approaches won't, for example, generally give the same estimate of the standard error. I'd use -bootstrap- for a confidence interval (my preference) and -permute- for a hypothesis test. Here's a simple illustration:
                Code:
                set seed 4754
                sysuse auto, clear
                tempfile permfile
                permute foreign diff = (r(mu_1) -r(mu_2)), ///
                   saving(`permfile') rep(10000) nodots: ttest headroom, by(foreign)
                use `permfile', clear
                gen byte perm = 1
                save `permfile', replace
                //
                set seed 4754
                sysuse auto, clear
                tempfile bootfile
                bootstrap diff = (r(mu_1) -r(mu_2)), ///
                   saving(`bootfile') rep(10000) nodots: ttest headroom, by(foreign)
                local obsdiff = el(r(table),1,1)   
                use `bootfile', clear
                gen byte perm = 0
                // adjust means for an easier comparison
                replace diff = diff -`obsdiff'
                append using `permfile'
                //
                bysort perm: summ diff
                hist diff, by(perm)

                Comment


                • #9
                  Thank you all for your comments. Actually, I am working on a publication about resampling methods and of course I want to introduce the options Stata has to offer. Thank you Mike for this very interesting example that is absolutely worth further research and discussion!
                  Best wishes

                  (Stata 16.1 MP)

                  Comment


                  • #10
                    During the 1980s, the relationship between bootstrap and randomization (permutation) tests was a subject of much discussion; all that pretty much ended with Romano, JP (1989), "Bootstrap and randomization tests of some nonparametric hypotheses," The Annals of Statistics, 17(1): 141-159. Enjoy!

                    Comment


                    • #11
                      The interesting paper that Rich mentioned can be downloaded free of charge from https://projecteuclid.org/euclid.aos/1176347007
                      Kind regards,
                      Carlo
                      (Stata 19.0)

                      Comment


                      • #12
                        Joro Kolev: Hi, Prof., regarding your example above:
                        Code:
                          
                         permute foreign t=_b[foreign]/_se[foreign], rep(10000) nodots seed(69): reg headroom foreign
                        How would I run this permutation for two or more variables? If, for example, the regression is:
                        Code:
                        reg headroom foreign rep78 weight
                        and I want to randomise the values of foreign and rep78.
                        If I do as above,
                        Code:
                         
                         permute foreign rep78 t=_b[foreign]/_se[foreign], rep(10000) nodots seed(69): reg headroom foreign rep78 weight
                        Stata just permutes one of them.

                        Could you please help me? Many thanks.

                        Comment


                        • #13
                          What I read from the documentation is that -permute- only ever permutes one variable. In your example, bootstrapping might offer a comparable solution:

                          Code:
                          sysuse auto, clear
                          bootstrap t=_b[foreign]/_se[foreign], reps(9999) nodots seed(69): reg headroom foreign rep78 weight
                          estat bootstrap, bc
                          Best wishes

                          (Stata 16.1 MP)

                          Comment


                          • #14
                            Per Felix's comment, I don't think there is a way to make -permute- permute more than one variable at a time. I can think of a possible way to do what you want, as I describe below, but I wonder a bit about whether what you say you want is what you need. The motivation for permuting both (say) foreign and rep78 would be, I presume, to obtain the distribution of the t-value for foreign if foreign and rep78 had no relationship with headroom *and* no relationship with any other predictor (including one another). (Note that -bootstrap-, which resamples whole observations, will not render null the relationship of rep78 to foreign and headroom, so that might not fit what you want.) What you say you want seems unusual to me, though possible, since I would think that you'd want to know something about how foreign behaves *while accounting for the fact that rep78 is also related to headroom and foreign.*

                            Anyway, if you're certain that this is what you want (I'd like to hear what others here say), the only solution I can think of is the user-written command -shufflevar- (available from SSC). That command can shuffle a whole varlist, in various ways, and creates permuted (shuffled) versions of each of those variables. Using it, I suppose you could create a program something like this and use it with -postfile- or -simulate-:

                            Code:
                            prog mypermute, rclass
                                shufflevar foreign rep78
                                reg headroom foreign_shuffled rep78_shuffled weight
                                return scalar t = _b[foreign_shuffled]/_se[foreign_shuffled]
                                drop *_shuffled
                            end
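
                            The same idea can be sketched in Python with made-up data (illustrative only; the variable names and setup are invented, and this stands in for the -shufflevar- plus -simulate- loop rather than reproducing it): shuffle each predictor column independently on every replication, refit, and build the permutation distribution of the t statistic for the coefficient of interest.

```python
import numpy as np

rng = np.random.default_rng(69)

def t_stat(y, X, j):
    """OLS t statistic for coefficient j (X includes a constant column)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    df = len(y) - X.shape[1]
    s2 = resid @ resid / df
    se = np.sqrt(s2 * XtX_inv[j, j])
    return b[j] / se

# toy data: y depends on both predictors plus noise
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.8 * x1 + 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
t_obs = t_stat(y, X, 1)               # t for x1 in the observed data

reps, c = 2000, 0
for _ in range(reps):
    # shuffle each predictor independently; any additional
    # regressors you wanted to keep fixed would stay unshuffled
    Xp = np.column_stack([np.ones(n),
                          rng.permutation(x1),
                          rng.permutation(x2)])
    if abs(t_stat(y, Xp, 1)) >= abs(t_obs):
        c += 1
p = c / reps
print(p)  # small: x1 really is related to y
```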

                            Comment


                            • #15
                              Thanks, Mike, for this info; that's a good point I did not consider with -bootstrap-. Given the description in post #12, the -shufflevar- approach seems to offer a solution and is also quite neat!
                              Best wishes

                              (Stata 16.1 MP)

                              Comment
