  • Compare permute to ttest

    Dear all,
    maybe this is a silly question, but I cannot spot the error. Basically, I can compare two samples using either a t-test or a permutation test, and ideally both results converge. I wonder about the results of the one- and two-sided tests and their p-values. See this basic example.

    Code:
    . clear all
    
    . version 16.1
    
    . sysuse auto
    (1978 Automobile Data)
    
    .
    .
    .
    . ttest headroom, by(foreign)
    
    Two-sample t test with equal variances
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
    Domestic |      52    3.153846    .1269928    .9157578    2.898898    3.408795
     Foreign |      22    2.613636     .103676    .4862837     2.39803    2.829242
    ---------+--------------------------------------------------------------------
    combined |      74    2.993243    .0983449    .8459948    2.797242    3.189244
    ---------+--------------------------------------------------------------------
        diff |            .5402098    .2070884                .1273867    .9530329
    ------------------------------------------------------------------------------
        diff = mean(Domestic) - mean(Foreign)                         t =   2.6086
    Ho: diff = 0                                     degrees of freedom =       72
    
        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.9945         Pr(|T| > |t|) = 0.0110          Pr(T > t) = 0.0055
    
    . permute foreign r(mean) if foreign, reps(25000) seed(346) nodots nodrop nowarn: summarize headroom, meanonly
    
    Monte Carlo permutation results                 Number of obs     =         74
    
          command:  summarize headroom if foreign, meanonly
            _pm_1:  r(mean)
      permute var:  foreign
    
    ------------------------------------------------------------------------------
    T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _pm_1 |   2.613636   24877   25000  0.9951  0.0004  .9941325   .9959094
    ------------------------------------------------------------------------------
    Note: Confidence interval is with respect to p=c/n.
    Note: c = #{|T| >= |T(obs)|}
    
    . permute foreign r(mean) if foreign, right reps(25000) seed(346) nodots nodrop nowarn: summarize headroom, meanonly
    
    Monte Carlo permutation results                 Number of obs     =         74
    
          command:  summarize headroom if foreign, meanonly
            _pm_1:  r(mean)
      permute var:  foreign
    
    ------------------------------------------------------------------------------
    T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _pm_1 |   2.613636   24877   25000  0.9951  0.0004  .9941325   .9959094
    ------------------------------------------------------------------------------
    Note: Confidence interval is with respect to p=c/n.
    Note: c = #{T >= T(obs)}
    
    . permute foreign r(mean) if foreign, left reps(25000) seed(346) nodots nodrop nowarn: summarize headroom, meanonly
    
    Monte Carlo permutation results                 Number of obs     =         74
    
          command:  summarize headroom if foreign, meanonly
            _pm_1:  r(mean)
      permute var:  foreign
    
    ------------------------------------------------------------------------------
    T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _pm_1 |   2.613636     181   25000  0.0072  0.0005  .0062267   .0083703
    ------------------------------------------------------------------------------
    Note: Confidence interval is with respect to p=c/n.
    Note: c = #{T <= T(obs)}
    My question is: why is the result from the two-sided t-test (p = 0.0110) so different from the two-sided result of the permutation test (p = 0.9951)? The results for the other p-values are indeed very similar.
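
    For concreteness, the counting rule in the notes above (c = #{|T| >= |T(obs)|}) can be sketched in a few lines of Python with made-up data (illustrative only, not Stata's implementation). When the permuted statistic is a group mean, the permuted values cluster around the grand mean rather than around zero, so nearly every permutation satisfies |T| >= |T(obs)|:

```python
import random

rng = random.Random(346)

# made-up outcome roughly like headroom: "domestic" values higher than "foreign"
y = [3.2, 3.0, 3.4, 2.9, 3.1, 3.3, 2.6, 2.5, 2.7, 2.4]
g = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]   # 1 = "foreign"

def group_mean(labels):
    vals = [v for v, k in zip(y, labels) if k == 1]
    return sum(vals) / len(vals)

t_obs = group_mean(g)                 # observed mean of the "foreign" group
labels = list(g)
c = 0
for _ in range(5000):
    rng.shuffle(labels)               # permute which observations are "foreign"
    # two-sided counting rule: c = #{|T| >= |T(obs)|}
    if abs(group_mean(labels)) >= abs(t_obs):
        c += 1
p = c / 5000
print(p)  # close to 1: permuted group means sit at or above the observed one
```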


    Best wishes

    (Stata 16.1 MP)

  • #2
    You are not testing the same thing at all: your -permute- command appears to test whether the mean for foreign is statistically significantly different from 0, but that is not what the t-test is doing. I think that this is closer:
    Code:
    . permute foreign p=r(p), rep(10000) nodots: ttest headroom, by(foreign) 
    
    warning: because ttest is not an estimation command or does not set e(sample), permute has no way to
             determine which observations are used in calculating the statistics and so assumes that all
             observations are used. This means that no observations will be excluded from the resampling
             because of missing values or other reasons.
    
             If the assumption is not true, press Break, save the data, and drop the observations that are to
             be excluded. Be sure that the dataset in memory contains only the relevant data.
    
    Monte Carlo permutation results           Number of observations  =          74
                                              Number of permutations  =      10,000
    
          command:  ttest headroom, by(foreign)
                p:  r(p)
      permute var:  foreign
    
    -------------------------------------------------------------------------------
                 |                                               Monte Carlo error
                 |                                              -------------------
               T |    T(obs)       Test       c       n      p  SE(p)   [95% CI(p)]
    -------------+-----------------------------------------------------------------
               p |  .0110486      lower     116   10000  .0116  .0011  .0096  .0139
                 |                upper    9907   10000  .9907  .0010  .9886  .9925
                 |            two-sided                  .0232  .0015  .0202  .0262
    -------------------------------------------------------------------------------
    Note: For lower one-sided test, c = #{T <= T(obs)} and p = p_lower = c/n.
    Note: For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n.
    Note: For two-sided test, p = 2*min(p_lower, p_upper); SE and CI approximate.
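
    In plain terms, -permute- here shuffles which observations carry the foreign label, re-runs the command, and counts how often the permuted statistic is at least as extreme as the observed one. A minimal Python sketch of that counting logic, with a difference in group means as the statistic and made-up data (illustrative only, not Stata's implementation):

```python
import random

def perm_test(y, g, reps=10000, seed=346):
    """Two-sided permutation test for a difference in group means.
    y: outcome values; g: 0/1 group labels."""
    rng = random.Random(seed)

    def diff(labels):
        a = [v for v, k in zip(y, labels) if k == 1]
        b = [v for v, k in zip(y, labels) if k == 0]
        return sum(a) / len(a) - sum(b) / len(b)

    t_obs = diff(g)
    labels = list(g)
    c = 0
    for _ in range(reps):
        rng.shuffle(labels)                  # permute the group labels
        if abs(diff(labels)) >= abs(t_obs):  # c = #{|T| >= |T(obs)|}
            c += 1
    return c / reps                          # p = c/n

# toy data: group 1 clearly shifted upward
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
g = [0, 0, 0, 0, 1, 1, 1, 1]
print(perm_test(y, g, reps=2000))
```

    Unlike the group mean in post #1, this statistic is centered near zero under the permutation null, so the two-sided count behaves as expected.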

    Comment


    • #3
      Felix:
      as an aside to Rich's excellent reply, why not consider a comparison between a classic -ttest- and its -bootstrap- counterpart?
      Kind regards,
      Carlo
      (Stata 19.0)

      Comment


      • #4
        @Carlo: Actually, I am finally trying to learn how the -permute- command works in Stata... I found this resource here, but somehow the procedure is somewhat complicated, and I guess that is the reason for my misunderstanding. In a real application, bootstrapping would indeed be a very nice alternative approach!

        @Rich: thanks for this clear example. I better understand now how the command works and why my first approach was not what I actually wanted to do.
        Best wishes

        (Stata 16.1 MP)

        Comment


        • #5
          Felix:
          I find -bootstrap- easier to run and explain than -permute- in research reports and technical articles.
          You might be interested to take a look at: https://pubmed.ncbi.nlm.nih.gov/11113956/.
          Kind regards,
          Carlo
          (Stata 19.0)

          Comment


          • #6
            Rich Goldstein shows how this is done correctly, but using the p-value as a test statistic might cause a bit of a headache when we later try to interpret our results.

            Here is how you can achieve the same thing with regression analysis, using a more standard pivot (the t-statistic):

            Code:
            . permute foreign t=_b[foreign]/_se[foreign], rep(10000) nodots: reg headroom foreign
            
            Monte Carlo permutation results                 Number of obs     =         74
            
                  command:  regress headroom foreign
                        t:  _b[foreign]/_se[foreign]
              permute var:  foreign
            
            ------------------------------------------------------------------------------
            T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                       t |  -2.608596     119   10000  0.0119  0.0011  .0098678   .0142234
            ------------------------------------------------------------------------------
            Note: Confidence interval is with respect to p=c/n.
            Note: c = #{|T| >= |T(obs)|}
            which gives pretty much the same results as Rich's approach, but using the t-statistic as a pivot:

            Code:
            . permute foreign t=r(t), rep(10000) nodots: ttest headroom, by(foreign)
            
            Warning:  Because ttest is not an estimation command or does not set e(sample), permute has no way
                      to determine which observations are used in calculating the statistics and so assumes
                      that all observations are used.  This means that no observations will be excluded from
                      the resampling because of missing values or other reasons.
            
                      If the assumption is not true, press Break, save the data, and drop the observations
                      that are to be excluded.  Be sure that the dataset in memory contains only the relevant
                      data.
            
            Monte Carlo permutation results                 Number of obs     =         74
            
                  command:  ttest headroom, by(foreign)
                        t:  r(t)
              permute var:  foreign
            
            ------------------------------------------------------------------------------
            T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                       t |   2.608596     124   10000  0.0124  0.0011   .010324   .0147668
            ------------------------------------------------------------------------------
            Note: Confidence interval is with respect to p=c/n.
            Note: c = #{|T| >= |T(obs)|}
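
            If it helps intuition: with the group sizes fixed by the permutation, the pooled t statistic and the raw difference in means are nearly (though not exactly) monotone transforms of each other, so their two-sided permutation counts come out close. A toy Python check with made-up data (illustrative only, not Stata code):

```python
import math
import random

rng = random.Random(69)

y = [3.2, 3.0, 3.4, 2.9, 3.1, 3.3, 2.8, 2.6, 2.5, 2.7, 2.4, 2.6]
g = [0] * 7 + [1] * 5

def stats(labels):
    """Return (difference in means, pooled two-sample t) for a labeling."""
    a = [v for v, k in zip(y, labels) if k == 0]
    b = [v for v, k in zip(y, labels) if k == 1]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    diff = ma - mb
    ssa = sum((v - ma) ** 2 for v in a)
    ssb = sum((v - mb) ** 2 for v in b)
    sp2 = (ssa + ssb) / (len(a) + len(b) - 2)   # pooled variance
    t = diff / math.sqrt(sp2 * (1 / len(a) + 1 / len(b)))
    return diff, t

d_obs, t_obs = stats(g)
labels = list(g)
reps, cd, ct = 4000, 0, 0
for _ in range(reps):
    rng.shuffle(labels)
    d, t = stats(labels)
    cd += abs(d) >= abs(d_obs)    # count by mean difference
    ct += abs(t) >= abs(t_obs)    # count by t statistic
pd, pt = cd / reps, ct / reps
print(pd, pt)  # the two p-values track each other closely
```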

            Comment


            • #7
              And by the way, I forgot to set the seed; this is why the results were slightly different. (This is a procedure involving randomness.)

              If I set the seed, the results are numerically the same:

              Code:
              . permute foreign t=_b[foreign]/_se[foreign], rep(10000) nodots seed(69): reg headroom foreign
              
              Monte Carlo permutation results                 Number of obs     =         74
              
                    command:  regress headroom foreign
                          t:  _b[foreign]/_se[foreign]
                permute var:  foreign
              
              ------------------------------------------------------------------------------
              T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                         t |  -2.608596     119   10000  0.0119  0.0011  .0098678   .0142234
              ------------------------------------------------------------------------------
              Note: Confidence interval is with respect to p=c/n.
              Note: c = #{|T| >= |T(obs)|}
              
              . permute foreign t=r(t), rep(10000) nodots seed(69): ttest headroom, by(foreign)
              
              Warning:  Because ttest is not an estimation command or does not set e(sample), permute has no way
                        to determine which observations are used in calculating the statistics and so assumes
                        that all observations are used.  This means that no observations will be excluded from
                        the resampling because of missing values or other reasons.
              
                        If the assumption is not true, press Break, save the data, and drop the observations
                        that are to be excluded.  Be sure that the dataset in memory contains only the relevant
                        data.
              
              Monte Carlo permutation results                 Number of obs     =         74
              
                    command:  ttest headroom, by(foreign)
                          t:  r(t)
                permute var:  foreign
              
              ------------------------------------------------------------------------------
              T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                         t |   2.608596     119   10000  0.0119  0.0011  .0098678   .0142234
              ------------------------------------------------------------------------------
              Note: Confidence interval is with respect to p=c/n.
              Note: c = #{|T| >= |T(obs)|}

              Comment


              • #8
                Going to the issue of bootstrap vs. permutation approaches: note that while both simulate the sampling distribution, they do so under somewhat different assumptions: -permute- simulates under the null hypothesis, which it enforces, while -bootstrap- simulates under a particular alternative hypothesis. (My impression is that this often gets overlooked.) The resulting sampling distributions are not in general the same, and the two approaches won't, for example, generally give the same estimate of the standard error. I'd use -bootstrap- for a confidence interval (my preference) and -permute- for a hypothesis test. Here's a simple illustration:
                Code:
                set seed 4754
                sysuse auto, clear
                tempfile permfile
                permute foreign diff = (r(mu_1) -r(mu_2)), ///
                   saving(`permfile') rep(10000) nodots: ttest headroom, by(foreign)
                use `permfile', clear
                gen byte perm = 1
                save `permfile', replace
                //
                set seed 4754
                sysuse auto, clear
                tempfile bootfile
                bootstrap diff = (r(mu_1) -r(mu_2)), ///
                   saving(`bootfile') rep(10000) nodots: ttest headroom, by(foreign)
                local obsdiff = el(r(table),1,1)   
                use `bootfile', clear
                gen byte perm = 0
                // adjust means for an easier comparison
                replace diff = diff -`obsdiff'
                append using `permfile'
                //
                bysort perm: summ diff
                hist diff, by(perm)

                Comment


                • #9
                  Thank you all for your comments. Actually, I am working on a publication about resampling methods and of course I want to introduce the options Stata has to offer. Thank you Mike for this very interesting example that is absolutely worth further research and discussion!
                  Best wishes

                  (Stata 16.1 MP)

                  Comment


                  • #10
                    During the 1980s, the relationship between bootstrap and randomization (permutation) tests was a subject of much discussion; all that pretty much ended with Romano, JP (1989), "Bootstrap and randomization tests of some nonparametric hypotheses," The Annals of Statistics, 17(1): 141-159. Enjoy!

                    Comment


                    • #11
                      The interesting paper that Rich mentioned can be downloaded free of charge from https://projecteuclid.org/euclid.aos/1176347007
                      Kind regards,
                      Carlo
                      (Stata 19.0)

                      Comment


                      • #12
                        Joro Kolev: Hi, Prof., regarding your example above:
                        Code:
                          
                         permute foreign t=_b[foreign]/_se[foreign], rep(10000) nodots seed(69): reg headroom foreign
                        How would I run this permutation for two or more variables? If, for example, the regression is:
                        Code:
                        reg headroom foreign rep78 weight
                        and I want to randomise the values of foreign and rep78.
                        If I do as above,
                        Code:
                         
                         permute foreign rep78 t=_b[foreign]/_se[foreign], rep(10000) nodots seed(69): reg headroom foreign rep78 weight
                        Stata just permutes one of them.

                        Could you please help me? Many thanks.

                        Comment


                        • #13
                          What I read from the documentation is that -permute- only ever permutes one variable. In your example, bootstrapping might offer a comparable solution:

                          Code:
                          sysuse auto, clear
                          bootstrap t=_b[foreign]/_se[foreign], reps(9999) nodots seed(69): reg headroom foreign rep78 weight
                          estat bootstrap, bc
                          Best wishes

                          (Stata 16.1 MP)

                          Comment


                          • #14
                            Per Felix's comment, I don't think there is a way to make -permute- permute more than one variable at a time. I can think of a possible way to do what you want, as I describe below, but I wonder a bit about whether what you say you want is what you need. The motivation for permuting both (say) foreign and rep78 would be, I presume, to obtain the distribution of the t-value for foreign if foreign and rep78 had no relationship with headroom *and* no relationship with any other predictor (including one another). (Note that -bootstrap-, which resamples whole observations, will not render null the relationship of rep78 to foreign and headroom, so that might not fit what you want.) What you say you want seems unusual to me, though possible, since I would think that you'd want to know something about how foreign behaves *while accounting for the fact that rep78 is also related to headroom and foreign.*

                            Anyway, if you're certain that this is what you want (I'd like to hear what others here say), the only solution I can think of is the user-written command -shufflevar- (available from SSC). That command can shuffle a whole varlist, in various ways, and creates permuted (shuffled) versions of each of those variables. Using it, I suppose you could create a program something like this and use it with -postfile- or -simulate-:

                            Code:
                            prog mypermute, rclass
                                shufflevar foreign rep78
                                reg headroom foreign_shuffled rep78_shuffled weight
                                return scalar t = _b[foreign_shuffled]/_se[foreign_shuffled]
                                drop *_shuffled
                            end
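
                            The same idea can be sketched in Python with made-up data (illustrative only; the variable names and setup are invented, and this stands in for the -shufflevar- plus -simulate- loop rather than reproducing it): shuffle each predictor column independently on every replication, refit, and build the permutation distribution of the t statistic for the coefficient of interest.

```python
import numpy as np

rng = np.random.default_rng(69)

def t_stat(y, X, j):
    """OLS t statistic for coefficient j (X includes a constant column)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    df = len(y) - X.shape[1]
    s2 = resid @ resid / df
    se = np.sqrt(s2 * XtX_inv[j, j])
    return b[j] / se

# toy data: y depends on both predictors plus noise
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.8 * x1 + 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
t_obs = t_stat(y, X, 1)               # t for x1 in the observed data

reps, c = 2000, 0
for _ in range(reps):
    # shuffle each predictor independently; any additional
    # regressors you wanted to keep fixed would stay unshuffled
    Xp = np.column_stack([np.ones(n),
                          rng.permutation(x1),
                          rng.permutation(x2)])
    if abs(t_stat(y, Xp, 1)) >= abs(t_obs):
        c += 1
p = c / reps
print(p)  # small: x1 really is related to y
```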

                            Comment


                            • #15
                              Thanks, Mike, for this info; that's a good point I did not consider with -bootstrap-. Given the description in post #12, the -shufflevar- approach seems to offer a solution and is also quite neat!
                              Best wishes

                              (Stata 16.1 MP)

                              Comment
