How can I randomly sample groups with replacement from existing data?

Hannah Redbury

Join Date: May 2017

Posts: 2
#1

27 May 2017, 10:40

Hi there,
I have almost finished my dissertation and I have just one last statistical flourish I want to do to make it beautiful - perhaps you can help.

I have a sample of 20 men and 20 women, each of whom has a performance score. I want to draw, at random and with replacement, 10,000 groups of 4 (2 men and 2 women per group) and calculate the percentage of groups in which a man wins versus the percentage in which a woman wins, according to their performance scores.

I am not very practiced at Stata, but I know this will have something to do with the -bsample- command. However, I am unsure exactly what I need to type in as a command.

I would be hugely grateful for any advice!
Thanks
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

27 May 2017, 11:04

Hannah:
welcome to the list.
The following thread might be on-target: http://www.statalist.org/forums/foru...mple-from-data

Kind regards,
Carlo
(Stata 19.0)
Hannah Redbury

Join Date: May 2017

Posts: 2
#3

28 May 2017, 05:11

Thank you!

John Mullahy

Join Date: Dec 2016

Posts: 751
#4

28 May 2017, 06:56

Hannah: You may already be well aware of this, but what you are describing seems related to permutation testing. See

Code:

help permute

for instance.
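As a rough illustration of the mechanics of -permute- (a sketch, not a direct answer to the groups-of-four question): the example below builds hypothetical data with 20 men and 20 women and uses a swapped-in statistic, the difference in mean scores between women and men. -permute- reshuffles the `female` labels and recomputes that statistic each time, which is how it builds a distribution under the null of no gender difference. The data, seed, and choice of statistic are all assumptions for illustration.

```stata
* Hypothetical illustrative data: 20 men (female==0), 20 women (female==1)
clear
set seed 12345
set obs 40
generate byte female = _n > 20
generate score = runiform() + 0.15*female

* Swapped-in statistic: mean score of women minus mean score of men.
* -permute- reshuffles the female labels across the 40 observations
* 1,000 times and recomputes the statistic for each reshuffle.
permute female diff=(r(mu_2)-r(mu_1)), reps(1000): ttest score, by(female)
```

In the output, the p column is the share of reshuffles whose |diff| was at least as large as the observed |diff|; the left and right options described in the help file give one-sided versions.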

Apart from this, what Carlo suggests should be very helpful.

In addition, I often find that doing simulations like this is, for me, easier in Mata than in Stata itself. If you are familiar with Mata you might consider trying it. A simple little function I use to draw samples of size N<=T with replacement from a Tx1 vector x is

Code:

function bootn(x,n) return((x[ceil(rows(x):*uniform(rows(x),1)),.])[1..n,.])

A simple example would be

Code:

. mata
------------------------------------------------- mata (type end to exit) -------------------------------------
: rseed(22)

: x=ceil(10:*uniform(20,1))
 
: x'
        1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
    +-----------------------------------------------------------------------------------------------------+
  1 |   1    6   10    1    9    1   10    4    2    8    3    4    1    1    3    1    9   10   10    8  |
    +-----------------------------------------------------------------------------------------------------+

 
: bootn(x,2)
       1
    +-----+
  1 |  1  |
  2 |  1  |
    +-----+
 
: bootn(x,2)
       1
    +-----+
  1 |  3  |
  2 |  4  |
    +-----+
 
: bootn(x,2)
        1
    +------+
  1 |   6  |
  2 |  10  |
    +------+

You could then embed this in a "for" or "while" loop that does the relevant calculations 10,000 times. Of course, if you are not familiar with Mata and you have "almost finished" your dissertation, then perhaps now is not the best time to learn Mata. 😀
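A loop of that kind might look roughly like the following. This is only a sketch under assumptions: the men's and women's scores are taken to be in two 20x1 Mata vectors (filled here with hypothetical random numbers, to be replaced by the real scores), 2 scores are drawn from each vector with replacement using the same ceil()-of-a-uniform index trick as bootn(), and a group counts as a win for the women when the highest of its four scores belongs to a woman (ties go to the men, an arbitrary choice).

```stata
mata:
// Hypothetical scores: replace with the real 20 men's and 20 women's scores
rseed(22)
men   = runiform(20, 1)
women = runiform(20, 1) :+ 0.15

reps  = 10000
fwins = 0
for (r = 1; r <= reps; r++) {
    // Draw 2 row indices with replacement from each vector (as in bootn())
    m = men[ceil(rows(men) :* runiform(2, 1))]
    w = women[ceil(rows(women) :* runiform(2, 1))]
    // A woman wins if the best woman's score beats the best man's (ties go to the men)
    fwins = fwins + (max(w) > max(m))
}
printf("Women win %5.2f percent of the %g groups\n", 100*fwins/reps, reps)
end
```

Drawing the indices for men and women separately is what enforces the 2 men plus 2 women composition of every group; the men's win percentage is simply 100 minus the printed figure.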


Mike Lacy

Join Date: Apr 2014

Posts: 2416
#5

28 May 2017, 10:40

I agree with John here that -permute- is likely relevant. The reasoning is that -bootstrap-, by sampling with replacement, does not enforce a null hypothesis (i.e., samples are drawn from a population with the observed distribution, in which the null is not in general true).

Notwithstanding that, there is a relatively straightforward -bootstrap- solution to what you *ask* for, which may not be what you want/need.

Code:

clear // Create some data for illustration
set seed 475423
set obs 40
gen byte female = _n > 20
gen score = (runiform() + 0.15 * female)
//
// Program to be bootstrapped
cap prog drop FemaleWins
prog FemaleWins, rclass
    sort score
    // Note that my bootstrap command designates two individuals per stratum,
    // meaning N = 4. Therefore, the person at the 4th position is the winner.
    return scalar fwin = (female[4] ==1)
end
//
tempfile temp
bootstrap fwin = r(fwin), ///
    size(2) saving(`temp',replace) strata(female) reps(1000): FemaleWins
use `temp', clear
tab fwin

Permutation approaches, by shuffling the observed data, involve sampling *without* replacement, and do enforce the null hypothesis. In that conceptual framework, your question would be something like: "What percentage of the time would a female win if
a) 2 females were drawn without replacement, and 2 males were drawn without replacement;
b) that sample of 4 individuals had their performance scores shuffled; and
c) the winner is counted as female if the person with the highest randomly assigned score is female?"

I suspect there is a straightforward way to use -permute- to do this, but I can't think of it offhand.
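Reading steps a) to c) literally, they can be sketched in Mata as below. The scores, the seed, and the 10,000 replications are hypothetical choices; jumble() is Mata's function for putting the rows of a matrix in random order. One thing the sketch makes visible: because step b) hands each of the 4 scores to each person with equal probability, the highest score lands on one of the 2 women half the time, so (with no tied scores) the female win rate under this scheme should come out close to 50% whatever the data. That 50% is presumably the null benchmark against which an observed win rate would be judged.

```stata
mata:
// Hypothetical illustrative scores for 20 men and 20 women
rseed(475423)
men   = runiform(20, 1)
women = runiform(20, 1) :+ 0.15

reps  = 10000
fwins = 0
for (r = 1; r <= reps; r++) {
    // a) 2 men and 2 women drawn without replacement:
    //    the first two rows of a randomly reordered copy of each vector
    pm = jumble(men)
    pw = jumble(women)
    s  = pm[|1\2|] \ pw[|1\2|]      // rows 1-2: men, rows 3-4: women
    // b) shuffle the 4 scores among the 4 people
    s  = jumble(s)
    // c) a woman wins if the highest shuffled score lands in row 3 or 4
    fwins = fwins + (max(s[|3\4|]) > max(s[|1\2|]))
}
printf("Women win %5.2f percent of the %g shuffled groups\n", 100*fwins/reps, reps)
end
```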

Last edited by Mike Lacy; 28 May 2017, 11:24. Reason: Example data should have first 20 assigned as males, not first 21.