  • How can I randomly sample groups with replacement from existing data?

    Hi there,
    I have almost finished my dissertation and I have just one last statistical flourish I want to do to make it beautiful - perhaps you can help.

    I have a sample of 20 men and 20 women who each have a performance score. I want to draw 10,000 groups of 4 (with 2 men and 2 women per group) with replacement at random and calculate the percentage of groups in which a man wins vs the percentage of groups in which a woman wins, according to their performance score.

    I am not very practiced at Stata, but I know this will have something to do with the bsample command. However, I am unsure exactly what I need to type as a command.

    I would be hugely grateful for any advice!
    Thanks

  • #2
    Hannah:
    welcome to the list.
    The following thread might be on-target: http://www.statalist.org/forums/foru...mple-from-data
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      thank you!


      • #4
        Hannah: You may already be well aware of this, but what you are describing seems related to permutation testing. See
        Code:
        help permute
        for instance.

        Apart from this what Carlo suggests should be very helpful.

        In addition, I often find that simulations like this are easier for me in Mata than in Stata itself. If you are familiar with Mata, you might consider trying it. A simple little function I use to draw samples of size n<=T with replacement from a Tx1 vector x is
        Code:
        function bootn(x,n) return((x[ceil(rows(x):*uniform(rows(x),1)),.])[1..n,.])
        A simple example would be
        Code:
        . mata
        ------------------------------------------------- mata (type end to exit) -------------------------------------
        : rseed(22)
        
        : x=ceil(10:*uniform(20,1))
         
        : x'
                1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
            +-----------------------------------------------------------------------------------------------------+
          1 |   1    6   10    1    9    1   10    4    2    8    3    4    1    1    3    1    9   10   10    8  |
            +-----------------------------------------------------------------------------------------------------+
        
         
        : bootn(x,2)
               1
            +-----+
          1 |  1  |
          2 |  1  |
            +-----+
         
        : bootn(x,2)
               1
            +-----+
          1 |  3  |
          2 |  4  |
            +-----+
         
        : bootn(x,2)
                1
            +------+
          1 |   6  |
          2 |  10  |
            +------+
        You could then embed this in a "for" or "while" loop that does the relevant calculations 10,000 times. Of course, if you are not familiar with Mata and you are "almost finished" with your dissertation, then perhaps now is not the best time to learn Mata. 😀
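
        A minimal sketch of such a loop, assuming the men's and women's scores are held in two separate column vectors. The names fshare, m, w, and reps are made up for illustration, runiform() is used here in place of the older uniform(), and a tie for the top score is counted as a non-win for the women, which is only one possible convention:

        ```stata
        mata:
        // Hypothetical helper: share of groups in which a woman has the top score.
        // m, w : column vectors of men's and women's scores (assumed layout)
        // reps : number of groups of 2 men + 2 women to draw
        real scalar fshare(real colvector m, real colvector w, real scalar reps)
        {
            real scalar i, fwins
            real colvector mm, ww

            fwins = 0
            for (i = 1; i <= reps; i++) {
                // two men and two women, each drawn with replacement
                mm = m[ceil(rows(m) :* runiform(2, 1)), 1]
                ww = w[ceil(rows(w) :* runiform(2, 1)), 1]
                // a woman wins only if the best woman strictly beats the best man
                if (max(ww) > max(mm)) fwins++
            }
            return(fwins / reps)
        }

        // Made-up illustration data, not the poster's scores
        rseed(22)
        m = runiform(20, 1)
        w = runiform(20, 1) :+ 0.15
        fshare(m, w, 10000)
        end
        ```

        The share of groups won by a man is then one minus this number, provided ties are impossible or are assigned to the men.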


        • #5
          I agree with John here that -permute- is likely relevant. The reasoning is that -bootstrap-, by sampling with replacement, does not enforce a null hypothesis (i.e., samples are drawn from a population with the observed distribution, in which the null is not in general true).

          Notwithstanding that, there is a relatively straightforward -bootstrap- solution to what you *ask* for, which may not be what you want/need.
          Code:
          clear
          // Create some data for illustration
          set seed 475423
          set obs 40
          gen byte female = _n > 20
          gen score = (runiform() + 0.15 * female)
          //
          // Program to be bootstrapped
          cap prog drop FemaleWins
          prog FemaleWins, rclass
          sort score
          //  Note that my bootstrap command designates two individuals per stratum,
          // meaning N = 4.  Therefore, the person at the 4th position is the winner.
          return scalar fwin = (female[4] ==1)
          end
          //
          tempfile temp
          bootstrap fwin = r(fwin), ///
             size(2) saving(`temp',replace) strata(female) reps(1000): FemaleWins
          use `temp', clear
          tab fwin
          Permutation approaches, by shuffling the observed data, involve sampling *without* replacement, and do enforce the null hypothesis. In that conceptual framework, your question would be something like: "What percentage of the time would a female win if
          a) two females were drawn without replacement, and two males were drawn without replacement,
          and
          b) that sample of 4 individuals had their performance scores shuffled,
          and
          c) the winner is a female if the person with the highest randomly assigned score is female?"

          I suspect there is a straightforward way to use -permute- to do this, but I can't think of it offhand.
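
          Failing that, steps a) to c) can be simulated directly. What follows is only a rough sketch in Mata, not a -permute- solution: the function name fshare_perm and its arguments are hypothetical, the scores are assumed to sit in separate male and female column vectors with at least two entries each, and ties for the top score count as a non-win for the female.

          ```stata
          mata:
          // Hypothetical helper: share of shuffled groups whose top score lands on a female.
          // m, w : column vectors of male and female scores (assumed layout)
          real scalar fshare_perm(real colvector m, real colvector w, real scalar reps)
          {
              real scalar i, fwins
              real colvector mm, ww, s

              fwins = 0
              for (i = 1; i <= reps; i++) {
                  // (a) shuffle each sex and keep the first two: a draw without replacement
                  mm = jumble(m)
                  ww = jumble(w)
                  s  = mm[(1\2), 1] \ ww[(1\2), 1]
                  // (b) shuffle the four scores across the four people
                  s = jumble(s)
                  // (c) positions 1-2 are the males, positions 3-4 the females
                  if (max(s[(3\4), 1]) > max(s[(1\2), 1])) fwins++
              }
              return(fwins / reps)
          }
          end
          ```

          Because the shuffle in step b) makes every person equally likely to hold the top score, the returned share should sit close to one half whenever ties are rare. That is what enforcing the null means here, so on its own this number serves as a reference value rather than an estimate of how often females actually win.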
          Last edited by Mike Lacy; 28 May 2017, 11:24. Reason: Example data should have first 20 assigned as males, not first 21.
