  • How does Stata select bootstrap samples?

    If I understand it correctly, when using the -vce(bootstrap)- option for a regression, Stata will randomly draw _N observations from the sample 50 times (by default) to estimate the standard errors, and hence the t statistics.

    Stata lets users control the size of the bootstrapped samples with the -size(#)- option. However, I'm a bit confused as to what Stata does if -size(#)- is not specified. The help file says the default is _N, the number of observations in the sample... so is Stata drawing _N observations for every bootstrapped sample?

    For example, if I have a sample of 10,000 observations, does Stata draw 10,000 observations for each bootstrap repetition? How does that make sense?

  • #2
    You are correct in understanding that the default size of the bootstrapped sample is _N. So, yes, if your sample has 10,000 observations, each bootstrap sample will also have 10,000 observations. This makes sense once you bear in mind that bootstrap sampling is sampling with replacement. In other words, some of the original 10,000 observations will appear multiple times in a bootstrap sample, and others will not appear at all.

    To see how it works, run this:
    Code:
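    clear*
    set obs 10
    * Label the observations 1 through 10 so repeats are easy to spot
    gen x = _n
    
    capture program drop once
    program define once
        * preserve/restore keeps the original 10 observations intact,
        * so every call resamples from the original data
        preserve
        * Draw _N observations with replacement (the default size)
        bsample
        list
        restore
    end
    
    * Show five independent bootstrap samples
    forvalues i = 1/5 {
        once
    }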
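
    As a further sketch (an addition, not part of the example above), you can count how many distinct original observations land in one bootstrap sample of the default size. With sampling with replacement, the expected share is 1 - (1 - 1/_N)^_N, which is close to 1 - 1/e, or about 63%, for large _N. The seed, the variable names id and first, and the sample size of 10,000 are arbitrary choices for illustration.
    Code:
    clear
    * Arbitrary seed, only so the run is reproducible
    set seed 12345
    set obs 10000
    gen id = _n
    
    * One bootstrap sample: 10,000 draws with replacement
    bsample
    
    * Flag the first copy of each original observation that was drawn
    bysort id: gen byte first = (_n == 1)
    count if first
    display "Share of original observations drawn: " r(N)/10000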


    • #3
      As Clyde explained, the samples are by default of size _N, but drawn with replacement. So each replication gives you a different sample.

      The option -size(#)- is for when you want to sample with replacement but draw samples of size less than _N. This is an advanced use; most probably you will not need it.
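
      To make the difference concrete, here is a minimal sketch contrasting the default with a smaller resample size. It assumes the auto dataset that ships with Stata (74 observations); the regression of mpg on weight, the 200 replications, the seed, and the size of 50 are arbitrary choices for illustration only.
      Code:
      sysuse auto, clear
      * Arbitrary seed, only so the run is reproducible
      set seed 12345
      
      * Default: each replication resamples _N = 74 observations
      regress mpg weight, vce(bootstrap, reps(200))
      
      * Same model, but each replication resamples only 50 observations
      bootstrap, reps(200) size(50): regress mpg weight

      With fewer observations per replication, the bootstrap standard errors would typically come out larger than with the default, which is one reason to leave -size(#)- alone unless you have a specific need for it.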


      • #4
        Originally posted by Clyde Schechter
        You are correct in understanding that the default size of the bootstrapped sample is _N. So, yes, if your sample has 10,000 observations, each bootstrap sample will also have 10,000 observations. The way this makes sense is to bear in mind that bootstrap sampling is sampling with replacement. In other words, some of the original 10,000 observations will appear multiple times in a bootstrap sample, and others will not appear at all.

        Ahhhh, I see. That makes sense now. Thanks!
