How does Stata select bootstrap samples?

Bob Jones

Join Date: Dec 2018

Posts: 10
#1

21 Dec 2018, 15:57

If I understand it correctly, when using the -vce(bootstrap)- option for a regression, Stata randomly draws _N observations from the sample 50 times (by default) to calculate the standard errors, and from them the t statistics.

Stata lets users control the size of the bootstrapped samples with the -size(#)- option. However, I'm a bit confused as to what Stata does if -size(#)- is not specified. The help file says the default is _N, the number of observations in the sample... so is Stata drawing _N observations for every bootstrapped sample?

For example, if I have a sample of 10,000 observations, does Stata draw 10,000 observations for each bootstrap repetition? How does that make sense?
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#2

21 Dec 2018, 16:19

You are correct in understanding that the default size of the bootstrapped sample is _N. So, yes, if your sample has 10,000 observations, each bootstrap sample will also have 10,000 observations. The way this makes sense is to bear in mind that bootstrap sampling is sampling with replacement. In other words, some of the original 10,000 observations will appear multiple times in a bootstrap sample, and others will not appear at all.

To see how it works, run this:

Code:

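clear*
set obs 10
gen x = _n

capture program drop once
program define once
    preserve    // bsample overwrites the data in memory; preserve restores it when the program exits
    bsample     // draw _N observations with replacement
    list
end

forvalues i = 1/5 {
    once
}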
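The same idea can be checked at the scale in the question. The following is a minimal sketch, not part of the original reply: the seed value and the variable names id, copies, and first are arbitrary choices made for illustration. It draws one bootstrap sample from 10,000 observations and counts how many distinct original observations ended up in it.

```stata
clear
set seed 12345                  // arbitrary seed, only so the run is reproducible
set obs 10000
gen id = _n                     // label each original observation

bsample                         // one bootstrap sample: _N draws with replacement

bysort id: gen copies = _N      // number of times each drawn id appears
egen first = tag(id)            // flags one row per distinct id
count if first                  // distinct original observations in the sample
summarize copies if first       // distribution of how often each was drawn
```

The sample still has 10,000 rows, but the count of distinct ids should come out at roughly 63% of the original observations, which is what sampling with replacement implies for large samples. The remaining observations were never drawn in this particular replication.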
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#3

21 Dec 2018, 16:49

As Clyde explained, the samples are by default of size _N but are drawn with replacement, so every replication gives you a different sample.

The -size(#)- option is for when you want to sample with replacement but draw samples of size less than _N. This is an advanced use; most probably you will not need it.
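To make the difference concrete, here is a small sketch using the -bootstrap- prefix. The auto dataset, the regression of mpg on weight, the seed, and the value size(40) are all assumptions chosen for illustration; none of them come from the thread.

```stata
sysuse auto, clear              // example dataset shipped with Stata (74 observations)

* Default: every replication resamples _N = 74 observations with replacement
bootstrap, reps(50) seed(12345): regress mpg weight

* size(40): every replication resamples only 40 observations with replacement
bootstrap, reps(50) size(40) seed(12345): regress mpg weight
```

Comparing the two tables, the bootstrap standard errors from the second run would typically be larger, since each replication is estimated on fewer observations. That is one reason the default of _N is usually what you want.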
Bob Jones

Join Date: Dec 2018

Posts: 10
#4

21 Dec 2018, 19:49

Originally posted by Clyde Schechter

You are correct in understanding that the default size of the bootstrapped sample is _N. So, yes, if your sample has 10,000 observations, each bootstrap sample will also have 10,000 observations. The way this makes sense is to bear in mind that bootstrap sampling is sampling with replacement. In other words, some of the original 10,000 observations will appear multiple times in a bootstrap sample, and others will not appear at all.

To see how it works, run this:

Code:

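clear*
set obs 10
gen x = _n

capture program drop once
program define once
    preserve    // bsample overwrites the data in memory; preserve restores it when the program exits
    bsample     // draw _N observations with replacement
    list
end

forvalues i = 1/5 {
    once
}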

Ahhhh, I see. That makes sense now. Thanks!