What is wrong with my bootstrap program, and why does it give the same estimates in every bootstrap sample?

Joro Kolev

Join Date: Aug 2018
Posts: 3050


18 Aug 2020, 04:47

Let's say that I am trying to compare -reg3- estimates over two different subsamples, using bootstrap. My bootstrap program looks like this:

Code:

sysuse auto, clear

cap prog drop myboot
prog define myboot, rclass
    reg3 ( price mpg) ( weight length)
    sca Pricempg = [price]mpg
    reg3 ( price mpg) ( weight length) if foreign==1
    return sca Diff = Pricempg - [price]mpg
end

bootstrap Diff=r(Diff), reps(100) : myboot

My program seems correct: on a single run it calculates what it is supposed to calculate. However, when I bootstrap my program, Stata returns the following, meaning that my statistic Diff is identically 0 in all bootstrap samples.

Code:

. bootstrap Diff=r(Diff), reps(100) : myboot
(running myboot on estimation sample)

Bootstrap replications (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
..................................................   100

Bootstrap results                               Number of obs     =         22
                                                Replications      =        100

      command:  myboot
         Diff:  r(Diff)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        Diff |   20.35371          .        .       .            .           .
------------------------------------------------------------------------------

.

What is wrong here?

(This issue originated in the thread
https://www.statalist.org/forums/for...nts-after-reg3
I could not figure it out there, hence I am repeating it here with hopefully more informative title of the post)

Last edited by Joro Kolev; 18 Aug 2020, 04:50.


FernandoRios

Join Date: Apr 2014

Posts: 2469
#2

18 Aug 2020, 05:11

Hi Joro
I think the problem has to do with how the sample is set up. As your program is written, the estimation sample recognized by bootstrap is that of the second regression, which considers only foreign cars.
What I find useful in this case is to write the program as follows:

Code:

sysuse auto, clear
cap prog drop myboot
prog define myboot, eclass
    reg3 ( price mpg) ( weight length)
    scalar Pricempg = [price]mpg
    reg3 ( price mpg) ( weight length) if foreign==1
    matrix Diff = [Pricempg - [price]mpg]
    ereturn post Diff
end
bootstrap , reps(100) saving(s, replace) : myboot

HTH
Fernando

Joro Kolev

Join Date: Aug 2018
Posts: 3050

18 Aug 2020, 09:04

Thank you very much, Fernando. You are showing me some serious black magic here, of which I was totally unaware, and which I will keep in mind in the future... So basically, changing the program from rclass to eclass resolves the problem!

I also thought that how the sample is set is the root of the problem, but my attempted solution to this (which did not work) was

Code:

cap prog drop myboot

prog define myboot, rclass
    reg3 ( price mpg) ( weight length)
    sca Pricempg = [price]mpg
    keep if foreign==1
    reg3 ( price mpg) ( weight length)
    return sca Diff = Pricempg - [price]mpg
end

. 
. bootstrap Diff=r(Diff), reps(100) : myboot
(running myboot on estimation sample)

Bootstrap replications (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
..................................................   100

Bootstrap results                               Number of obs     =         22
                                                Replications      =        100

      command:  myboot
         Diff:  r(Diff)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        Diff |   20.35371          .        .       .            .           .
------------------------------------------------------------------------------



FernandoRios

Join Date: Apr 2014

Posts: 2469
#4

18 Aug 2020, 09:50

I would call it less black magic than a black box, one that I try to break down and rebuild whenever possible.

So the problem is the following. Bootstrap uses two pieces of information when doing the resampling:
1. the statistic you are interested in (Diff), and
2. the sample that is used as the baseline from which to draw the bootstrap samples.
If you run my code, you will see a warning message about myboot not setting e(sample).

Now, in your original code, the last equation to be estimated restricts the sample to foreign==1, so bootstrap draws its bootstrap samples only from that subsample. Within that subsample both regressions use the same observations, which would explain why Diff does not vary across replications.
While I'm not sure how to change the "sample" using an rclass program, I think a solution here is simply to flip the order of the estimations. However, this would be a case-specific solution.

Code:

sysuse auto, clear
cap prog drop myboot
prog define myboot, rclass
    reg3 ( price mpg) ( weight length) if foreign==1
    sca Pricempg = [price]mpg
    reg3 ( price mpg) ( weight length)
    return sca Diff = [price]mpg-Pricempg
end
bootstrap Diff=r(Diff), reps(100) : myboot
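A quick way to see the point about the estimation sample is to count the observations flagged by e(sample) after each fit. This is only a minimal sketch on the same auto data: it assumes nothing beyond the two -reg3- calls used in this thread, and it inspects the flag rather than fixing the program.

```stata
sysuse auto, clear

* Fit on all cars: every observation is flagged as part of the estimation sample
quietly reg3 (price mpg) (weight length)
count if e(sample)

* Fit on foreign cars only: e(sample) now flags just that subsample
quietly reg3 (price mpg) (weight length) if foreign==1
count if e(sample)
```

Whichever fit runs last leaves its e(sample) behind, and the two counts should correspond to the 74 and 22 observations that appear as "Number of obs" in the bootstrap outputs in this thread.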

Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2402

18 Aug 2020, 10:45

There are 3 alternatives I can think of, and Fernando has provided a great insight here.

1) Recast the problem to use -simulate- and keep your program as an r-class. This is the least ideal solution since it requires you to manage your own bootstrap sampling, so I didn't try it.

2) Use -bootstrap- with an -rclass- program that clears estimation results at the end of its execution. Stata will complain, and you will need to be careful about selecting the correct observations. If you are not careful, bootstrap sampling and estimation quantities will be wrong. See -myboot1- and -myboot1wrong-. Even though I intended to use a subset of the data, resampling and estimation took place on the whole sample.

3) Use an -eclass- command and set e(sample), but still be careful about which post-estimation results are retrieved. See -myboot2-.

Code:

sysuse auto, clear

cap prog drop myboot1
prog define myboot1, rclass
  version 16
  syntax [in] [if]
  marksample touse
  reg3 ( price mpg) ( weight length) if `touse'
  scalar Pricempg = [price]mpg
  return scalar Nobs = e(N)
  reg3 ( price mpg) ( weight length) if `touse' & foreign==1
  scalar Diff = Pricempg - [price]mpg
  ereturn clear
  return scalar Diff = Diff
end

cap prog drop myboot1wrong
prog define myboot1wrong, rclass
  version 16
  syntax [in] [if]
  reg3 ( price mpg) ( weight length)
  scalar Pricempg = [price]mpg
  return scalar Nobs = e(N)
  reg3 ( price mpg) ( weight length) if foreign==1
  scalar Diff = Pricempg - [price]mpg
  ereturn clear
  return scalar Diff = Diff
end

cap prog drop myboot2
prog define myboot2, eclass
  version 16
  syntax [in] [if]
  marksample touse
 
  tempvar esample
  gen byte `esample' = `touse'
 
  reg3 ( price mpg) ( weight length) if `touse'
  scalar Pricempg = [price]mpg
  scalar Nobs = e(N)
 
  reg3 ( price mpg) ( weight length) if `touse' & foreign==1
  scalar Diff = Pricempg - [price]mpg
 
  ereturn post, esample(`esample')
  ereturn scalar Nobs = Nobs
  ereturn scalar Diff = Diff
end

bootstrap Diff=r(Diff) Nobs=r(Nobs), seed(17) reps(100) saving(s, replace) : myboot1
bootstrap Diff=r(Diff) Nobs=r(Nobs), seed(17) reps(100) saving(s, replace) : myboot1wrong in 1/65
bootstrap Diff=e(Diff) Nobs=e(Nobs), seed(17) reps(100) saving(s, replace) : myboot2

Returns

Code:

. bootstrap Diff=r(Diff) Nobs=r(Nobs), seed(17) reps(100) saving(s, replace) : myboot1

Bootstrap results                               Number of obs     =         74
                                                Replications      =        100

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        Diff |   20.35371   83.68171     0.24   0.808    -143.6594    184.3669
        Nobs |         74          .        .       .            .           .
------------------------------------------------------------------------------

. bootstrap Diff=r(Diff) Nobs=r(Nobs), seed(17) reps(100) saving(s, replace) : myboot1wrong in 1/65

Bootstrap results                               Number of obs     =         65
                                                Replications      =        100

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        Diff |   20.35371   127.3366     0.16   0.873    -229.2215    269.9289
        Nobs |         74          .        .       .            .           .
------------------------------------------------------------------------------

. bootstrap Diff=e(Diff) Nobs=e(Nobs), seed(17) reps(100) saving(s, replace) : myboot2

Bootstrap results                               Number of obs     =         74
                                                Replications      =        100

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        Diff |   20.35371   83.68171     0.24   0.808    -143.6594    184.3669
        Nobs |         74          .        .       .            .           .
------------------------------------------------------------------------------
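Alternative 1 above (using -simulate- and managing the resampling by hand) was not tried in this thread. A rough sketch of how it might look follows. The program name mysim is hypothetical, reloading the data with -sysuse- on every replication is just one simple way to restore the full sample, and seed(17) and reps(100) merely mirror the calls above. Treat it as an assumption-laden illustration, not a tested replacement for -bootstrap-.

```stata
cap prog drop mysim
prog define mysim, rclass
    version 16
    * Reload the full data, then resample all observations with replacement
    sysuse auto, clear
    bsample
    * Same two fits as in the thread: all cars first, then foreign cars only
    reg3 (price mpg) (weight length)
    scalar Pricempg = [price]mpg
    reg3 (price mpg) (weight length) if foreign==1
    return scalar Diff = Pricempg - [price]mpg
end

* simulate replaces the data in memory with one row per replication
simulate Diff=r(Diff), reps(100) seed(17) nodots: mysim

* The standard deviation of Diff plays the role of the bootstrap standard error
summarize Diff
```

Because the program itself controls which observations are resampled, e(sample) never enters the picture here. The price is that the observed statistic, standard errors and confidence intervals have to be assembled by hand from the replicates.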
