
  • What is wrong with my bootstrap program, and why does it give the same estimates in every bootstrap sample?

    Let's say that I am trying to compare -reg3- estimates across two different subsamples using bootstrap. My bootstrap program looks like this:

    Code:
    sysuse auto, clear
    
    cap prog drop myboot
    
    prog define myboot, rclass
    
    reg3 ( price mpg) ( weight length)
    
    sca Pricempg = [price]mpg
    
    reg3 ( price mpg) ( weight length) if foreign==1
    
    return sca Diff = Pricempg - [price]mpg
    
    end
    
    bootstrap Diff=r(Diff), reps(100) : myboot
    My program seems correct: on a single run it calculates what it is supposed to calculate. However, when I bootstrap it, Stata returns the following, meaning that my statistic Diff takes the same value in every bootstrap sample (note the missing standard error).

    Code:
    . bootstrap Diff=r(Diff), reps(100) : myboot
    (running myboot on estimation sample)
    
    Bootstrap replications (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
    ..................................................    50
    ..................................................   100
    
    Bootstrap results                               Number of obs     =         22
                                                    Replications      =        100
    
          command:  myboot
             Diff:  r(Diff)
    
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            Diff |   20.35371          .        .       .            .           .
    ------------------------------------------------------------------------------
    
    .
    What is wrong here?

    (This issue originated in the thread
    https://www.statalist.org/forums/for...nts-after-reg3
    I could not figure it out there, hence I am repeating it here with a hopefully more informative title.)
    Last edited by Joro Kolev; 18 Aug 2020, 04:50.

  • #2
    Hi Joro
    I think the problem has to do with how the sample is set up. As your program is written, the estimation sample recognized by bootstrap is that of the second regression, which includes only foreign cars.
    What I find useful in this case is to write the program as follows:

    Code:
    sysuse auto, clear
    
    cap prog drop myboot
    
    prog define myboot, eclass
    
    reg3 ( price mpg) ( weight length)
    
    scalar Pricempg = [price]mpg
    
    reg3 ( price mpg) ( weight length) if foreign==1
    
    matrix Diff = [Pricempg - [price]mpg]
    ereturn post Diff
    end
    
    bootstrap ,   reps(100) saving(s, replace) : myboot
    HTH
    Fernando



    • #3
      Thank you very much, Fernando. You are showing me some serious black magic here, of which I was totally unaware, and which I will keep in mind in the future. So basically, changing the program from rclass to eclass resolves the problem!

      I also suspected that the way the sample is set was the root of the problem, but my attempted fix (which did not work) was:
      Code:
      cap prog drop myboot
      
      prog define myboot, rclass
      
      reg3 ( price mpg) ( weight length)
      
      sca Pricempg = [price]mpg
      
      keep if foreign==1
      
      reg3 ( price mpg) ( weight length)
      
      return sca Diff = Pricempg - [price]mpg
      
      end
      
      . bootstrap Diff=r(Diff), reps(100) : myboot
      (running myboot on estimation sample)
      
      Bootstrap replications (100)
      ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
      ..................................................    50
      ..................................................   100
      
      Bootstrap results                               Number of obs     =         22
                                                      Replications      =        100
      
            command:  myboot
               Diff:  r(Diff)
      
      ------------------------------------------------------------------------------
                   |   Observed   Bootstrap                         Normal-based
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
              Diff |   20.35371          .        .       .            .           .
      ------------------------------------------------------------------------------



      • #4
        I would call it less black magic and more a black box, which I try to break down and rebuild whenever possible.

        So the problem is the following. Bootstrap uses two pieces of information when doing the resampling:
        1) the statistic you are interested in (Diff), and
        2) the sample used as the baseline from which the bootstrap samples are drawn.
        If you run my code, you will see a warning message about myboot not setting e(sample).

        Now, in your original code, the last equation to be estimated restricts the sample to foreign==1, so bootstrap draws its samples only from that subsample (hence the 22 observations reported). Within those samples every car is foreign, so the two regressions coincide and Diff does not vary across replications.
        While I am not sure how to change the "sample" from an rclass program, I think a simple solution here is to flip the order of the estimations. However, this is a case-specific solution.
        Code:
        sysuse auto, clear
        cap prog drop myboot
        prog define myboot, rclass
        reg3 ( price mpg) ( weight length) if foreign==1
        sca Pricempg = [price]mpg
        reg3 ( price mpg) ( weight length)
        return sca Diff = [price]mpg-Pricempg  
        end
        bootstrap Diff=r(Diff), reps(100) : myboot
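
        A quick way to see which observations -bootstrap- would pick up as the estimation sample is to run the program once and inspect e(sample). This is only a sketch, and it assumes some version of myboot is already defined and that it is run from a do-file:

        ```stata
        sysuse auto, clear
        quietly myboot                  // one run on the full data
        count if e(sample)              // observations flagged by the last reg3 call
        tabulate foreign if e(sample)   // which cars those observations are
        ```

        With the version from #1 the count should be 22 (foreign cars only), whereas with the flipped version above it should be all 74 cars.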



        • #5
          There are 3 alternatives I can think of, and Fernando has provided a great insight here.

          1) Recast the problem to use -simulate- and keep your program as an rclass program. This is the least ideal solution, since it requires you to manage your own bootstrap sampling, so I didn't try it.

          2) Use -bootstrap- with an -rclass- program that clears the estimation results at the end of its execution. Stata will complain, and you will need to be careful about selecting the correct observations. If you are not careful, the bootstrap sampling and the estimated quantities will be wrong. See -myboot1- and -myboot1wrong-. The latter parses [if] and [in] but never applies them, so even though I intended to use a subset of the data, resampling and estimation took place on the whole sample.

          3) Use an -eclass- command and set e(sample), but still be careful about which post-estimation results are retrieved. See -myboot2-.

          Code:
          sysuse auto, clear
          
          cap prog drop myboot1
          prog define myboot1, rclass
            version 16
            syntax [in] [if]
            marksample touse
            reg3 ( price mpg) ( weight length) if `touse'
            scalar Pricempg = [price]mpg
            return scalar Nobs = e(N)
            reg3 ( price mpg) ( weight length) if `touse' & foreign==1
            scalar Diff = Pricempg - [price]mpg
            ereturn clear
            return scalar Diff = Diff
          end
          
          cap prog drop myboot1wrong
          prog define myboot1wrong, rclass
            version 16
            syntax [in] [if]
            reg3 ( price mpg) ( weight length)
            scalar Pricempg = [price]mpg
            return scalar Nobs = e(N)
            reg3 ( price mpg) ( weight length) if foreign==1
            scalar Diff = Pricempg - [price]mpg
            ereturn clear
            return scalar Diff = Diff
          end
          
          cap prog drop myboot2
          prog define myboot2, eclass
            version 16
            syntax [in] [if]
            marksample touse
           
            tempvar esample
            gen byte `esample' = `touse'
           
            reg3 ( price mpg) ( weight length) if `touse'
            scalar Pricempg = [price]mpg
            scalar Nobs = e(N)
           
            reg3 ( price mpg) ( weight length) if `touse' & foreign==1
            scalar Diff = Pricempg - [price]mpg
           
            ereturn post, esample(`esample')
            ereturn scalar Nobs = Nobs
            ereturn scalar Diff = Diff
          end
          
          bootstrap Diff=r(Diff) Nobs=r(Nobs), seed(17) reps(100) saving(s, replace) : myboot1
          bootstrap Diff=r(Diff) Nobs=r(Nobs), seed(17) reps(100) saving(s, replace) : myboot1wrong in 1/65
          bootstrap Diff=e(Diff) Nobs=e(Nobs), seed(17) reps(100) saving(s, replace) : myboot2
          This returns:

          Code:
          . bootstrap Diff=r(Diff) Nobs=r(Nobs), seed(17) reps(100) saving(s, replace) : myboot1
          
          Bootstrap results                               Number of obs     =         74
                                                          Replications      =        100
          
          ------------------------------------------------------------------------------
                       |   Observed   Bootstrap                         Normal-based
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                  Diff |   20.35371   83.68171     0.24   0.808    -143.6594    184.3669
                  Nobs |         74          .        .       .            .           .
          ------------------------------------------------------------------------------
          
          . bootstrap Diff=r(Diff) Nobs=r(Nobs), seed(17) reps(100) saving(s, replace) : myboot1wrong in 1/65
          
          Bootstrap results                               Number of obs     =         65
                                                          Replications      =        100
          
          ------------------------------------------------------------------------------
                       |   Observed   Bootstrap                         Normal-based
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                  Diff |   20.35371   127.3366     0.16   0.873    -229.2215    269.9289
                  Nobs |         74          .        .       .            .           .
          ------------------------------------------------------------------------------
          
          . bootstrap Diff=e(Diff) Nobs=e(Nobs), seed(17) reps(100) saving(s, replace) : myboot2
          
          Bootstrap results                               Number of obs     =         74
                                                          Replications      =        100
          
          ------------------------------------------------------------------------------
                       |   Observed   Bootstrap                         Normal-based
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                  Diff |   20.35371   83.68171     0.24   0.808    -143.6594    184.3669
                  Nobs |         74          .        .       .            .           .
          ------------------------------------------------------------------------------
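
          Alternative 1 was not tried above. A minimal sketch of what it might look like follows, assuming each replication reloads auto.dta and draws its own resample with -bsample-. The program name mysim is hypothetical, and the code is meant to be run from a do-file:

          ```stata
          cap prog drop mysim
          prog define mysim, rclass
            version 16
            sysuse auto, clear          // start every replication from the full data
            bsample                     // resample observations with replacement
            reg3 ( price mpg) ( weight length)
            scalar Pricempg = [price]mpg
            reg3 ( price mpg) ( weight length) if foreign==1
            return scalar Diff = Pricempg - [price]mpg
          end

          simulate Diff=r(Diff), reps(100) seed(17) : mysim
          summarize Diff                // Std. Dev. plays the role of the bootstrap SE
          ```

          Because the resampling is done by hand here, e(sample) plays no role in choosing the observations. The numbers need not match the -bootstrap- output above, even with the same seed.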
