  • Standard errors becoming enormous when using bootstrap

    Hi Statalisters,

    My objective is to run an IV regression using --ivreg2-- and obtain nonlinear combinations of the coefficients through the --nlcom-- command. My dependent variable is discrete and ranges from 1 to 7. There are eight independent variables, six of which are endogenous; some are discrete and some are continuous.

    When I run a stand-alone regression and calculate the desired combinations, the results are theoretically reasonable. When I write a small program and use --bootstrap--, however, the standard errors of the combinations become enormous. Below is an example, with some output deleted for brevity, that illustrates my code and the results it returns.

    Stand-alone regression:

    Code:
    ivreg2 y var7 var8 (var1-var6 = inst1-inst21), robust
    
     * Recover structural parameters
        forval i=1/8 {
                
            nlcom _b[var`i']/(_b[var1] + _b[var2] + _b[var3] + _b[var4] + _b[var5] + _b[var6] + _b[var7]+ _b[var8])
            tempname a
            matrix `a' = r(b)
           
            return scalar alpha`i'`x' = `a'[1,1]    
                
        }
    
    
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _nl_1 |   .1996965   .1414433     1.41   0.158    -.0775273    .4769204
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _nl_1 |   .1853465   .1051298     1.76   0.078    -.0207041    .3913972
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _nl_1 |  -.1270809   .2085744    -0.61   0.542    -.5358792    .2817175
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _nl_1 |  -.0358595   .1373886    -0.26   0.794    -.3051362    .2334172
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _nl_1 |   .1144584   .0712662     1.61   0.108    -.0252208    .2541376
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _nl_1 |  -.1707009   .1426043    -1.20   0.231    -.4502003    .1087984
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _nl_1 |   .8341398   .1402829     5.95   0.000     .5591903    1.109089
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _nl_1 |  -.0786737    .015096    -5.21   0.000    -.1082612   -.0490861
    ------------------------------------------------------------------------------
    Bootstrap:

    Code:
    cap program drop myprog
    program define myprog, rclass
    args y
    
    ivreg2 `y' var7 var8 (var1-var6 = inst1-inst21), robust
    
     * Recover structural parameters
        forval i=1/8 {
                
            nlcom _b[var`i']/(_b[var1] + _b[var2] + _b[var3] + _b[var4] + _b[var5] + _b[var6] + _b[var7]+ _b[var8])
            tempname a
            matrix `a' = r(b)
           
            return scalar alpha`i' = `a'[1,1]    
                
        }
    
    end
    
    bootstrap alpha1 = r(alpha1) alpha2 = r(alpha2) alpha3 = r(alpha3) alpha4 = r(alpha4) ///
    alpha5 = r(alpha5) alpha6 = r(alpha6) alpha7 = r(alpha7) alpha8 = r(alpha8), reps(100) seed(123): myprog yvar
    
    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
    ..................................................    50
    
    Bootstrap results                               Number of obs      =      8558
                                                    Replications       =        50
    
          command:  myprog y
    
              alpha1:  r(alpha1)
              alpha2:  r(alpha2)
              alpha3:  r(alpha3)
              alpha4:  r(alpha4)
              alpha5:  r(alpha5)
              alpha6:  r(alpha6)
              alpha7:  r(alpha7)
              alpha8:  r(alpha8)
    
                                    (Replications based on 8558 clusters in mcsid)
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          alpha1 |   .2096561   10.99415     0.02   0.985    -21.33849     21.7578
          alpha2 |   .1915124   8.941921     0.02   0.983    -17.33433    17.71736
          alpha3 |  -.0922069   3.705333    -0.02   0.980    -7.354527    7.170113
          alpha4 |  -.0344993   11.74894    -0.00   0.998    -23.06199    22.99299
          alpha5 |   .1248302   9.492833     0.01   0.990    -18.48078    18.73044
          alpha6 |  -.1941109   1.939475    -0.10   0.920    -3.995411    3.607189
          alpha7 |   .8777258   20.61115     0.04   0.966    -39.51938    41.27483
          alpha8 |  -.0829073   .4029478    -0.21   0.837    -.8726704    .7068557
    ------------------------------------------------------------------------------
    Any suggestions as to where I am going wrong would be much appreciated.

    Thanks in advance,
    Mark

  • #2
    I don't see anything wrong in your code as you present it, but I do notice a couple of odd things about the results:

    1. The observed coefficient in the bootstrap output does not match the coefficient from the stand-alone regression.

    2. The bootstrap output reports "(Replications based on 8558 clusters in mcsid)", but I cannot see clusters specified anywhere in your code.

    In short, I don't quite believe that the code you are presenting generates the output you are presenting.



    • #3
      And one more:

      3. In your bootstrap command you specify -reps(100)-, but your bootstrap output reports 50 replications:

      Number of obs = 8558 Replications = 50



      • #4
        I got curious myself and ran a simplification of your code. It does not show any of the peculiarities in the results you posted.

        But it does replicate the problem you report: the standard error of the nonlinear combination is about 10 times larger when the bootstrap is used instead of the delta method (nlcom).

        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . 
        . ivreg2 price leng (mpg = head), robust
        
        IV (2SLS) estimation
        --------------------
        
        Estimates efficient for homoskedasticity only
        Statistics robust to heteroskedasticity
        
                                                              Number of obs =       74
                                                              F(  2,    71) =     0.01
                                                              Prob > F      =   0.9891
        Total (centered) SS     =  635065396.1                Centered R2   = -5.1e+02
        Total (uncentered) SS   =   3447834321                Uncentered R2 = -92.4013
        Residual SS             =  3.22032e+11                Root MSE      =    65968
        
        ------------------------------------------------------------------------------
                     |               Robust
               price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 mpg |   18768.93   287896.2     0.07   0.948    -545497.2    583035.1
              length |   3938.032   59613.02     0.07   0.947    -112901.3    120777.4
               _cons |   -1133646   1.73e+07    -0.07   0.948    -3.51e+07    3.28e+07
        ------------------------------------------------------------------------------
        Underidentification test (Kleibergen-Paap rk LM statistic):              0.004
                                                           Chi-sq(1) P-val =    0.9477
        ------------------------------------------------------------------------------
        Weak identification test (Cragg-Donald Wald F statistic):                0.002
                                 (Kleibergen-Paap rk Wald F statistic):          0.004
        Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                                 15% maximal IV size              8.96
                                                 20% maximal IV size              6.66
                                                 25% maximal IV size              5.53
        Source: Stock-Yogo (2005).  Reproduced by permission.
        NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
        ------------------------------------------------------------------------------
        Hansen J statistic (overidentification test of all instruments):         0.000
                                                         (equation exactly identified)
        ------------------------------------------------------------------------------
        Instrumented:         mpg
        Included instruments: length
        Excluded instruments: headroom
        ------------------------------------------------------------------------------
        
        . 
        .  * Recover structural parameters
        .     
        .             
        .         nlcom _b[mpg]/(_b[leng])
        
               _nl_1:  _b[mpg]/(_b[leng])
        
        ------------------------------------------------------------------------------
               price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               _nl_1 |   4.766069   1.065081     4.47   0.000      2.67855    6.853589
        ------------------------------------------------------------------------------
        
        .         tempname a
        
        .         matrix `a' = r(b)
        
        .        
        .         scalar alpha = `a'[1,1]    
        
        .      dis alpha       
        4.7660691
        
        .     
        .          
        . cap program drop myprog
        
        . program define myprog, rclass
          1. 
        . ivreg2 price leng (mpg = head), robust
          2. 
        .  * Recover structural parameters
        .     
        .             
        .         nlcom _b[mpg]/(_b[leng])
          3.         tempname a
          4.         matrix `a' = r(b)
          5.        
        .         return scalar alpha = `a'[1,1]    
          6.             
        .  
        . 
        . end
        
        . 
        . bootstrap alpha = r(alpha), reps(100) seed(123): myprog
        (running myprog on estimation sample)
        
        Bootstrap replications (100)
        ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
        ..................................................    50
        ..................................................   100
        
        Bootstrap results                               Number of obs     =         74
                                                        Replications      =        100
        
              command:  myprog
                alpha:  r(alpha)
        
        ------------------------------------------------------------------------------
                     |   Observed   Bootstrap                         Normal-based
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               alpha |   4.766069   11.41602     0.42   0.676    -17.60893    27.14107
        ------------------------------------------------------------------------------
        
        .



        • #5
          Stata's pseudorandom number generator doesn't like 123 as a seed.
          Code:
          version 15.1
          
          clear *
          set seed `=strreverse("1473242")'
          
          quietly sysuse auto
          
          program define bootem, rclass
              version 15.1
              syntax
          
              ivregress 2sls price c.length (mpg = c.headroom), vce(robust)
          
              // return scalar alpha = _b[mpg] / _b[length]
              nlcom _b[mpg] / _b[length]
              tempname Ratio
              matrix define `Ratio' = r(b)
              return scalar alpha = `Ratio'[1, 1]
          end
          
          bootstrap alpha = (_b[mpg] / _b[length]) alphanlcom = r(alpha), reps(100) nodots: bootem
          
          set seed 123
          bootstrap alpha = (_b[mpg] / _b[length]) alphanlcom = r(alpha), reps(100) nodots: bootem
          
          set seed `=strreverse("1473242")'
          bootstrap alpha = (_b[mpg] / _b[length]) alphanlcom = r(alpha), reps(100) nodots: bootem
          
          set seed 0
          bootstrap alpha = (_b[mpg] / _b[length]) alphanlcom = r(alpha), reps(100) nodots: bootem
          
          set seed 123
          bootstrap alpha = (_b[mpg] / _b[length]) alphanlcom = r(alpha), reps(100) nodots: bootem
          
          exit
          I don't know, maybe this is something for Stata Technical Services. In the meantime, keep your RNG happy and use something else.



          • #6
            Joro, Joseph,

            Thank you very much for your responses.

            Joro, I can only apologise: I pasted in the wrong results. Below are the correct code and results, this time with the seed not set to 123. Joseph, you will notice that the problem persists, although it is less pronounced.

            Code:
            * Stand alone
             ivreg2 y var7 var8 (var1-var6 = inst1-inst21), robust  
            
            * Recover structural parameters    
            forval i=1/8{          
                      
                  nlcom _b[var`i']/(_b[var1] + _b[var2] + _b[var3] + _b[var4] + _b[var5] + _b[var6] + _b[var7]+ _b[var8])        
                  tempname a      
                  matrix `a' = r(b)                
                  return scalar alpha`i'`x' = `a'[1,1]                    
            
             }  
            
            ------------------------------------------------------------------------------
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   _nl_1 |    .216749   .1520145     1.43   0.154     -.081194     .514692
            ------------------------------------------------------------------------------
            
            ------------------------------------------------------------------------------
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   _nl_1 |   .2011736   .1134949     1.77   0.076    -.0212722    .4236194
            ------------------------------------------------------------------------------
            
            ------------------------------------------------------------------------------
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   _nl_1 |  -.1379325   .2280048    -0.60   0.545    -.5848138    .3089487
            ------------------------------------------------------------------------------
            
            ------------------------------------------------------------------------------
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   _nl_1 |  -.0389216   .1487852    -0.26   0.794    -.3305353    .2526921
            ------------------------------------------------------------------------------
            
            ------------------------------------------------------------------------------
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   _nl_1 |   .1242322   .0765993     1.62   0.105    -.0258996     .274364
            ------------------------------------------------------------------------------
            
            ------------------------------------------------------------------------------
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   _nl_1 |  -.1852774   .1535271    -1.21   0.228    -.4861849    .1156302
            ------------------------------------------------------------------------------
            
            ------------------------------------------------------------------------------
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   _nl_1 |   .9053685   .1657121     5.46   0.000     .5805788    1.230158
            ------------------------------------------------------------------------------
            
            ------------------------------------------------------------------------------
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   _nl_1 |  -.0853917   .0177842    -4.80   0.000    -.1202481   -.0505354
            ------------------------------------------------------------------------------
            
            
            * Bootstrap
            cap program drop myprog
            program define myprog, rclass
            args y
            ivreg2 `y' var7 var8 (var1-var6 = inst1-inst21), robust
            
            * Recover structural parameters    
            
            forval i=1/8{                   
            
             nlcom _b[var`i']/(_b[var1] + _b[var2] + _b[var3] + _b[var4] + _b[var5] + _b[var6] + _b[var7]+ _b[var8])        
            tempname a        
            matrix `a' = r(b)                
            return scalar alpha`i' = `a'[1,1]                      
            
            }  
            
            end  
            
            bootstrap alpha1 = r(alpha1) alpha2 = r(alpha2) alpha3 = r(alpha3) alpha4 = r(alpha4) ///
            alpha5 = r(alpha5) alpha6 = r(alpha6) alpha7 = r(alpha7) alpha8 = r(alpha8), reps(100) seed(78): myprog yvar
            (running myprog on estimation sample)  
            
            Bootstrap replications (100)
            ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5  
            ..................................................    50
            ..................................................   100
            
            Bootstrap results                               Number of obs      =      8557
                                                            Replications       =       100
            
                  command:  myprog yvar
                   alpha1:  r(var1)
                   alpha2:  r(var2)
                   alpha3:  r(var3)
                   alpha4:  r(var4)
                   alpha5:  r(var5)
                   alpha6:  r(var6)
                   alpha7:  r(var7)
                   alpha8:  r(var8)
            
            ------------------------------------------------------------------------------
                         |   Observed   Bootstrap                         Normal-based
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                  alpha1 |    .216749   1.522343     0.14   0.887    -2.766989    3.200487
                  alpha2 |   .2011736   .9371533     0.21   0.830    -1.635613     2.03796
                  alpha3 |  -.1379325   1.253288    -0.11   0.912    -2.594332    2.318467
                  alpha4 |  -.0389216   1.216812    -0.03   0.974     -2.42383    2.345986
                  alpha5 |   .1242322    .741305     0.17   0.867    -1.328699    1.577163
                  alpha6 |  -.1852774    1.23634    -0.15   0.881    -2.608459    2.237904
                  alpha7 |   .9053685   .4861818     1.86   0.063    -.0475304    1.858267
                  alpha8 |  -.0853917   .0660953    -1.29   0.196    -.2149361    .0441526
            ------------------------------------------------------------------------------
            Any further suggestions would be welcomed.

            Apologies again for my sloppy pasting,
            Mark
            Last edited by Mark Mitchell; 04 Dec 2018, 08:35.



            • #7
              Hi all,
              Interesting topic. It seems that seed 123 produces an odd sequence of draws: either the bootstrap sample is very close to the observed one, or one or more observations are drawn multiple times. This then leads to outliers.

              Mark, what happens if you increase the number of repetitions to, let's say, 500? My experience from a similar application is that outliers can have a huge effect if the number of repetitions is "small", especially if an equation such as yours is estimated and then bootstrapped.
              Intuitively, my explanation is the following. Assume that, across the bootstrap repetitions, all but one of the estimated coefficients stay close to the values estimated from the original sample, and that the one that differs does so by a wide margin. This will inflate or deflate the result of the calculation. For example, assume that the coefficient of var1 varies a lot across the bootstrap repetitions while the others do not. The bootstrapped SE is roughly 10 times the one obtained by the delta method, which hints that the coefficients vary much more across the bootstrap runs than in the real sample. The ratio for var1 itself might be stable, because var1 appears in both the numerator and the denominator, but all the other ratios will change. This then inflates or deflates the standard errors. Besides increasing the number of repetitions, what I found helpful is to look at the actual bootstrapped coefficients of var1-var8 and at how their non-bootstrapped and bootstrapped standard errors differ.
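
              A minimal sketch of that check, using Joro's auto-data simplification rather than Mark's data, and assuming -ivregress 2sls- as a stand-in for -ivreg2-. The file name bsreps and the names b_mpg and b_length are made up for illustration.
              Code:
              sysuse auto, clear
              
              * Bootstrap the two coefficients themselves and keep every replicate
              bootstrap b_mpg = _b[mpg] b_length = _b[length], reps(200) seed(98769876) ///
                  nodots saving(bsreps, replace): ///
                  ivregress 2sls price length (mpg = headroom), vce(robust)
              
              * Inspect the replicates: does the denominator get close to zero?
              use bsreps, clear
              generate ratio = b_mpg / b_length
              summarize b_mpg b_length ratio, detail
              If b_length changes sign or comes close to zero in some replicates, the ratio in those replicates will be extreme, and a few such values can dominate the bootstrap standard error.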

              Another thing I noticed is that you treat var7 and var8 as exogenous. Both appear to have the smallest standard errors, and for them the standard errors obtained by the bootstrap and by the delta method are the closest.

              Hope this helps,
              Jan



              • #8
                My first reaction, too, was that the problem might be caused by the small number of replications (100), but it is not.

                When I raise the number of replications to 10,000, the problem gets worse. The standard error of the nonlinear combination grows from 1.065 (delta method) to 11.416 (bootstrap, 100 reps) to 24.720 (bootstrap, 10,000 reps).

                Code:
                . bootstrap alpha = r(alpha), reps(10000) nodots seed(123): myprog
                
                Bootstrap results                               Number of obs     =         74
                                                                Replications      =      9,942
                
                      command:  myprog
                        alpha:  r(alpha)
                
                ------------------------------------------------------------------------------
                             |   Observed   Bootstrap                         Normal-based
                             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                       alpha |   4.766069   24.72033     0.19   0.847    -43.68489    53.21703
                ------------------------------------------------------------------------------
                Note: One or more parameters could not be estimated in 58 bootstrap replicates;
                      standard-error estimates include only complete replications.
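
                One way to see why more replications need not help is to save the replicates and look at their tails. This is only a sketch: it assumes myprog from #4 is still defined and that ivreg2 is installed, and the file name bsalpha is made up.
                Code:
                sysuse auto, clear
                
                * Keep every bootstrap replicate of the ratio
                bootstrap alpha = r(alpha), reps(1000) seed(98769876) nodots ///
                    saving(bsalpha, replace): myprog
                
                * Count failed replicates, then compare the middle of the distribution with its tails
                use bsalpha, clear
                count if missing(alpha)
                centile alpha, centile(1 5 25 50 75 95 99)
                If the outer centiles are far larger in magnitude than the middle ones, the replicates are heavy-tailed. The standard deviation of a heavy-tailed ratio need not settle down as replications are added, which would be consistent with the growing standard errors above.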



                • #9
                  Also, the problem does not seem to be what Joseph suggested: it does not seem to lie in the seed.

                  When I set the seed to a large number, seed(98769876), I observe the same pattern as above.

                  The standard error grows from 1.065 (delta method) to 4.521 (bootstrap, 100 reps) to 22.637 (bootstrap, 10,000 reps).



                  • #10
                    Originally posted by Joro Kolev
                    Also it does not seem that the problem is what Joseph suggested, the problem does not seem to be in the seed.
                    Okay, so I don't quit my day job. (But I will try to keep my RNG happy.)

                    Anyway, take a look at your denominator—the coefficient for length (4000 ± 60 000) straddles zero. Don't bootstrap that thing.

                    I assume that that also obtains with the denominators of the OP's problematic ratios.
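
                    A quick way to check this, sketched under the assumption that -lincom- is run right after the OP's -ivreg2- call: estimate the denominator as a linear combination and see whether its confidence interval includes zero.
                    Code:
                    ivreg2 y var7 var8 (var1-var6 = inst1-inst21), robust
                    
                    * The denominator of every ratio is the sum of the eight coefficients
                    lincom var1 + var2 + var3 + var4 + var5 + var6 + var7 + var8
                    For the auto-data example, the corresponding check is -lincom length- after the regression.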



                    • #11
                      Joseph, this is a pretty good hint and follows up on my post. In theory, in a large sample, the standard errors from nlcom (the delta method) and from the bootstrap agree. Thus, if the standard error of a coefficient is large, the bootstrapped coefficients vary a lot, and so does their standard error. If you then divide by this coefficient, the change in the ratio (alpha in Joro's and Mark's case) is magnified whenever the coefficient is close to zero. It might actually have been zero: the bootstrap failed in 58 replications!
                      This is actually supported by Mark's results as well. The combinations with low z-values from nlcom also have small z-values, and thus large standard errors, from the bootstrap.

                      The length of the seed shouldn't make a difference. The seed is mapped into a random number state, which then follows a deterministic sequence. If the seed is 1 or 123456789, in both cases it will be mapped to a point on the sequence, thus neither is good or bad.
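
                      A small illustration of that point, as a sketch only: resetting the same seed reproduces the same draw, and a longer seed simply starts the sequence somewhere else.
                      Code:
                      set seed 123
                      display runiform()
                      
                      * Same seed again: the draw repeats exactly
                      set seed 123
                      display runiform()
                      
                      * A longer seed: a different starting point, no better and no worse
                      set seed 123456789
                      display runiform()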



                      • #12
                        Jan, thanks very much for the detailed response. My intuition after re-setting the seed as per Joseph's suggestion was along the lines of what you suggest - there must be large variability across bootstrap samples in one of my coefficients that is "infecting" the estimated standard errors. As Joro points out, increasing the number of repetitions (I have tried with 1000 and 10000) in fact worsens the issue. This again points to your suggestions about variation in at least one coefficient across samples.

                        I haven't had time as of yet but will try and pinpoint the problematic coefficients, exclude them from the bootstrapping and see if this makes a difference.

                        All, thanks again for the help with this, it's been extremely helpful.

                        Mark



                        • #13
                          Hello Mark,
                          Did you find a solution to your problem? I am experiencing something similar. Thanks!



                          • #14
                            Hi Mark,


                            Did you ever solve this explosive SEs issue? Please, if you did, could you share your solution? It would really help me now.


                            Also, any other person who has the solution to this issue of explosive standard errors should also share, please.


                            Thanks,
                            Davdimac
