Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with bootstrap in obtaining a standard error.

    Hi Everyone: I think my problem has nothing to do with the data set and so I'm not showing a data example. In the bootstrap, I want a standard error for the average of six estimated marginal effects but I get a missing value. I get the standard errors for the marginal effects themselves, but not the average. Clearly I don't know how to compute a function of coefficients within a bootstrap program. See my calculation of tauavg. Thanks for any hints.

    Code:
    . capture program drop aggregate_boot
    
    .         
    . program aggregate_boot, rclass
      1. 
    . poisson y i.w#c.d4#c.f04 i.w#c.d4#c.f05 i.w#c.d4#c.f06 ///
    >         i.w#c.d5#c.f05 i.w#c.d5#c.f06 ///
    >         i.w#c.d6#c.f06 ///
    >         i.w#c.d4#c.f04#c.x i.w#c.d4#c.f05#c.x i.w#c.d4#c.f06#c.x ///
    >         i.w#c.d5#c.f05#c.x i.w#c.d5#c.f06#c.x ///
    >         i.w#c.d6#c.f06#c.x ///
    >         f02 f03 f04 f05 f06 ///
    >         c.f02#c.x c.f03#c.x c.f04#c.x c.f05#c.x c.f06#c.x ///
    >         d4 d5 d6 x c.d4#c.x c.d5#c.x c.d6#c.x, noomitted
      2. estimates store beta
      3.         
    . margins, dydx(w) at(d4 = 1 d5 = 0 d6 = 0 f02 = 0 f03 = 0 f04 = 1 f05 = 0 f06 = 0) ///
    >         subpop(if d4 == 1) noestimcheck post
      4. return scalar tau44 = _b[1.w]
      5. estimates restore beta
      6. margins, dydx(w) at(d4 = 1 d5 = 0 d6 = 0 f02 = 0 f03 = 0 f04 = 0 f05 = 1 f06 = 0) ///
    >         subpop(if d4 == 1) noestimcheck post
      7. return scalar tau45 = _b[1.w]
      8. estimates restore beta
      9. margins, dydx(w) at(d4 = 1 d5 = 0 d6 = 0 f02 = 0 f03 = 0 f04 = 0 f05 = 0 f06 = 1) ///
    >         subpop(if d4 == 1) noestimcheck post
     10. return scalar tau46 = _b[1.w]
     11. estimates restore beta
     12. margins, dydx(w) at(d4 = 0 d5 = 1 d6 = 0 f02 = 0 f03 = 0 f04 = 0 f05 = 1 f06 = 0) ///
    >         subpop(if d5 == 1) noestimcheck post
     13. return scalar tau55 = _b[1.w]
     14. estimates restore beta
     15. margins, dydx(w) at(d4 = 0 d5 = 1 d6 = 0 f02 = 0 f03 = 0 f04 = 0 f05 = 0 f06 = 1) ///
    >         subpop(if d5 == 1) noestimcheck post
     16. return scalar tau56 = _b[1.w]
     17. estimates restore beta
     18. margins, dydx(w) at(d4 = 0 d5 = 0 d6 = 1 f02 = 0 f03 = 0 f04 = 0 f05 = 0 f06 = 1) ///
    >         subpop(if d6 == 1) noestimcheck post
     19. return scalar tau66 = _b[1.w]
     20. 
    . return scalar tauavg = (tau44 + tau45 + tau46 + tau55 + tau56 + tau66)/6
     21. 
    . end
    
    . 
    . bootstrap r(tau44) r(tau45) r(tau46) r(tau55) r(tau56) r(tau66) r(tauavg),  ///
    >         reps(50) seed(123) cluster(id) idcluster(newid): aggregate_boot
    (running aggregate_boot on estimation sample)
    
    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
    ..................................................    50
    
    Bootstrap results                                        Number of obs = 6,000
                                                             Replications  =    50
    
          Command: aggregate_boot
            _bs_1: r(tau44)
            _bs_2: r(tau45)
            _bs_3: r(tau46)
            _bs_4: r(tau55)
            _bs_5: r(tau56)
            _bs_6: r(tau66)
            _bs_7: r(tauavg)
    
                                      (Replications based on 1,000 clusters in id)
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
                 | coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           _bs_1 |   1.017501   1.119012     0.91   0.363    -1.175723    3.210725
           _bs_2 |    6.00713   1.978965     3.04   0.002     2.128431    9.885829
           _bs_3 |   4.569667   1.379358     3.31   0.001     1.866174    7.273159
           _bs_4 |   7.170127   2.957668     2.42   0.015     1.373203    12.96705
           _bs_5 |   7.185492   2.297489     3.13   0.002     2.682496    11.68849
           _bs_6 |   13.73294   12.83413     1.07   0.285    -11.42151    38.88738
           _bs_7 |   6.613809          .        .       .            .           .
    ------------------------------------------------------------------------------

  • #2
    ADDED IN EDIT: See corrected explanation and new example in post #4 below.

    Here is an example that demonstrates that return scalar apparently generates a scalar in r() but does not generate a scalar in the program, or at least not a scalar of the given name. This would throw an error message, but in your case you apparently had a scalar tauavg defined outside of the program, and that constant - rather than the results calculated in the program - is what was reported.
    Code:
    . set obs 100
    Number of observations (_N) was 0, now 100.
    
    . generate x = 1
    
    . capture program drop demo
    
    . program demo, rclass
      1.     return scalar foo = 666
      2.     capture noisily return scalar bar = foo / 2
      3. end
    
    . capture noisily bootstrap r(foo) r(bar), reps(2) : demo
    (running demo on estimation sample)
    foo not found
    'r(bar)' evaluated to missing in full sample
    
    . scalar foo = 42
    
    . capture noisily bootstrap r(foo) r(bar), reps(2) : demo
    (running demo on estimation sample)
    
    warning: demo does not set e(sample), so no observations will be excluded from the resampling
             because of missing values or other reasons. To exclude observations, press Break,
             save the data, drop any observations that are to be excluded, and rerun bootstrap.
    
    Bootstrap replications (2)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..
    
    Bootstrap results                                          Number of obs = 100
                                                               Replications  =   2
    
          Command: demo
            _bs_1: r(foo)
            _bs_2: r(bar)
    
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
                 | coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           _bs_1 |        666          .        .       .            .           .
           _bs_2 |         21          .        .       .            .           .
    ------------------------------------------------------------------------------
    
    .
    Last edited by William Lisowski; 23 Jun 2022, 18:56.

    Comment


    • #3
      Thanks William. I figured out that I had to generate the marginal effects within the program and not just return them. I'd had that in the program at some point but deleted them when doing some other debugging (the dreaded "repeated time values" problem).

      Comment


      • #4
        The return scalar command does not generate a scalar of the given name in the program. From the PDF documentation linked to in [P] from help return we find

        Distinguish between r() (returned results) and return() (results being assembled that will be returned). The program you write actually stores results in return(). Then when your program completes, whatever is in return() is copied to r().
        This example demonstrates this in action. From post #2 my assertion about the lack of an error message for the syntax you used still holds.
        Code:
        . set obs 100
        Number of observations (_N) was 0, now 100.
        
        . generate x = 1
        
        . capture program drop demo
        
        . program demo, rclass
          1.     return scalar foo = 666
          2.     return scalar bar = return(foo) / 2
          3. end
        
        .
        . capture noisily bootstrap r(foo) r(bar), reps(2) : demo
        (running demo on estimation sample)
        
        warning: demo does not set e(sample), so no observations will be excluded from the resampling
                 because of missing values or other reasons. To exclude observations, press Break,
                 save the data, drop any observations that are to be excluded, and rerun bootstrap.
        
        Bootstrap replications (2)
        ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
        ..
        
        Bootstrap results                                          Number of obs = 100
                                                                   Replications  =   2
        
              Command: demo
                _bs_1: r(foo)
                _bs_2: r(bar)
        
        ------------------------------------------------------------------------------
                     |   Observed   Bootstrap                         Normal-based
                     | coefficient  std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               _bs_1 |        666          .        .       .            .           .
               _bs_2 |        333          .        .       .            .           .
        ------------------------------------------------------------------------------
        
        .

        Comment


        • #5
          Hi Jeff and William,

          You both might be aware of this, but just in case someone finds this post in the future, if you are using -margins- in a bootstrap loop, you may tell -margins- not to compute standard errors using the -nose- option. -margins- spends most of its computation time obtaining numerical derivatives for the delta method. This trick will save you a considerable amount of time.

          Comment


          • #6
            Thanks Enrique! I think I used to know this ....

            I don't suppose that you know of an easy way to obtain the standard error of tauavg without bootstrapping. As has been discussed before, a straight application of -lincom- doesn't do it (or at least the way I apply -lincom-) and -suest- does not seem to apply to partial effects.

            Comment


            • #7
              Hi Jeff,

              I am playing right now with this problem for the French User Group meeting. What I am going to show is very preliminary. I conjecture you can do what you want using -margins- and slightly different specification of the regressors (please let me know if I am on the wrong route). I have only worked on the linear case so far, but it is a good benchmark for the other cases. In particular, it is a good benchmark because the coefficients from -regress- after proper recentering (as you illustrate in your working paper) are the effects of interest. I show some code you wrote a while back and my suggestion after that.

              Code:
              use staggered_6, clear
              
              xtset id year
              
              gen f2014 = year == 2014
              gen f2015 = year == 2015
              gen f2016 = year == 2016
              
              egen wsum = sum(w), by(id)
              
              gen d4 = wsum == 3
              gen d5 = wsum == 2
              gen d6 = wsum == 1
              
              sum x1 if d4
              gen x1_dm4 = x1 - r(mean)
              sum x1 if d5
              gen x1_dm5 = x1 - r(mean)
              sum x1 if d6
              gen x1_dm6 = x1 - r(mean)
              
                  
              reg logy c.w#c.d4#c.f2014 c.w#c.d4#c.f2015 c.w#c.d4#c.f2016    ///
                  c.w#c.d5#c.f2015 c.w#c.d5#c.f2016                          ///
                  c.w#c.d6#c.f2016 c.w#c.d4#c.f2014#c.x1_dm4                 ///
                  c.w#c.d4#c.f2015#c.x1_dm4 c.w#c.d4#c.f2016#c.x1_dm4        ///
                  c.w#c.d5#c.f2015#c.x1_dm5 c.w#c.d5#c.f2016#c.x1_dm5        ///
                  c.w#c.d6#c.f2016#c.x1_dm6                                  ///
                  f2014 f2015 f2016 c.f2014#c.x1 c.f2015#c.x1 c.f2016#c.x1   ///
                  d4 d5 d6 x1 c.d4#c.x1 c.d5#c.x1 c.d6#c.x1, vce(cluster id)
                  
              // Alternative specification using margins
              
              generate double cohort = 0 
              bysort id: generate ttimes = year[_n] if w==1
              bysort id: egen cohort0 = min(ttimes)
              replace cohort = cohort0 if cohort0!=.
              tab cohort 
              
              qui reg logy i.w#2014bn.cohort#2014bn.year i.w#2014bn.cohort#2015bn.year   ///
                       i.w#2014bn.cohort#2016bn.year i.w#2015bn.cohort#2015bn.year       ///
                       i.w#2015bn.cohort#2016bn.year i.w#2016bn.cohort#2016bn.year       ///
                       i.w#2014bn.cohort#2014bn.year#c.x1                                ///         
                       i.w#2014bn.cohort#2015bn.year#c.x1                                ///
                       i.w#2014bn.cohort#2016bn.year#c.x1                                ///
                       i.w#2015bn.cohort#2015bn.year#c.x1                                ///
                       i.w#2015bn.cohort#2016bn.year#c.x1                                ///
                       i.w#2016bn.cohort#2016bn.year#c.x1                                ///
                       (c.x1)##(2014bn.year 2015bn.year 2016bn.year i.cohort),          ///
                       vce(cluster id) 
                       
              // I just want cohorts that are treated 
              generate over = cohort if  cohort!=0
              
              // margins 
              margins, dydx(w)  over(over)                        ///
                       at(year=2014) at(year=2015) at(year=2016)  ///
                       vce(unconditional) noestimcheck post
              Below is the output. You will see that the coefficients match but standard errors are a bit different.

              Code:
              . use staggered_6, clear
              
              . xtset id year
              
              Panel variable: id (strongly balanced)
               Time variable: year, 2011 to 2016
                       Delta: 1 unit
              
              . gen f2014 = year == 2014
              
              . gen f2015 = year == 2015
              
              . gen f2016 = year == 2016
              
              . egen wsum = sum(w), by(id)
              
              . gen d4 = wsum == 3
              
              . gen d5 = wsum == 2
              
              . gen d6 = wsum == 1
              
              . sum x1 if d4
              
                  Variable |        Obs        Mean    Std. dev.       Min        Max
              -------------+---------------------------------------------------------
                        x1 |        804    11.73881    1.197314          8         15
              
              . gen x1_dm4 = x1 - r(mean)
              
              . sum x1 if d5
              
                  Variable |        Obs        Mean    Std. dev.       Min        Max
              -------------+---------------------------------------------------------
                        x1 |        192    11.28125    1.529488          6         14
              
              . gen x1_dm5 = x1 - r(mean)
              
              . sum x1 if d6
              
                  Variable |        Obs        Mean    Std. dev.       Min        Max
              -------------+---------------------------------------------------------
                        x1 |        102          12    1.980099          9         15
              
              . gen x1_dm6 = x1 - r(mean)
              
              . reg logy c.w#c.d4#c.f2014 c.w#c.d4#c.f2015 c.w#c.d4#c.f2016    ///
              >         c.w#c.d5#c.f2015 c.w#c.d5#c.f2016                          ///
              >         c.w#c.d6#c.f2016 c.w#c.d4#c.f2014#c.x1_dm4                 ///
              >         c.w#c.d4#c.f2015#c.x1_dm4 c.w#c.d4#c.f2016#c.x1_dm4        ///
              >         c.w#c.d5#c.f2015#c.x1_dm5 c.w#c.d5#c.f2016#c.x1_dm5        ///
              >         c.w#c.d6#c.f2016#c.x1_dm6                                  ///
              >         f2014 f2015 f2016 c.f2014#c.x1 c.f2015#c.x1 c.f2016#c.x1   ///
              >         d4 d5 d6 x1 c.d4#c.x1 c.d5#c.x1 c.d6#c.x1, vce(cluster id)      
              
              Linear regression                               Number of obs     =      3,270
                                                              F(25, 544)        =      15.84
                                                              Prob > F          =     0.0000
                                                              R-squared         =     0.0370
                                                              Root MSE          =     .97442
              
                                                 (Std. err. adjusted for 545 clusters in id)
              ------------------------------------------------------------------------------
                           |               Robust
                      logy | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                  c.w#c.d4#|
                   c.f2014 |   .1800395   .0223136     8.07   0.000     .1362082    .2238708
                           |
                  c.w#c.d4#|
                   c.f2015 |   .1758216    .022537     7.80   0.000     .1315514    .2200917
                           |
                  c.w#c.d4#|
                   c.f2016 |   .1849706   .0249782     7.41   0.000     .1359051    .2340361
                           |
                  c.w#c.d5#|
                   c.f2015 |   .0978163   .0413848     2.36   0.018     .0165227    .1791098
                           |
                  c.w#c.d5#|
                   c.f2016 |   .1327046   .0434477     3.05   0.002     .0473587    .2180505
                           |
                  c.w#c.d6#|
                   c.f2016 |    .092621   .0624522     1.48   0.139    -.0300561    .2152981
                           |
                  c.w#c.d4#|
                   c.f2014#|
                  c.x1_dm4 |  -.0189186   .0162029    -1.17   0.243    -.0507465    .0129092
                           |
                  c.w#c.d4#|
                   c.f2015#|
                  c.x1_dm4 |  -.0392521   .0176407    -2.23   0.026    -.0739044   -.0045999
                           |
                  c.w#c.d4#|
                   c.f2016#|
                  c.x1_dm4 |  -.0246808   .0215565    -1.14   0.253    -.0670251    .0176634
                           |
                  c.w#c.d5#|
                   c.f2015#|
                  c.x1_dm5 |   .0001442   .0238144     0.01   0.995    -.0466352    .0469236
                           |
                  c.w#c.d5#|
                   c.f2016#|
                  c.x1_dm5 |   -.039724   .0213054    -1.86   0.063     -.081575     .002127
                           |
                  c.w#c.d6#|
                   c.f2016#|
                  c.x1_dm6 |  -.0403321   .0322365    -1.25   0.211    -.1036554    .0229912
                           |
                     f2014 |   .0454511   .0708177     0.64   0.521    -.0936586    .1845608
                     f2015 |  -.0158952   .0784413    -0.20   0.839    -.1699801    .1381897
                     f2016 |  -.0474077    .079603    -0.60   0.552    -.2037747    .1089592
                           |
              c.f2014#c.x1 |   .0015447   .0060817     0.25   0.800    -.0104018    .0134911
                           |
              c.f2015#c.x1 |  -.0012545   .0065617    -0.19   0.848    -.0141438    .0116348
                           |
              c.f2016#c.x1 |   .0091707   .0066336     1.38   0.167    -.0038599    .0222014
                           |
                        d4 |  -1.356329   .7810326    -1.74   0.083    -2.890538      .17788
                        d5 |   1.191999   1.144603     1.04   0.298    -1.056385    3.440383
                        d6 |   1.690494   1.679344     1.01   0.315      -1.6083    4.989288
                        x1 |   .0682135    .025351     2.69   0.007     .0184158    .1180113
                           |
                 c.d4#c.x1 |   .0872358   .0672249     1.30   0.195    -.0448164    .2192881
                           |
                 c.d5#c.x1 |  -.1026349   .1014854    -1.01   0.312    -.3019862    .0967163
                           |
                 c.d6#c.x1 |  -.1394164   .1293121    -1.08   0.281    -.3934286    .1145958
                           |
                     _cons |   1.689357   .3034252     5.57   0.000     1.093328    2.285385
              ------------------------------------------------------------------------------
              
              . // Alternative specification using margins
              . generate double cohort = 0 
              
              . bysort id: generate ttimes = year[_n] if w==1
              (2,787 missing values generated)
              
              . bysort id: egen cohort0 = min(ttimes)
              (2,172 missing values generated)
              
              . replace cohort = cohort0 if cohort0!=.
              (1,098 real changes made)
              
              . tab cohort 
              
                   cohort |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                        0 |      2,172       66.42       66.42
                     2014 |        804       24.59       91.01
                     2015 |        192        5.87       96.88
                     2016 |        102        3.12      100.00
              ------------+-----------------------------------
                    Total |      3,270      100.00
              
              . qui reg logy i.w#2014bn.cohort#2014bn.year i.w#2014bn.cohort#2015bn.year   //
              > /
              >          i.w#2014bn.cohort#2016bn.year i.w#2015bn.cohort#2015bn.year       //
              > /
              >          i.w#2015bn.cohort#2016bn.year i.w#2016bn.cohort#2016bn.year       //
              > /
              >          i.w#2014bn.cohort#2014bn.year#c.x1                                //
              > /         
              >          i.w#2014bn.cohort#2015bn.year#c.x1                                //
              > /
              >          i.w#2014bn.cohort#2016bn.year#c.x1                                //
              > /
              >          i.w#2015bn.cohort#2015bn.year#c.x1                                //
              > /
              >          i.w#2015bn.cohort#2016bn.year#c.x1                                //
              > /
              >          i.w#2016bn.cohort#2016bn.year#c.x1                                //
              > /
              >          (c.x1)##(2014bn.year 2015bn.year 2016bn.year i.cohort),          ///
              >          vce(cluster id) 
              
              . // I just want cohorts that are treated 
              . generate over = cohort if  cohort!=0
              (2,172 missing values generated)
              
              . // margins 
              . margins, dydx(w)  over(over)                        ///
              >          at(year=2014) at(year=2015) at(year=2016)  ///
              >          vce(unconditional) noestimcheck post
              note: 2172 observations omitted because of missing values in over() variable.
              
              Average marginal effects                               Number of obs   = 3,270
                                                                     Subpop. no. obs = 1,098
              
              Expression: Linear prediction, predict()
              dy/dx wrt:  1.w
              Over:       over
              1._at: 2014.over
                         year = 2014
                     2015.over
                         year = 2014
                     2016.over
                         year = 2014
              2._at: 2014.over
                         year = 2015
                     2015.over
                         year = 2015
                     2016.over
                         year = 2015
              3._at: 2014.over
                         year = 2016
                     2015.over
                         year = 2016
                     2016.over
                         year = 2016
              
                                                 (Std. err. adjusted for 545 clusters in id)
              ------------------------------------------------------------------------------
                           |            Unconditional
                           |      dy/dx   std. err.      t    P>|t|     [95% conf. interval]
              -------------+----------------------------------------------------------------
              0.w          |  (base outcome)
              -------------+----------------------------------------------------------------
              1.w          |
                  _at#over |
                   1 2014  |   .1800395   .0223999     8.04   0.000     .1360386    .2240404
                   1 2015  |          0  (empty)
                   1 2016  |          0  (empty)
                   2 2014  |   .1758216   .0229027     7.68   0.000     .1308329    .2208102
                   2 2015  |   .0978163   .0413848     2.36   0.018     .0165227    .1791098
                   2 2016  |          0  (empty)
                   3 2014  |   .1849706   .0251094     7.37   0.000     .1356474    .2342938
                   3 2015  |   .1327046   .0447612     2.96   0.003     .0447787    .2206305
                   3 2016  |    .092621    .065386     1.42   0.157    -.0358189    .2210609
              ------------------------------------------------------------------------------
              Note: dy/dx for factor levels is the discrete change from the base level.
              With this at hand you can use -nlcom-. Incidentally, you can also obtain this results using GMM with -gmm- but I might have more to say about that in a couple of days.

              Comment


              • #8
                Jeff, how about defining tauavg outside the program?

                Code:
                program define simple_program, rclass
                        reg wage union##race
                        estimates store beta
                        
                        margins, dydx(union) at(race==1) post
                        return scalar tau1 = _b[1.union]
                        
                        estimates restore beta
                        margins, dydx(union) at(race==2) post
                        return scalar tau2 = _b[1.union]
                        
                end
                
                sysuse nlsw88, clear
                
                bootstrap r(tau1) r(tau2) tauavg=((r(tau1)+r(tau2))/2), reps(50) strata(union race): simple_program 
                
                Bootstrap replications (50)
                ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
                ..................................................    50
                
                Bootstrap results
                
                Number of strata   =         6                  Number of obs     =      1,878
                                                                Replications      =         50
                
                      command:  simple_program
                        _bs_1:  r(tau1)
                        _bs_2:  r(tau2)
                       tauavg:  (r(tau1)+r(tau2))/2
                
                ------------------------------------------------------------------------------
                             |   Observed   Bootstrap                         Normal-based
                             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                       _bs_1 |   1.153829   .3054846     3.78   0.000     .5550898    1.752567
                       _bs_2 |   2.646457   .4295824     6.16   0.000     1.804491    3.488423
                      tauavg |   1.900143   .2766763     6.87   0.000     1.357867    2.442418
                ------------------------------------------------------------------------------

                Comment


                • #9
                  Enrique, that looks promising! Looking forward to seeing more in a few days -- even if it's not precisely on my problem.

                  Comment

                  Working...
                  X