  • Bootstrap and strata and size options: number of observations doesn't change!?

    I have a sample of about 100 firms, and a number of peers (from 1 to 40) for each firm. I'm using the bootstrap command to, basically, compute an average value that involves running a regression each time with one different peer per firm. That is why I'm using the following command:

    bootstrap, strata(firm_id) size(1): reg y x

    I'm quite convinced this is ok, but the problem is that the bootstrapped regression results show as the "number of observations" the total size of the dataset, instead of 100 (given that in theory it is taking 1 random peer per firm in every iteration, so each regression is using 100 observations). Is this a mistake, or am I doing something wrong in how I'm using the strata option?

    By the way, when I use the bsample command in the following form:

    bsample, strata(firm_id) size(1)

    I end up having a sample of 100 observations (one peer per firm), so that should be consistent with what I'm doing above, right?

    Thanks,

    Dany
    Last edited by Dany Bahar; 14 Mar 2017, 17:26.

  • #2
    Dany:
    welcome to the list.
    By imposing -strata-, Stata draws with replacement from all the strata your sample is composed of, as you can see in the following toy example:
    Code:
    . sysuse auto.dta
    (1978 Automobile Data)
    
    . reg price mpg
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(1, 72)        =     20.26
           Model |   139449474         1   139449474   Prob > F        =    0.0000
        Residual |   495615923        72  6883554.48   R-squared       =    0.2196
    -------------+----------------------------------   Adj R-squared   =    0.2087
           Total |   635065396        73  8699525.97   Root MSE        =    2623.7
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
           _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
    ------------------------------------------------------------------------------
    
    . bootstrap, strata(foreign) size(1): reg price mpg
    (running regress on estimation sample)
    
    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    .........................x....x..........x........    50
    
    Linear regression
    
    Number of strata   =         2                  Number of obs     =         74
                                                    Replications      =         47
                                                    Wald chi2(1)      =       0.04
                                                    Prob > chi2       =     0.8456
                                                    R-squared         =     0.2196
                                                    Adj R-squared     =     0.2087
                                                    Root MSE          =  2623.6529
    
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
           price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -238.8943    1226.99    -0.19   0.846     -2643.75    2165.962
           _cons |   11253.06   25202.49     0.45   0.655    -38142.92    60649.04
    ------------------------------------------------------------------------------
    Note: One or more parameters could not be estimated in 3 bootstrap replicates;
          standard-error estimates include only complete replications.

    Hence, no wonder that your -bootstrap- code ends up considering the whole sample.
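
    One way to check this is to store the sample size of every replicate. The following is a minimal sketch, assuming that the "Number of obs" in the header comes from the initial fit on the full estimation sample (the "running regress on estimation sample" line above) and that each replicate is drawn with one observation per stratum; -bsreps- is a hypothetical file name, and -n- is just a label chosen for the collected statistic:
    Code:
    sysuse auto, clear
    set seed 12345
    * collect e(N) from each replicate instead of the coefficients
    * (bsreps is a hypothetical file name for the saved replicates)
    bootstrap n = e(N), strata(foreign) size(1) reps(50) nodots saving(bsreps, replace): regress price mpg
    * one row per replicate: n holds that replicate's number of observations
    use bsreps, clear
    summarize n
    If the replicates are indeed drawn with one observation per stratum, -n- should equal 2 in every row, even though the header reports 74.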

    Referring to your second code, it sounds weird that -bsample- supports a -size()- option (which is seemingly undocumented in the related help file).
    In all likelihood, you typed:
    Code:
    bsample 100, strata(firm_id)
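    For comparison, here is a minimal sketch on the auto data. It assumes the documented -bsample- syntax, in which the sample size is an expression placed before the comma and, once -strata()- is specified, is taken within each stratum. Under that reading, a draw of one observation per stratum would be requested with 1 rather than with the total (so, presumably, -bsample 1, strata(firm_id)- in your data):
    Code:
    sysuse auto, clear
    set seed 12345
    * draw 1 observation, with replacement, from each stratum of foreign
    bsample 1, strata(foreign)
    * the data in memory are now the resample
    count
    tabulate foreign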
    As an aside, you would receive more helpful replies by posting what you typed (as you already did) and what Stata gave you back, too (as per the FAQ, which you're kindly asked to read before your first post). Thanks.
    Last edited by Carlo Lazzaro; 15 Mar 2017, 03:34.
    Kind regards,
    Carlo
    (StataNow 18.5)


    • #3
      Thank you very much Carlo Lazzaro for the explanation. I understand that it draws from all the sample, but what is happening in the back-end? Is every iteration of the bootstrap using one observation per stratum (in your example, that'd mean running a regression with 2 observations every iteration)? I assume that is not the case b/c a regression with only 2 observations doesn't make any sense.

      Maybe another way to frame my question is how can I use the bootstrap command to run a regression using in each iteration a random sample that results from using bsample 100, strata(firm_id)?

      Thanks!

      Dany


      • #4
        Dany:
        in my example, the -bootstrap- standard errors (which are skyrocketing when compared to those obtained with the original regression) are averaged over 50 regressions, each made with 2 observations drawn with replacement (1 per stratum). You can see that
        I agree that it makes no sense; it was indeed a toy example (that mimics yours, by the way).
        Depending on your data, you may also want to consider creating an indicator that gathers -firm_id- and -peer- together:
        Code:
        egen indicator=group (firm_id
        to work as -strata- variable.

        As far as your question on -bsample- and -regress- is concerned, you may want to take a look at the -preserve- and -restore- entries in the Stata .pdf manual.
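        A minimal sketch of that approach on the auto data, not a definitive implementation: each pass preserves the data, resamples within strata with -bsample-, runs the regression, posts the coefficient, and restores the original data. The resample size of 5 per stratum, the 200 replications, and the name -b_mpg- are assumptions made for illustration only; with one peer per firm, the draw would presumably be -bsample 1, strata(firm_id)-.
        Code:
        sysuse auto, clear
        set seed 12345
        tempname memhold
        tempfile results
        * open a temporary results file with one variable for the slope
        postfile `memhold' b_mpg using `results'
        forvalues i = 1/200 {
            preserve
            * resample 5 observations per stratum, with replacement
            bsample 5, strata(foreign)
            quietly regress price mpg
            post `memhold' (_b[mpg])
            restore
        }
        postclose `memhold'
        * the standard deviation of b_mpg is the bootstrap standard error
        use `results', clear
        summarize b_mpg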
        Kind regards,
        Carlo
        (StataNow 18.5)


        • #5
          Excellent. Now I understand. So the command is doing what I intended, which is running a regression in each iteration using a sample of 1 observation per stratum drawn with replacement (so a regression of 100 observations), and the results simply show averages over all the different iterations. The "number of observations" was confusing. Thank you Carlo Lazzaro!
