Speed up bootstrapped t-stat calcilations

Olena Onishchenko

Join Date: Oct 2015

Posts: 165
#1

Speed up bootstrapped t-stat calcilations

29 Feb 2016, 16:21

Dear Stata users,

My dataset has 776 variables in time-series format.

I am running the following code:

forval i=1/776 {
regress var_MA22Return`i', vce(bootstrap, reps(500))
eststo var_MA22Return`i'
}

I need to obtain bootstrapped t-stat (with 500 replications) of mean stock return.

My problem is that current code setting is very time consuming. It took Stata 18 hrs to calculate t-stats for 176 variables. And I need to repeat this code over another 2328 variables.

Any idea how to incease prodactivity of this estimation? I may reshape data into panel format if it may help...

Thank you.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

29 Feb 2016, 17:33

If you are just looking for bootstrap estimates of the mean, I think that -bootstrap r(mean), reps(500): summarize var_MA22Return`i', meanonly- would run a lot faster. You would not be able to use -eststo- after that, but the results of -bootstrap- are placed in r(table), which you can keep as a matrix and then dispose of in some appropriate way. I ran some experiments on my own setup and found that using -regress- instead of -summarize- multiplies the run time by a factor of about 2.9.
Comment
Olena Onishchenko

Join Date: Oct 2015

Posts: 165
#3

29 Feb 2016, 22:25

Clyde

I run the code you suggested:

forval i=1/776 {
bootstrap r(mean), reps(500): summarize var_MA22Return`i', meanonly

}

It did estimation but I have no experience how to extract t-stat out of this results.

May you please let me know how to get t-stat? Eventually I need to export results to the spreadsheet.

Thank you.
Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17709

29 Feb 2016, 23:56

Olena:
as far as bootstrapped t-stat can be obtained, you may be interested in developing some chuncks of code along the following lines:

Code:

. sysuse auto.dta, clear
(1978 Automobile Data)
. ttest price, by(foreign) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Domestic |      52    6072.423    429.4911    3097.104    5210.184    6934.662
 Foreign |      22    6384.682    558.9942    2621.915     5222.19    7547.174
---------+--------------------------------------------------------------------
combined |      74    6165.257    342.8719    2949.496    5481.914      6848.6
---------+--------------------------------------------------------------------
    diff |           -312.2587    704.9376               -1730.856    1106.339
------------------------------------------------------------------------------
    diff = mean(Domestic) - mean(Foreign)                         t =  -0.4430
Ho: diff = 0                     Satterthwaite's degrees of freedom =  46.4471

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.3299         Pr(|T| > |t|) = 0.6599          Pr(T > t) = 0.6701

. bootstrap r(mu_1) r(mu_2) r(t), reps(200) strata(foreign) saving(C:\Users\user\Desktop\Carlo_try.dta, replace) : ttest price, by(foreign) unequal
(running ttest on estimation sample)

Warning:  Because ttest is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations
          are used in calculating the statistics and so assumes that all observations are used.  This means that no observations will
          be excluded from the resampling because of missing values or other reasons.

          If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded.  Be sure that
          the dataset in memory contains only the relevant data.

Bootstrap replications (200)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200

Bootstrap results

Number of strata   =         2                  Number of obs     =         74
                                                Replications      =        200

      command:  ttest price, by(foreign) unequal
        _bs_1:  r(mu_1)
        _bs_2:  r(mu_2)
        _bs_3:  r(t)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   6072.423   468.8205    12.95   0.000     5153.552    6991.294
       _bs_2 |   6384.682   565.3566    11.29   0.000     5276.603     7492.76
       _bs_3 |  -.4429594   1.098561    -0.40   0.687      -2.5961    1.710181
------------------------------------------------------------------------------

. use "C:\Users\user\Desktop\Carlo_try.dta", clear
(bootstrap: ttest)

Kind regards,
Carlo
(Stata 19.0)

Comment

daniel klein

Join Date: Mar 2014

Posts: 3850
#5

01 Mar 2016, 00:09

It is a z-stat, not t. Anyway, See Buis (2007).

Code:

sysuse auto , clear replace price = price - 6100 bootstrap r(mean) , reps(500) : summarize price tempname z scalar `z' = _b[_bs_1]/_se[_bs_1] display "z = " `z' display "p = " 2 * normal(-abs(`z'))

You can skip the last step since you are interested in the z-stat, only, not the p-value.

Best
Daniel

Buis, M., L. (2007). Stata tip 53: Where did my p-values go? The Stata Journal, 7(4), pp. 584-586.

Last edited by daniel klein; 01 Mar 2016, 00:16.
Comment
Olena Onishchenko

Join Date: Oct 2015

Posts: 165
#6

01 Mar 2016, 01:23

Daniel

I am not 100% sure I understand your code. In your case you run bootstrap r(mean) for one variable - Price. My case is for 776 variables and that's where difficulty arises.

Following your advice do I run estimates using loop?
When I try something like that it does not display anything.

Code:

forval i=1/776 { bootstrap r(mean), reps(500): summarize var_MA22Return`i', meanonly tempname z scalar `z' = _b[_bs_1]/_se[_bs_1] display "z = " `z' }
Comment

daniel klein

Join Date: Mar 2014
Posts: 3850

01 Mar 2016, 04:58

Details depend very much on what exactly you are trying to do. In your initial post you obtain bootstrapped standard errors only. Adopting Clyde's suggestion you now obtain point estimates and the standard errors via bootstrap. Sticking with this latter approach, I guess you want something along these lines

Code:

// set up a 776 by 2 matrix
matrix mean_z = J(776, 2, .)
matrix colnames mean_z = "mean" "z"

// now fill the matirx with means and z-stats
forvalues j = 1/776 {
    quietly bootstrap r(mean) , reps(500) : ///
        summarize var_MA22Return`j' , meanonly
    matrix mean_z[`j', 1] = _b[_bs_1]
    matrix mean_z[`j', 2] = _b[_bs_1]/_se[bs_1]
}

// now put the matrix into a new Stata dataset
preserve // <- makes a temporary copy of the curret dataset
clear
set obs 776
svmat mean_z , names(col)

// optionally save the dataset
save mean_z.dta

// ... or export to Excel
export excel ...

// ... or ...
whatever
restore // <- brings back the original dataset

Whether there are (much) better ways to get what you want, I cannot tell for sure. Perhaps something like statsby is an alternative here?

Best
Daniel

Comment

Olena Onishchenko

Join Date: Oct 2015

Posts: 165
#8

01 Mar 2016, 18:32

Daniel

I am running the code you have suggested:

Code:

set matsize 11000 matrix mean_z = J(776, 2, .) matrix colnames mean_z = "mean" "z" forvalues j=1/776 { bootstrap r(mean), reps(500):summarize var_MA22Return`j', meanonly matrix mean_z[`j', 1] = _b[_bs_1] matrix mean_z[`j', 2] = _b[_bs_1]/_se[bs_1] }

It starts bootstrap replications and finishes shortly sending the following error message:

[bs_1] not found
r(111);

What may cause this error message?

Thank you.
Comment
daniel klein

Join Date: Mar 2014

Posts: 3850
#9

02 Mar 2016, 03:38

There is a typo in the code. Look for [bs_1] instead of [_bs_1] in the last line of the loop and change it.
Comment
Reuben Mathew

Join Date: Jun 2018

Posts: 3
#10

14 Jun 2018, 21:47

Hello all,

I have a similar issue that I'd like to address as well. I am a first-time poster so please bear with me.

Context:
I am conducting an analysis where I have 13 sites, and am conducting a risk factor analysis of tuberculosis (TB) on the outcome of COPD, adjusting for just 7 variables. Unfortunately, although the dataset that I've been given is quite large (12k participants), among the 13 sites, the amount of reported TB cases can be very low, in some cases <10. Therefore, to get the site-specific 95% CI, my advisor (who is an R man, hence this posting), suggested I bootstrap the mixed-effects regression I'm using, with the reps equal to the site N. That all sounded great, but when I test out my code even for 11 reps, let alone the nearly 2,000 I'll have to do for some sites, it takes at least 20 minutes. I would ideally like to get the OR and 95% CI from each of the covariates, but if push came to shove, I would settle simply for the TB variable.

I am using Stata 14.2 I/C and the melogit package.

Code:

bootstrap if site_city==1,reps(1102) seed(1984): melogit copd tb center_age sex center_bmi center_pack_years10 secondary_education biomass || site_city: ,or estat bootstrap, percentile

Primary issue: What can I do to speed up the code?
Secondary Issue: What is the best way to pull out the 5th and 95th values for the ORs that I need? I'm currently using estat bootstrap, percentile, which only gives me the tb covariate, but then also generates the following error, breaking up my dofile.

equation [var(_cons[site_city] not found
r(303);

I think this is because the random intercept for the site variable is not being stored through estat, but I also haven't been able to find where it might be in the Stata file on melogit, if it is stored at all.

Thank you for the time, I hope to hear from you soon!

Best,
Reuben
Comment

Announcement