
  • Slow computation

    Hi,

    I'm trying to generate 200 samples of n=100, 200 and 400 normally distributed observations, average them by sample, and check whether I should reject the null. The code below works, but it is taking a very long time (6-7 hours so far, and it has not even completed n=100). I was wondering if there is any faster way to do it while following the instructions above?

    Thanks

    Code:
    clear all
    local j=0
    foreach sample_size of numlist 100 200 400 {
        forvalues theta=-1(0.01)3 {
            local seed 20211217+1000*`theta'
            quietly {
                forvalues i=1/200 {
                    clear all
                    set obs `sample_size'
                    local new_seed=`seed'+`i'
                    set seed `new_seed'
                    gen double theta=round(`theta', 0.01)
                    gen obs=rnormal(1+`theta'/sqrt(`sample_size'), 1)
                    display `j'
                    if `j'==0 {
                        capture erase normal_R200_n`sample_size'_2.dta
                    }
                    local ++j
                    * Test
                    gen test=obs>invnormal(0.95)+sqrt(`sample_size')*(`theta0'-theta)         
                    if `j'>1 {
                        append using "normal_R200_n`sample_size'_2"
                    }
                    save "normal_R200_n`sample_size'_2", replace
                }
            }
        }
        
        * Average rejection rate by R samples
        collapse (mean) average_rejection=test, by(theta)
    
        la var average_rejection "Rejection rate"
        la var theta "{&theta}"
    
        graph twoway scatter average_rejection theta, ///
            msize(tiny) yline(0.05, lcolor(red)) ///
            ylabel(0 "0" 0.05 "{&alpha}" 1 "1", angle(0))
        graph export "Q2e_rejection_prob_by_theta_n`sample_size'.png", replace    
        save "normal_R200_n`sample_size'_collapsed_2", replace
    }

  • #2
    I'm trying to generate 200 samples of n=100, 200 and 400 normally distributed observations
    It appears to me you are trying to generate 200 samples of n=100, 200 and 400 normally distributed observations for about 400 values of theta.

    • #3
      Well, look, you are looping over 3 values of sample size, 401 values of theta (from -1 to 3 in steps of 0.01), and 200 values of i. So that's 3*401*200 = 240,600 repetitions of the code inside. Each of those repetitions includes reading (-append-) and writing (-save-) a data set, so I'm not surprised it's slow. Here are a few suggestions that may help:
      1. Instead of re-reading and re-writing to the disk, save the results in a -frame- instead. (To implement this you need Stata version 16 or later, and you will also need to install Jeremy Freese's -frameappend- program from SSC.) Writing to RAM should be a lot faster.
      2. While I doubt it will save a noticeable amount of time, there is no reason to set the random number seed here more than once at the very beginning (before entering any of the loops). Just let the random number generator keep going--it will give you independent samples. In fact, constantly resetting the seed can sometimes inadvertently introduce dependence among the samples!
      3. Again, this one is not about saving time: although your code runs without error, I think it is incorrect. Your -gen test ...- command references a local macro -theta0- which is never defined. As a result, the expression `theta0'-theta evaluates to just -theta. So unless theta0 is supposed to be 0, your code is not doing what you intend.
      Added: Crossed with #2.
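
      To make suggestion 1 concrete, here is a minimal sketch of the frame-based rewrite. It assumes Stata 16+ with -frameappend- installed (ssc install frameappend), sets the seed once per suggestion 2, and substitutes 0 for the undefined -theta0- as a placeholder per point 3; the frame names are illustrative, and this is untested against your assignment's exact instructions:

      Code:
      * Sketch only: accumulate simulated samples in a frame (RAM) and
      * touch the disk once per sample size instead of once per iteration.
      clear all
      set seed 20211217                    // set the seed once, up front
      frame create results                 // accumulator frame kept in RAM
      foreach sample_size of numlist 100 200 400 {
          forvalues theta = -1(0.01)3 {
              quietly forvalues i = 1/200 {
                  frame create sim
                  frame sim {
                      set obs `sample_size'
                      gen double theta = round(`theta', 0.01)
                      gen obs = rnormal(1 + `theta'/sqrt(`sample_size'), 1)
                      * 0 stands in for the undefined theta0 (see point 3)
                      gen test = obs > invnormal(0.95) + sqrt(`sample_size')*(0 - theta)
                  }
                  frame results: frameappend sim
                  frame drop sim
              }
          }
          * one disk write per sample size, not one per repetition
          frame results: save "normal_R200_n`sample_size'_2", replace
          frame results: clear
      }

      The collapse/graph steps would then run on each saved file as in your original code.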
