
  • Slow computation

    Hi,

    I'm trying to generate 200 samples of n=100, 200 and 400 normally distributed observations, average them by sample, and check whether I should reject the null. The code below works, but it is taking a very long time (6-7 hours so far, and it has not even completed n=100). I was wondering if there is any faster way to do it while following the instructions above?

    Thanks

    Code:
    clear all
    local j=0
    foreach sample_size of numlist 100 200 400 {
        forvalues theta=-1(0.01)3 {
            local seed 20211217+1000*`theta'
            quietly {
                forvalues i=1/200 {
                    clear all
                    set obs `sample_size'
                    local new_seed=`seed'+`i'
                    set seed `new_seed'
                    gen double theta=round(`theta', 0.01)
                    gen obs=rnormal(1+`theta'/sqrt(`sample_size'), 1)
                    display `j'
                    if `j'==0 {
                        capture erase normal_R200_n`sample_size'_2.dta
                    }
                    local ++j
                    * Test
                    gen test=obs>invnormal(0.95)+sqrt(`sample_size')*(`theta0'-theta)         
                    if `j'>1 {
                        append using "normal_R200_n`sample_size'_2"
                    }
                    save "normal_R200_n`sample_size'_2", replace
                }
            }
        }
        
        * Average rejection rate by R samples
        collapse (mean) average_rejection=test, by(theta)
    
        la var average_rejection "Rejection rate"
        la var theta "{&theta}"
    
        graph twoway scatter average_rejection theta, ///
            msize(tiny) yline(0.05, lcolor(red)) ///
            ylabel(0 "0" 0.05 "{&alpha}" 1 "1", angle(0))
        graph export "Q2e_rejection_prob_by_theta_n`sample_size'.png", replace    
        save "normal_R200_n`sample_size'_collapsed_2", replace
    }

  • #2
    I'm trying to generate 200 samples of n=100, 200 and 400 normally distributed observations
    It appears to me you are trying to generate 200 samples of n=100, 200 and 400 normally distributed observations for about 400 values of theta.

    • #3
      Well, look, you are looping over 3 values of sample size, 401 values of theta (from -1 to 3 in steps of 0.01), and 200 values of i. So that's 3*401*200 = 240,600 repetitions of the code inside. Each of those repetitions includes reading (-append-) and writing (-save-) a data set, so I'm not surprised it's slow. Here are a few suggestions that may help:
      1. Instead of re-reading and re-writing to the disk, save the results in a -frame- instead. (To implement this you need Stata version 16 or later, and you will also need to install Jeremy Freese's -frameappend- program from SSC.) Writing to RAM should be a lot faster.
      2. While I doubt it will save a noticeable amount of time, there is no reason to set the random number seed here more than once at the very beginning (before entering any of the loops). Just let the random number generator keep going--it will give you independent samples. In fact, constantly resetting the seed can sometimes inadvertently introduce dependence among the samples!
      3. Again, this one is not about saving time: although your code runs without error, I think it is incorrect. Your -gen test ...- command references a local macro -theta0- which is never defined. As a result, the expression `theta0'-theta evaluates to just -theta. So unless theta0 is supposed to be 0, your code is not doing what you intend.
      Added: Crossed with #2.
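
      To make suggestion 1 concrete, here is a minimal sketch of the frame-based rewrite. It assumes Stata 16+ with -frameappend- installed (ssc install frameappend), sets the seed once per suggestion 2, and substitutes 0 for the undefined -theta0- as a placeholder per point 3; the frame names are illustrative, and this is untested against your assignment's exact instructions:

      Code:
      * Sketch only: accumulate simulated samples in a frame (RAM) and
      * touch the disk once per sample size instead of once per iteration.
      clear all
      set seed 20211217                    // set the seed once, up front
      frame create results                 // accumulator frame kept in RAM
      foreach sample_size of numlist 100 200 400 {
          forvalues theta = -1(0.01)3 {
              quietly forvalues i = 1/200 {
                  frame create sim
                  frame sim {
                      set obs `sample_size'
                      gen double theta = round(`theta', 0.01)
                      gen obs = rnormal(1 + `theta'/sqrt(`sample_size'), 1)
                      * 0 stands in for the undefined theta0 (see point 3)
                      gen test = obs > invnormal(0.95) + sqrt(`sample_size')*(0 - theta)
                  }
                  frame results: frameappend sim
                  frame drop sim
              }
          }
          * one disk write per sample size, not one per repetition
          frame results: save "normal_R200_n`sample_size'_2", replace
          frame results: clear
      }

      The collapse/graph steps would then run on each saved file as in your original code.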
