How do I use STATA to do simulations for a Bernoulli trial?

Jiale Zhang

Join Date: Sep 2019

Posts: 13
#1

How do I use STATA to do simulations for a Bernoulli trial?

15 Sep 2019, 21:31

Suppose I have a Bernoulli trial that has a success rate of 0.05, and I run it for 30 times. Then I collect the data for the sample mean and sample variance.
Then I repeat this process for 1000 times.
How do I collect data for mean of sample mean and sample variance?
I've looked everywhere but all the results turned to be slightly off of what I was looking for.
I saw two commands forvalues i=1/n and reps.
But I have no idea how to use them.
Thank you in advance!
Tags: None
Mike Lacy

Join Date: Apr 2014

Posts: 2404
#2

15 Sep 2019, 21:43

Here's one way to do that:

Code:

clear set obs 1000 gen x = rbinomial(30,0.05) // try -help random number- summarize x
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#3

15 Sep 2019, 21:54

Code:

clear* capture program drop one_rep program define one_rep, rclass clear set obs 30 local p_success 0.05 gen byte success = runiform() < `p_success' summ success return scalar mean = r(mean) return scalar variance = r(Var) exit end tempfile results simulate mean = r(mean) variance = r(variance), /// dots(20) saving(`results') reps(1000) seed(1234): one_rep use `results', clear summarize

The results of all 1000 replications will now be the data in memory.

Notes: the -dots(20)- is not essential. It's there so you can see progress as you go, but, frankly, this executes so fast you don't really need it. The number in -seed()- can be any positive integer you like.

Added: Crossed with #2. Mike Lacy's solution is a direct route to the results, and provides the mean and variance of the total number of successes in each trial of 30. The more convoluted approach shown here is to demonstrate the process of simulating Bernoulli trials, rather than to go straight to the results, and gets the mean and variance of the proportion of successes in each trial of 30.

Also, to assure replicability of the results, the code in #2 should include a -set seed- command before its -gen x = ...- command.

Last edited by Clyde Schechter; 15 Sep 2019, 22:01.
Comment
Jiale Zhang

Join Date: Sep 2019

Posts: 13
#4

15 Sep 2019, 22:56

Originally posted by Clyde Schechter View Post

Code:

clear* capture program drop one_rep program define one_rep, rclass clear set obs 30 local p_success 0.05 gen byte success = runiform() < `p_success' summ success return scalar mean = r(mean) return scalar variance = r(Var) exit end tempfile results simulate mean = r(mean) variance = r(variance), /// dots(20) saving(`results') reps(1000) seed(1234): one_rep use `results', clear summarize

The results of all 1000 replications will now be the data in memory.

Notes: the -dots(20)- is not essential. It's there so you can see progress as you go, but, frankly, this executes so fast you don't really need it. The number in -seed()- can be any positive integer you like.

Added: Crossed with #2. Mike Lacy's solution is a direct route to the results, and provides the mean and variance of the total number of successes in each trial of 30. The more convoluted approach shown here is to demonstrate the process of simulating Bernoulli trials, rather than to go straight to the results, and gets the mean and variance of the proportion of successes in each trial of 30.

Also, to assure replicability of the results, the code in #2 should include a -set seed- command before its -gen x = ...- command.

Hi, I tried to use the code you provided, but after I typed in
simulate mean = r(mean) variance = r(variance), ///
I got stata saying
invalid 'variance' .

Why?

Also, my professor is assuming that we know all tricks with stata, but unfortunately I don't, could you also recommend some books for these? Much much much appreciated!
Comment
Jiale Zhang

Join Date: Sep 2019

Posts: 13
#5

15 Sep 2019, 22:59

Originally posted by Mike Lacy View Post

Here's one way to do that:

Code:

clear set obs 1000 gen x = rbinomial(30,0.05) // try -help random number- summarize x

Hi Mike, I think there is a little misunderstanding here. Binomial variables are Bernoulli only when n=1, they only take values 0 and 1.
Comment
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#6

16 Sep 2019, 00:56

Originally posted by Jiale Zhang View Post

I tried to use the code you provided, but after I typed in
simulate mean = r(mean) variance = r(variance), ///
I got stata saying
invalid 'variance' .

That is because you tried to copy those commands in the command line. The code by Clyde is supposed to be entered in the do-file editor.

In the command line type doedit. This will open the do-file editor. You can paste the code in there, and than press one of the symbols top right that executes that do file. Any project you do will consist of many lines of code, so it is good to collect those in one file, so you can later reproduce exactly what you did. That is what the do-file editor is for.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Comment
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#7

16 Sep 2019, 00:59

Originally posted by Jiale Zhang View Post

Hi Mike, I think there is a little misunderstanding here. Binomial variables are Bernoulli only when n=1, they only take values 0 and 1.

No misunderstanding, Mike is just being efficient. All you need to know is the number of successes in each iteration, the rest you can derive from that.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Comment

Joseph Coveney

Join Date: Apr 2014
Posts: 4374

16 Sep 2019, 04:56

Originally posted by Jiale Zhang View Post

Suppose I have a Bernoulli trial that has a success rate of 0.05, and I run it for 30 times. Then I collect the data for the sample mean and sample variance.
Then I repeat this process for 1000 times.
How do I collect data for mean of sample mean and sample variance?

Did you want the mean of the sample variance? If so, then you'll need to go Clyde's route or else modify Mike's code like below.

Clyde's -one_rep- program accidentally wipes out the variance from -summarize- before returning it. Modifying it as below fixes that.

Code:

version 16.0

clear *

set seed `=strreverse("1516458")'

// Mike's code
quietly set obs 1000
generate byte x1 = rbinomial(30, 0.05)
quietly summarize
display in smcl as text r(mean) / 30, r(Var) / 30

// Mean of sample variance
generate int row = _n
generate byte x0 = 30 - x1
generate double av = x1 * (30 - x1) / 30 / 30
quietly reshape long x, i(row) j(pos)
quietly drop if !x
quietly expand x
egen double pr = mean(pos), by(row)
egen double v = sd(pos), by(row)
quietly replace v = v * v
quietly bysort row: keep if _n == 1
// Cf. above
summarize pr v av

// Fix for Clyde's simulation
program define simem //, rclass
    version 16.0
    syntax

    drop _all
    set obs 30
    generate byte x = rbinomial(1, 0.05)
    summarize x
end

simulate mu = r(mean) var = r(Var), nodots reps(1000): simem
summarize

exit

Last edited by Joseph Coveney; 16 Sep 2019, 04:59. Reason: ETA: Check that on Clyde's -one_step-; maybe you copied and pasted incorrectly, but what I have above works, too.

Comment

Jiale Zhang

Join Date: Sep 2019

Posts: 13
#9

16 Sep 2019, 14:08

Thanks to everyone! I finally figured out the problem!
Comment

Announcement