Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How do I use STATA to do simulations for a Bernoulli trial?

    Suppose I have a Bernoulli trial that has a success rate of 0.05, and I run it for 30 times. Then I collect the data for the sample mean and sample variance.
    Then I repeat this process for 1000 times.
    How do I collect data for mean of sample mean and sample variance?
    I've looked everywhere but all the results turned to be slightly off of what I was looking for.
    I saw two commands forvalues i=1/n and reps.
    But I have no idea how to use them.
    Thank you in advance!

  • #2
    Here's one way to do that:
    Code:
    clear
    set obs 1000
    gen x = rbinomial(30,0.05)   // try -help random number-
    summarize x

    Comment


    • #3
      Code:
      clear*
      
      capture program drop one_rep
      program define one_rep, rclass
          clear
          set obs 30
          local p_success 0.05
          gen byte success = runiform() < `p_success'
          summ success
          return scalar mean = r(mean)
          return scalar variance = r(Var)
          exit
      end
      
      tempfile results
      simulate mean = r(mean) variance = r(variance), ///
          dots(20) saving(`results') reps(1000) seed(1234): one_rep
          
      use `results', clear
      summarize
      The results of all 1000 replications will now be the data in memory.

      Notes: the -dots(20)- is not essential. It's there so you can see progress as you go, but, frankly, this executes so fast you don't really need it. The number in -seed()- can be any positive integer you like.

      Added: Crossed with #2. Mike Lacy's solution is a direct route to the results, and provides the mean and variance of the total number of successes in each trial of 30. The more convoluted approach shown here is to demonstrate the process of simulating Bernoulli trials, rather than to go straight to the results, and gets the mean and variance of the proportion of successes in each trial of 30.

      Also, to assure replicability of the results, the code in #2 should include a -set seed- command before its -gen x = ...- command.
      Last edited by Clyde Schechter; 15 Sep 2019, 22:01.

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        Code:
        clear*
        
        capture program drop one_rep
        program define one_rep, rclass
        clear
        set obs 30
        local p_success 0.05
        gen byte success = runiform() < `p_success'
        summ success
        return scalar mean = r(mean)
        return scalar variance = r(Var)
        exit
        end
        
        tempfile results
        simulate mean = r(mean) variance = r(variance), ///
        dots(20) saving(`results') reps(1000) seed(1234): one_rep
        
        use `results', clear
        summarize
        The results of all 1000 replications will now be the data in memory.

        Notes: the -dots(20)- is not essential. It's there so you can see progress as you go, but, frankly, this executes so fast you don't really need it. The number in -seed()- can be any positive integer you like.

        Added: Crossed with #2. Mike Lacy's solution is a direct route to the results, and provides the mean and variance of the total number of successes in each trial of 30. The more convoluted approach shown here is to demonstrate the process of simulating Bernoulli trials, rather than to go straight to the results, and gets the mean and variance of the proportion of successes in each trial of 30.

        Also, to assure replicability of the results, the code in #2 should include a -set seed- command before its -gen x = ...- command.
        Hi, I tried to use the code you provided, but after I typed in
        simulate mean = r(mean) variance = r(variance), ///
        I got stata saying
        invalid 'variance' .

        Why?

        Also, my professor is assuming that we know all tricks with stata, but unfortunately I don't, could you also recommend some books for these? Much much much appreciated!

        Comment


        • #5
          Originally posted by Mike Lacy View Post
          Here's one way to do that:
          Code:
          clear
          set obs 1000
          gen x = rbinomial(30,0.05) // try -help random number-
          summarize x
          Hi Mike, I think there is a little misunderstanding here. Binomial variables are Bernoulli only when n=1, they only take values 0 and 1.

          Comment


          • #6
            Originally posted by Jiale Zhang View Post
            I tried to use the code you provided, but after I typed in
            simulate mean = r(mean) variance = r(variance), ///
            I got stata saying
            invalid 'variance' .
            That is because you tried to copy those commands in the command line. The code by Clyde is supposed to be entered in the do-file editor.

            In the command line type doedit. This will open the do-file editor. You can paste the code in there, and than press one of the symbols top right that executes that do file. Any project you do will consist of many lines of code, so it is good to collect those in one file, so you can later reproduce exactly what you did. That is what the do-file editor is for.
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------

            Comment


            • #7
              Originally posted by Jiale Zhang View Post
              Hi Mike, I think there is a little misunderstanding here. Binomial variables are Bernoulli only when n=1, they only take values 0 and 1.
              No misunderstanding, Mike is just being efficient. All you need to know is the number of successes in each iteration, the rest you can derive from that.
              ---------------------------------
              Maarten L. Buis
              University of Konstanz
              Department of history and sociology
              box 40
              78457 Konstanz
              Germany
              http://www.maartenbuis.nl
              ---------------------------------

              Comment


              • #8
                Originally posted by Jiale Zhang View Post
                Suppose I have a Bernoulli trial that has a success rate of 0.05, and I run it for 30 times. Then I collect the data for the sample mean and sample variance.
                Then I repeat this process for 1000 times.
                How do I collect data for mean of sample mean and sample variance?
                Did you want the mean of the sample variance? If so, then you'll need to go Clyde's route or else modify Mike's code like below.

                Clyde's -one_rep- program accidentally wipes out the variance from -summarize- before returning it. Modifying it as below fixes that.
                Code:
                version 16.0
                
                clear *
                
                set seed `=strreverse("1516458")'
                
                // Mike's code
                quietly set obs 1000
                generate byte x1 = rbinomial(30, 0.05)
                quietly summarize
                display in smcl as text r(mean) / 30, r(Var) / 30
                
                // Mean of sample variance
                generate int row = _n
                generate byte x0 = 30 - x1
                generate double av = x1 * (30 - x1) / 30 / 30
                quietly reshape long x, i(row) j(pos)
                quietly drop if !x
                quietly expand x
                egen double pr = mean(pos), by(row)
                egen double v = sd(pos), by(row)
                quietly replace v = v * v
                quietly bysort row: keep if _n == 1
                // Cf. above
                summarize pr v av
                
                // Fix for Clyde's simulation
                program define simem //, rclass
                    version 16.0
                    syntax
                
                    drop _all
                    set obs 30
                    generate byte x = rbinomial(1, 0.05)
                    summarize x
                end
                
                simulate mu = r(mean) var = r(Var), nodots reps(1000): simem
                summarize
                
                exit
                Last edited by Joseph Coveney; 16 Sep 2019, 04:59. Reason: ETA: Check that on Clyde's -one_step-; maybe you copied and pasted incorrectly, but what I have above works, too.

                Comment


                • #9
                  Thanks to everyone! I finally figured out the problem!

                  Comment

                  Working...
                  X