Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • running regression 1,000 times with random variables

    Hi Everyone,

    I am creating 100 random variables at the beginning as follows:

    set obs 100
    gen z1 = rnormal(6,1)
    gen a=3
    gen B=2
    gen gama=1
    gen y=a+B*z1+rnormal(0,1)
    gen ym= y+ gama*rnormal(0,1)

    I am trying to run regression of ym on z1 1,000 times. I want to record average,min,max of a and B (coefficients and constant) and R^2. Also I want to record t statistics for each regression.

    Your helps will be appreciated.

    Thanks in advance.
    Ulas

  • #2
    This is a task for the -simulate- command. First you have to wrap you random variate generation and regression into a program that -simulate- can call.

    Code:
    clear*
    
    capture program drop my_model
    program define my_model, rclass
        local a = 3
        local B = 2
        local gamma = 1
        replace z1 = rnormal(6, 1)
        replace y = `a' + `B'*z1 + rnormal(0, 1)
        replace ym = y + `gamma'*rnormal(0,1)
        regress ym z1
        return scalar constant = _b[_cons]
        return scalar coef = _b[z1]
        return scalar r2 = e(r2)
        exit
    end
    
    set seed 1234
    set obs 100
    gen y = .
    gen z1 = .
    gen ym = .
    
    simulate constant=r(constant) coef=r(coef) r2=r(r2), reps(1000): my_model
    At the end of this code you will have in memory three variables, constant coef and r2 with the results from the 1,000 regressions and you can get the summary statistics you want in the usual way.

    Note: I have include your command that generates y, but it seems to play no actual role in anything. You never use y in the regression, or to calculate anything else. It seems superfluous and if I were certain you haven't left out something that does use it, I would just omit that command.

    Comment


    • #3
      Thank you very much for your very useful response. How about t value? How can I record it? Is there any specific code for it?

      Comment


      • #4
        Yes. You didn't ask for that before. But you just calculate it in the program in the usual way and return it, and then pick it up with the -simulate- command. You can get anything that -regress- outputs, or anything that can be calculated from that, in this same way. Everything is in _b[], _se[], or somewhere in e().

        Code:
        clear*
        
        capture program drop my_model
        program define my_model, rclass
            local a = 3
            local B = 2
            local gamma = 1
            replace z1 = rnormal(6, 1)
            replace y = `a' + `B'*z1 + rnormal(0, 1)
            replace ym = y + `gamma'*rnormal(0,1)
            regress ym z1
            return scalar constant = _b[_cons]
            return scalar coef = _b[z1]
            return scalar t = _b[z1]/_se[z1]
            return scalar r2 = e(r2)
            exit
        end
        
        set seed 1234
        set obs 100
        gen y = .
        gen z1 = .
        gen ym = .
        
        simulate constant=r(constant) coef=r(coef) t = r(t) r2=r(r2), reps(1000): my_model

        Comment


        • #5
          Thanks for your reply. It worked. I need one more help. How can I record correlation between two dependent variables? My regression is y= a+a1*x1+a2*x2 this time and I want to see the correlation between x1 and x2? Thanks in advance.

          Comment


          • #6
            -help corr-

            This doesn't appear to relate to the original topic of this thread. In the future, when raising a new topic, please start a new thread.

            Comment


            • #7

              I want to randomly assign treatment to others in the same year and then simulate it for 1000 times for all my y variables. I want the coefficient, standard error, and the confidence interval.


              I am running the following code but STATA is issuing "varlist not allowed" after program NO treatment - Copy, rclass. I am not sure if I am coding it correctly. Any help is appreciated.


              drop random
              drop post
              set more off
              set matsize 800


              cap program drop NO treatment - Copy
              set seed 6000
              program NO treatment - Copy, rclass


              unab y : wheat rice lentils
              local l : word count `y'

              generate random=uniform()
              sort year random
              generate post=0
              replace post=1 in 1 in 1/29 // for year 2010
              replace post=1 in 296/337 // for year 2011
              replace post=1 in 578/619 // for year 2012

              foreach var of local y {
              quietly reg `var' post i.country_id i.year i.country#c.line_time_trend [aw=ypop], cluster( country_id )
              }

              return scalar tau_coeff = _b[post]
              return scalar tau_se = _se[post]
              end

              simulate post= r(post)
              tau_coeff = r(tau_coeff) ///
              tau_se = r(tau_se) , reps(1000) saving(NO treatment - Copy.dta, replace): NO treatment - Copy

              sum

              Comment


              • #8
                You cannot name your program NO treatment - Copy. Program names, like variable names, must contain only letters, numbers, and underscore (_) characters (and the first may not be a number).

                Because of the blank between NO and treatment, Stata thinks you are trying to define a program called NO and that you are mistakenly specifying a variable list (treatment - Copy) in that command. Of course, you cannot specify a variable list in a -program define- command.

                By the way, once you fix that you are also going to trip over the embedded blanks problem again with your -saving()- option in the -simulate- command. While it is allowable to have blanks in a filename, when you do have them, you must enclose the filename in quotes.

                Comment


                • #9
                  I am putting it as model and stata is issuing me unexpected end of file

                  Comment


                  • #10
                    "
                    I am putting it as model"

                    What does that mean?

                    I don't see any reason why this code should lead to an unexpected end of file message. And when I run it on my setup after fixing the problem with the blanks, that is not what happens. What does happen first is that the command -simulate post = r(post)- throws an error about '' found where ':' expected. That's because you forgot to put /// at the end of that line, so Stata thinks the command ends with just that line. When I fix that, the next thing that happens is that I'm told that variable wheat is not found--which makes sense because no such variable is ever defined.

                    Look, clearly you are not being truthful about what code you are running and what Stata is doing in response. It is not possible to troubleshoot your problems if you don't show the actual code you are running. If you think you are just making "minor changes" to it, then you need to learn that there is no such thing as a minor change in coding. If you want help, post the real code and the real output. Don't waste your time, and that of others, with spurious code and questions.

                    Comment


                    • #11
                      I meant I am changing the file name to "model". and I have defined wheat (all the outcome variable) under local y:

                      Comment


                      • #12
                        I would like to use the Sept 26 code here, but I have a few different items so I don't know how to add them.
                        1. I need x1 = 20 observation of 1's
                        2. I need x2, x3, x4 to be uniform (0,50)
                        3. I need to generate e = standard normal errors
                        4. I need to run it 100 times and 1000 times.
                        5. I need to obtain mean value over all samples of the coeffiicents, variance for each, and MSE
                        I apologize I am a bit of a novice and cannot see how to refine the great loop program provided to include these changes?
                        FL

                        Comment


                        • #13
                          In case anyone sees this, I have gotten this far but it has a bug in the data store for constant 'con'. Any help appreciated.
                          clear

                          local mc = 50

                          set obs `mc'

                          g data_store_x3 = .
                          g data_store_x2 = .
                          g data_store_con = .
                          g data_store_con_50 = .
                          g data_store_con_500 = .
                          g data_store_x2_50 = .
                          g data_store_x2_500 = .
                          g data_store_x3_50 = .
                          g data_store_x3_500 = .

                          quietly{
                          foreach obs in 50 500 {
                          forvalues i = 1(1) `mc' {
                          if floor((`i'-1)/100) == ((`i'-1)/100) {
                          noisily display "Working on `i' out of `mc' at $S_TIME"
                          }
                          preserve

                          clear

                          set obs `obs'

                          g x2 = rnormal()

                          g x3 = rnormal()

                          g e = runiform()

                          g y = 1 -3*x2 + 2*x3 + e

                          reg y x2 x3

                          local x2coeff = _b[x2]

                          local x3coeff = _b[x3]

                          local const1 = _b[_cons]

                          restore

                          replace data_store_con_`obs' = `const1' in `i'

                          replace data_store_x3_`obs' = `x3coeff' in `i'

                          replace data_store_x2_`obs' = `x2coeff' in `i'


                          }
                          }
                          }
                          summ data_store_con_`obs' data_store_x2_`obs' data_store_x3_`obs'

                          Comment


                          • #14
                            ..
                            but it has a bug in the data store for constant 'con'.

                            No, it doesn't. If you browse the data at the end of the run you will see that your variables look just fine. However, your code is a bit off. The initial commands that generate data_store_x3, data_store_x2, and data_store_con create missing values, but then you never modify them anywhere in the code. So there is no point to them. Eliminate those three commands.

                            Next, your -summ command is outside the loop that defines `obs', so it interprets this as a command to summarize data_store_con_, data_store_x2_, and data_store_x3_, which are non-existent variables in your data. I'm guessing you actually want that to be inside the loop, probably right after the -replace- statements. But there, it finds itself inside a -quietly- block so you won't actually see the output unless you prefix the -summ- command with -noisily-.

                            Comment


                            • #15
                              Thanks. If I eliminate the original data_store_x3, I immediately get a message variable data_store_x3 not found so I had to keep that. When I put the noisily summ, it shows me all ten iterations.
                              I solved my summ problem with tabstat and this code is working. But I have another question. How can I get the usual ANOVA to display showing RSS, etc, over all itertions?

                              clear

                              local mc = 10
                              set obs `mc'
                              g data_store_x3 = .
                              g data_store_x2 = .
                              g data_store_con = .
                              g data_store_y = .

                              quietly{
                              forvalues i = 1(1) `mc' {
                              if floor((`i'-1)/100) == ((`i'-1)/100) {
                              noisily display "Working on `i' out of `mc' at $S_TIME"
                              }
                              preserve
                              clear
                              set obs 50
                              g x2 = runiform()
                              g x3 = runiform()
                              g e = rnormal()
                              g y = 1 -3*x2 + 2*x3 + e
                              reg y x2 x3
                              local x2coeff = _b[x2]
                              local x3coeff = _b[x3]
                              local const = _b[_cons]
                              restore
                              replace data_store_x3 = `x3coeff' in `i'
                              replace data_store_x2 = `x2coeff' in `i'
                              replace data_store_con = `const' in `i'
                              noisily summ
                              }
                              }
                              summ data_store_con data_store_x2 data_store_x3
                              display e(rmse)
                              tabstat (data_store_x3 data_store_x2 data_store_con), statistics(mean, variance, sd)
                              histogram data_store_x2, title(Histogram Of B2)

                              Comment

                              Working...
                              X