
  • Generating a random variable that is constant across group

    Say I wish to generate a random shock per panel unit in my dataset - meaning that it must be constant across time in each panel unit.
    I can achieve this as follows:
    Code:
    clear all
    webuse nlswork
    bysort idcode: gen ind_shock = rnormal()
    bysort idcode: replace ind_shock = ind_shock[1]
    I'm looking to replace this two-line code, which generates a random draw per observation and then overwrites every observation within each panel with the value of the first "shock" in that panel, with a single line of code. I want this for two reasons: I expect it to speed things up in large datasets, and I wish to make the code more "elegant". It would also make MC simulation code nicer to work with in panel data. I thought that:
    Code:
    bysort idcode: gen ind_shock = rnormal()
    would produce ind_shock as constant inside idcode, but apparently it doesn't!

  • #2
    On your last point: it shouldn't be expected to generate a constant, as the simpler version:
    Code:
    gen ind_shock = rnormal()
    wouldn't be expected to generate a constant value for all observations either.

    A simplification could therefore be:
    Code:
    clear all
    webuse nlswork
    gen ind_shock = rnormal()
    bysort idcode: replace ind_shock = ind_shock[1]
    I don't have an answer on how to combine the two in one line. I somewhat doubt that combining two simple lines into a more complex one would yield much of a speed increase. Someone else may have a better idea on this topic, though.



    • #3
      I don't know how to do this in one line either. The key point is that by default random number generation functions yield different results in different observations, as almost always that is exactly what you want: here that isn't so.
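      A minimal sketch of the point above, assuming the same nlswork data: it counts how many observations differ from the first draw in their panel, then applies the two-step fix from #1 and checks the result. The seed value and the variable names draw and differs are illustrative assumptions, not part of the original posts.

      ```stata
      clear all
      set seed 12345                                  // assumption: any fixed seed, for reproducibility
      webuse nlswork
      bysort idcode: gen draw = rnormal()             // by-group prefix still draws once per observation
      by idcode: gen byte differs = draw != draw[1]   // 1 if the draw differs from the panel's first draw
      count if differs                                // nonzero count: draws vary within panels
      by idcode: replace draw = draw[1]               // copy the first draw to the whole panel
      by idcode: assert draw == draw[1]               // now constant within each idcode
      ```

      The final assert should pass silently, while the earlier count shows that the by prefix alone does not make the draw constant within panels.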



      • #4
        Here is an amusing alternative that appears to call rnormal() only once per panel. It still takes two commands, however.
        Code:
        clear all
        webuse nlswork
        generate ind_shock = .
        bysort idcode: replace ind_shock = cond(_n==1,rnormal(),ind_shock[1])



        • #5
          Thanks William, but apparently this takes much longer than the original solution. I did a little test on a panel I have (N ≈ 150,000, T ≈ 10): the first method takes 10 seconds, while the second one takes more than 24.



          • #6
            Code:
            clear
            set obs 150000
            gen n=_n
            expand 10
            bysort n: gen t=_n
            
            timer clear
            
            timer on 1
            bysort n: gen ind_shock = rnormal()
            bysort n: replace ind_shock = ind_shock[1]
            timer off 1
            drop ind_shock
            
            timer on 2
            gen ind_shock = rnormal()
            bysort n: replace ind_shock = ind_shock[1]
            timer off 2
            drop ind_shock
            
            timer on 3
            generate ind_shock = .
            bysort n: replace ind_shock = cond(_n==1,rnormal(),ind_shock[1])
            timer off 3
            drop ind_shock
            
            timer list
            Code:
            . timer list
               1:      0.47 /        1 =       0.4680
               2:      0.17 /        1 =       0.1720
               3:      0.50 /        1 =       0.5010
            I'm not getting such big differences in timing here. The simplification suggested in #2 does seem to take some time off, though. I do wonder what kind of hardware and Stata version you have. If you're concerned with saving time, maybe an upgrade of either might help?
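            Single-run timings like the ones above can be noisy. A hedged sketch of one way to smooth them, assuming the same simulated panel: repeat each method several times inside a loop, so that timer list reports the total time, the number of runs, and their ratio. The five repetitions and the seed are arbitrary assumptions.

            ```stata
            clear
            set seed 12345                  // assumption: fixed seed for reproducibility
            set obs 150000
            gen n = _n
            expand 10
            sort n

            timer clear
            forvalues r = 1/5 {             // assumption: 5 repetitions, chosen arbitrarily
                timer on 1                  // method from #1: by-group draw, then copy first value
                quietly bysort n: gen ind_shock = rnormal()
                quietly bysort n: replace ind_shock = ind_shock[1]
                timer off 1
                drop ind_shock

                timer on 2                  // method from #2: plain draw, then copy first value
                quietly gen ind_shock = rnormal()
                quietly bysort n: replace ind_shock = ind_shock[1]
                timer off 2
                drop ind_shock
            }
            timer list                      // total / number of runs = mean time per run
            ```

            Because timers accumulate across on/off pairs, the last column of timer list is the average over the repetitions rather than a single measurement.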



            • #7
              How do I generate a set of X0's, so that there is a column of 10 "1's" in my regression for the coefficient B0?



              • #8
                #7 Stata includes an intercept or constant in regression by default, so you should never need to do that.
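                A small sketch of why the column of ones is unnecessary, using simulated data. All variable names, the seed, and the data-generating values here are illustrative assumptions. The default regression reports the intercept as _cons; adding an explicit column of ones and suppressing the default constant with the noconstant option should give the same estimate.

                ```stata
                clear
                set seed 2024                   // assumption: arbitrary seed
                set obs 10
                gen x = rnormal()
                gen y = 1 + 2*x + rnormal()     // illustrative model, not from the thread

                regress y x                     // intercept is included automatically as _cons
                scalar b0_default = _b[_cons]

                gen x0 = 1                      // explicit column of ten 1's
                regress y x0 x, noconstant      // suppress the default constant, use x0 instead
                display "default _cons = " b0_default "   coefficient on x0 = " _b[x0]
                ```

                The two displayed numbers should agree up to rounding, which is why creating x0 by hand adds nothing.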



                • #9
                  Thanks. I have already come this far. I am trying to create a loop that will perform the following 50 and 500 times, but it gives an error that data_store_con is ambiguous.
                  Could you help me get this loop to work correctly?
                  Code:
                  clear
                  local mc = 10
                  set obs `mc'

                  g data_store_x2 = .
                  g data_store_x3 = .
                  g data_store_con = .

                  quietly {
                      forvalues i = 1(1) `mc' {
                          if floor((`i'-1)/100) == ((`i'-1)/100) {
                              noisily display "Working on `i' out of `mc' at $S_TIME"
                          }
                          preserve
                          clear
                          set obs 2000
                          g x2 = runiform()
                          g x3 = runiform()
                          g e = rnormal()
                          g y = 1 - 3*x2 + 2*x3 + e
                          reg y x2 x3
                          local x2coeff = _b[x2]
                          local x3coeff = _b[x3]
                          local const = _b[_cons]
                          restore
                          replace data_store_x3 = `x3coeff' in `i'
                          replace data_store_x2 = `x2coeff' in `i'
                          replace data_store_con = `const' in `i'
                      }
                  }
                  summ data_store_con data_store_x2 data_store_x3




                  • #10
                    Oops, sorry, that's the wrong one. I meant this one.
                    Code:
                    clear
                    local mc = 50
                    set obs `mc'

                    g data_store_x3 = .
                    g data_store_x2 = .
                    g data_store_con = .
                    g data_store_con_50 = .
                    g data_store_con_500 = .
                    g data_store_x2_50 = .
                    g data_store_x2_500 = .
                    g data_store_x3_50 = .
                    g data_store_x3_500 = .

                    quietly {
                        foreach obs in 50 500 {
                            forvalues i = 1(1) `mc' {
                                if floor((`i'-1)/100) == ((`i'-1)/100) {
                                    noisily display "Working on `i' out of `mc' at $S_TIME"
                                }
                                preserve
                                clear
                                set obs `obs'
                                g x2 = rnormal()
                                g x3 = rnormal()
                                g e = runiform()
                                g y = 1 -3*x2 + 2*x3 + e
                                reg y x2 x3
                                local x2coeff = _b[x2]
                                local x3coeff = _b[x3]
                                local const1 = _b[_cons]
                                restore
                                replace data_store_con_`obs' = `const1' in `i'
                                replace data_store_x3_`obs' = `x3coeff' in `i'
                                replace data_store_x2_`obs' = `x2coeff' in `i'
                            }
                        }
                    }
                    summ data_store_con_`obs' data_store_x2_`obs' data_store_x3_`obs'
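
                    A possible explanation, offered as an assumption rather than a confirmed diagnosis: the macro `obs' is defined only inside the foreach loop, so in the final summ line it most likely expands to nothing. Stata would then read data_store_con_ as an abbreviation, which matches both data_store_con_50 and data_store_con_500, hence the "ambiguous" error. The sketch below names the variables explicitly after the loop. The seed, the setup loop that creates the storage variables, and the dropped progress message are my own simplifications.

                    ```stata
                    clear
                    set seed 12345                          // assumption: fixed seed for reproducibility
                    local mc = 50
                    set obs `mc'

                    foreach obs in 50 500 {                 // one set of storage variables per sample size
                        gen data_store_con_`obs' = .
                        gen data_store_x2_`obs'  = .
                        gen data_store_x3_`obs'  = .
                    }

                    quietly {
                        foreach obs in 50 500 {
                            forvalues i = 1/`mc' {
                                preserve
                                clear
                                set obs `obs'
                                gen x2 = rnormal()
                                gen x3 = rnormal()
                                gen e  = runiform()
                                gen y  = 1 - 3*x2 + 2*x3 + e
                                regress y x2 x3
                                local x2coeff = _b[x2]
                                local x3coeff = _b[x3]
                                local const1  = _b[_cons]
                                restore
                                replace data_store_con_`obs' = `const1'  in `i'
                                replace data_store_x2_`obs'  = `x2coeff' in `i'
                                replace data_store_x3_`obs'  = `x3coeff' in `i'
                            }
                        }
                    }

                    * `obs' no longer holds a value here, so spell the names out
                    summarize data_store_con_50  data_store_x2_50  data_store_x3_50
                    summarize data_store_con_500 data_store_x2_500 data_store_x3_500
                    ```

                    Note that e is drawn from runiform(), which has mean 0.5, so the intercept estimates should centre near 1.5 rather than 1.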
