
  • Monte Carlo Simulation - dynamic panel data model

    Dear Statalist-ers,

I would like to perform a Monte Carlo simulation in Stata, but I get some counter-intuitive results, and I was wondering if someone could kindly check that my code is doing what I would like it to do.

    I am simulating a panel dataset with individual unobserved effects, a lagged dependent variable, lagged feedback effects, and a linear time trend. This is the DGP that I assume:

[eq.png -- the assumed DGP:]

y_it = 0.9*y_i(t-1) - 0.2*x_i(t-1) - 0.05*t + a_i + u_it
x_it = 0.5*x_i(t-1) - 0.2*y_i(t-1) + 0.04*t + g_i + v_it

where a_i and g_i are the individual unobserved effects and u_it and v_it are the idiosyncratic errors.

    I am interested in simulating the behaviour of the OLS estimator of the coefficient on X(it-1) in the first equation, the true value of which is -0.2.

I set T = 10 and N = 182 (panel units), for a total of 1,820 observations. I produce 2,000 repeated samples (replications) of this size, each time running an FE regression for the first equation and storing the estimated coefficient on X(it-1). Then I compute the mean of the 2,000 OLS estimates as an estimate of the expectation.

    This is the code I use:

    Code:
program drop _all
program define ar1sim, rclass
    drop _all
    set obs 1820
    gen id = round(uniform()*182)    // assign observations to panel units
    sort id
    by id: gen t = _n
    xtset id t
    gen u = invnorm(uniform())       // idiosyncratic errors
    gen v = invnorm(uniform())
    gen y = invnorm(uniform())       // initial values at t==1
    gen x = invnorm(uniform())
    by id: replace y = y - 0.05*t + 0.01*id + u if t==1
    by id: replace x = x + 0.04*t + 0.02*id + v if t==1
    forvalues i = 2/10 {
        by id: replace y = 0.9*L.y - 0.2*L.x - 0.05*t + 0.01*id + u if t==`i'
        by id: replace x = 0.5*L.x - 0.2*L.y + 0.04*t + 0.02*id + v if t==`i'
    }
    xtreg y L.y L.x t, fe
    return scalar b = _b[L.x]        // coefficient on X(it-1)
end

simulate beta = r(b), reps(2000): ar1sim
    
    program drop _all
    
    sum beta

Is this code actually implementing the DGP described above?

    Thanks a lot!

    Luca

  • #2
Luca, 0.01*id in your code is a linear function of id, not an id-specific fixed effect. Below is one way of defining FEs.

    Code:
    clear
    
    cap program drop arlsim
    
    program define arlsim
        drop _all
        set obs 182
        gen id = _n
        gen fe_y = rnormal()    // FE of y
        gen fe_x = rnormal()    // FE of x
        expand 10
        bys id: gen t = _n
        
        gen u = rnormal()
        gen v = rnormal()
        gen y = -0.05 + fe_y + u if t == 1
        gen x = 0.04 + fe_x + v if t == 1
        
        forvalues i = 2(1)10 {
            bys id (t): replace y = 0.9*y[_n-1] - 0.2*x[_n-1] - 0.05*t + fe_y + u if t == `i'
            bys id (t): replace x = 0.5*x[_n-1] - 0.2*y[_n-1] + 0.04*t + fe_x + v if t == `i'
        }
        
        xtset id t
        xtreg y L.y L.x t, fe
    end
    
    simulate _b[L1.x], reps(2000): arlsim
    
    sum _sim_1
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          _sim_1 |      2,000   -.2372499    .0242774  -.3089076  -.1394651



    • #3
      Dear Fei,

Thank you for your comment and your useful code (which is much more elegantly written than mine)!

- What is the problem with defining the FEs the way I do? id (in my code) is still common to all observations within each group, and I don't think it can be thought of as a trend, since the groups (id) have no natural order.
- Also, in your code, how could you set the degree to which x and y are correlated with fe_x and fe_y?

      (I still get puzzling results, anyway)

      Best,

      Luca



      • #4
Luca, in the equation below, y is indeed a linear function of id: if id increases by one unit, then y increases by 0.01 units, other terms held constant. For example, subject 4's FE is larger than subject 3's by 0.01, and subject 5's FE is in turn 0.01 larger than subject 4's. The FEs of the three subjects are equally spaced -- apparently some unnecessary restrictions have been imposed.

        Code:
        y = ... + 0.01*id + ...
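To see the restriction concretely, here is a minimal sketch (mine, for illustration only; fe_linear and fe_draw are placeholder names): the 0.01*id "effects" are forced onto a straight line, while draws from rnormal() impose no ordering at all.

Code:
clear
set obs 5
gen id = _n
gen fe_linear = 0.01*id    // implied effects: 0.01, 0.02, 0.03, ... equally spaced
gen fe_draw = rnormal()    // unrestricted id-specific effects
list id fe_linear fe_draw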
        One way of allowing fe_x and fe_y to be correlated is to let them have common terms. In the example below, the correlation coefficient of fe_y and fe_x would be 0.2 by construction.

        Code:
        clear
        
        cap program drop arlsim
        
        program define arlsim
            drop _all
            set obs 182
            gen id = _n
            gen fe_u = 0.5*rnormal()
            gen fe_y = rnormal() + fe_u    // FE of y
            gen fe_x = rnormal() + fe_u    // FE of x
            expand 10
            bys id: gen t = _n
            
            gen u = rnormal()
            gen v = rnormal()
            gen y = -0.05 + fe_y + u if t == 1
            gen x = 0.04 + fe_x + v if t == 1
            
            forvalues i = 2(1)10 {
                bys id (t): replace y = 0.9*y[_n-1] - 0.2*x[_n-1] - 0.05*t + fe_y + u if t == `i'
                bys id (t): replace x = 0.5*x[_n-1] - 0.2*y[_n-1] + 0.04*t + fe_x + v if t == `i'
            }
            
            xtset id t
            xtreg y L.y L.x t, fe
        end
        
        simulate _b[L1.x], reps(2000): arlsim
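If you later want a target correlation other than the 0.2 implied by this construction, one standard trick (a hedged sketch, not part of the code above; rho and z are assumed names) is to weight the common and idiosyncratic components so that each FE has unit variance and Corr(fe_y, fe_x) = rho:

Code:
clear
local rho = 0.5                   // target Corr(fe_y, fe_x); vary as needed
set obs 182
gen id = _n
gen z = rnormal()                 // common component shared by both FEs
gen fe_y = sqrt(`rho')*z + sqrt(1 - `rho')*rnormal()
gen fe_x = sqrt(`rho')*z + sqrt(1 - `rho')*rnormal()
corr fe_y fe_x                    // sample correlation, roughly rho

With only 182 subjects the sample correlation will be noisy around the target; the construction fixes the population correlation, not the realized one.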



        • #5
          Dear Fei,

          Thank you so much again for your extremely useful input.

          "some unnecessary restrictions have been imposed" - I see the point now, regarding the FE. The approach you propose is indeed more general.

          Regarding the correlation between the fe - your suggestion makes perfect sense. Thank you. Can I ask for two additional clarifications?

1. In your example, do you mean that the AVERAGE correlation coefficient of fe_y and fe_x ACROSS REPEATED SAMPLES is 0.2 by construction?

2. In my replication exercise, I am actually interested in setting (and then varying) the correlation between x[_n-1] and fe_y (rather than between fe_x and fe_y). How can I accomplish this?

          Thanks again

          Best,

          Luca



          • #6
            Luca, I computed the correlation based on its formula. You may empirically regard it as the cross-sectional correlation of fe_x and fe_y among subjects.

            Code:
            Corr(fe_x, fe_y) = Cov(fe_x, fe_y)/(Var(fe_x)*Var(fe_y))^0.5  = 0.25/1.25 = 0.2
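A quick empirical check of this formula (a sketch; the large sample size is only there to reduce simulation noise):

Code:
* verify the 0.2 correlation implied by the construction in #4
clear
set obs 1000000
gen fe_u = 0.5*rnormal()
gen fe_y = rnormal() + fe_u
gen fe_x = rnormal() + fe_u
corr fe_y fe_x          // should be close to 0.2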
            Your second question is more like a theoretical question. Start with the initial value of X.

            Code:
            X_0 = 0
            Y_0 = 0
            
            X_1 = 0.04 + fe_x + v_1
            Y_1 = -0.05 + fe_y + u_1
            
X_2 = 0.5*X_1 - 0.2*Y_1 + 0.04*2 + fe_x + v_2 = 0.11 + 1.5*fe_x - 0.2*fe_y + 0.5*v_1 - 0.2*u_1 + v_2
            ...
If we assume u and v are independent across individuals and over time, then you may find that X_1 is correlated with fe_y only when fe_x and fe_y are correlated. In the expression for X_2, both fe_x and fe_y appear. So X_2 is correlated with fe_y by construction, and this part of the correlation depends on the coefficient of fe_y (-0.2) and the distribution of fe_y. If you additionally allow fe_x and fe_y to be correlated, there is a second source of correlation between X_2 and fe_y.

Eventually, X_(t-1) for any t >= 2 can be correlated with fe_y if you allow fe_x and fe_y to be correlated. BUT the correlation varies with t, and depends on the coefficients of your true model, the distribution of fe_y, and the assumed correlation of fe_x and fe_y. Finding how the correlation of fe_x and fe_y is passed through to the correlation of X_(t-1) and fe_y requires the theoretical expression for X_(t-1).
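To see how the correlation varies by t, one can trace it in a simulated panel (a sketch built on the DGP from #4, not part of the original posts; lx is an assumed helper variable holding X_(t-1)):

Code:
clear
set obs 100000                      // many subjects, to reduce simulation noise
gen id = _n
gen fe_u = 0.5*rnormal()
gen fe_y = rnormal() + fe_u         // FE of y
gen fe_x = rnormal() + fe_u         // FE of x, correlated with fe_y
expand 10
bys id: gen t = _n
gen u = rnormal()
gen v = rnormal()
gen y = -0.05 + fe_y + u if t == 1
gen x = 0.04 + fe_x + v if t == 1
forvalues i = 2/10 {
    bys id (t): replace y = 0.9*y[_n-1] - 0.2*x[_n-1] - 0.05*t + fe_y + u if t == `i'
    bys id (t): replace x = 0.5*x[_n-1] - 0.2*y[_n-1] + 0.04*t + fe_x + v if t == `i'
}
bys id (t): gen lx = x[_n-1]        // X_(t-1)
forvalues i = 2/10 {
    quietly corr lx fe_y if t == `i'
    display "t = `i': Corr(X_(t-1), fe_y) = " %6.3f r(rho)
}

The printed correlations differ across t, illustrating the point above that the pass-through from Corr(fe_x, fe_y) to Corr(X_(t-1), fe_y) is t-specific.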
