
  • Monte Carlo Simulation - dynamic panel data model

    Dear Statalist-ers,

I would like to perform a Monte Carlo simulation in Stata, but I get some counter-intuitive results, and I was wondering if someone could kindly check that my code is doing what I would like it to do.

    I am simulating a panel dataset with individual unobserved effects, a lagged dependent variable, lagged feedback effects, and a linear time trend. This is the DGP that I assume:

[eq.png -- the assumed DGP:]

y_it = 0.9*y_i(t-1) - 0.2*x_i(t-1) - 0.05*t + a_i + u_it
x_it = 0.5*x_i(t-1) - 0.2*y_i(t-1) + 0.04*t + g_i + v_it

where a_i and g_i are the individual unobserved effects and u_it and v_it are the idiosyncratic errors.

    I am interested in simulating the behaviour of the OLS estimator of the coefficient on X(it-1) in the first equation, the true value of which is -0.2.

I set T = 10 and N = 182 (panel units), for a total of 1,820 observations. I produce 2,000 repeated samples (replications) of this size, each time running an FE regression for the first equation and storing the estimated coefficient on X(it-1). Then I compute the mean of the 2,000 OLS estimates as an estimate of the expectation.

    This is the code I use:

    Code:
program drop _all
program define ar1sim, rclass
    drop _all
    set obs 1820
    gen id = round(uniform()*182)    // assign observations to panel units
    sort id
    by id: gen t = _n
    xtset id t
    gen u = invnorm(uniform())       // idiosyncratic errors
    gen v = invnorm(uniform())
    gen y = invnorm(uniform())       // initial values at t==1
    gen x = invnorm(uniform())
    by id: replace y = y - 0.05*t + 0.01*id + u if t==1
    by id: replace x = x + 0.04*t + 0.02*id + v if t==1
    forvalues i = 2/10 {
        by id: replace y = 0.9*L.y - 0.2*L.x - 0.05*t + 0.01*id + u if t==`i'
        by id: replace x = 0.5*L.x - 0.2*L.y + 0.04*t + 0.02*id + v if t==`i'
    }
    xtreg y L.y L.x t, fe
    return scalar b = _b[L.x]        // coefficient on X(it-1)
end

simulate beta = r(b), reps(2000): ar1sim
    
    program drop _all
    
    sum beta

Is this code actually implementing the DGP described above?

    Thanks a lot!

    Luca

  • #2
Luca, 0.01*id in your code is a linear function of id, not an id-specific fixed effect. Below is one way of defining FEs.

    Code:
    clear
    
    cap program drop arlsim
    
    program define arlsim
        drop _all
        set obs 182
        gen id = _n
        gen fe_y = rnormal()    // FE of y
        gen fe_x = rnormal()    // FE of x
        expand 10
        bys id: gen t = _n
        
        gen u = rnormal()
        gen v = rnormal()
        gen y = -0.05 + fe_y + u if t == 1
        gen x = 0.04 + fe_x + v if t == 1
        
        forvalues i = 2(1)10 {
            bys id (t): replace y = 0.9*y[_n-1] - 0.2*x[_n-1] - 0.05*t + fe_y + u if t == `i'
            bys id (t): replace x = 0.5*x[_n-1] - 0.2*y[_n-1] + 0.04*t + fe_x + v if t == `i'
        }
        
        xtset id t
        xtreg y L.y L.x t, fe
    end
    
    simulate _b[L1.x], reps(2000): arlsim
    
    sum _sim_1
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          _sim_1 |      2,000   -.2372499    .0242774  -.3089076  -.1394651



    • #3
      Dear Fei,

Thank you for your comment and your useful code (which is much more elegantly written than mine)!

- What is the problem with defining the FEs the way I do? id (in my code) is still common to all observations within each group, and I don't think it can be thought of as a trend, since the groups (id) have no natural order.
- Also, in your code, how could you set the degree to which x and y are correlated with fe_x and fe_y?

      (I still get puzzling results, anyway)

      Best,

      Luca



      • #4
Luca, in the equation below, y is indeed a linear function of id: if id increases by one unit, then y increases by 0.01 units, other terms held constant. For example, subject 4's FE is larger than subject 3's by 0.01, and subject 5's FE is in turn 0.01 larger than subject 4's. The FEs of the three subjects are equally spaced -- apparently some unnecessary restrictions have been imposed.

        Code:
        y = ... + 0.01*id + ...
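To see the restriction concretely, here is a minimal sketch (mine, for illustration only; fe_linear and fe_draw are placeholder names): the 0.01*id "effects" are forced onto a straight line, while draws from rnormal() impose no ordering at all.

Code:
clear
set obs 5
gen id = _n
gen fe_linear = 0.01*id    // implied effects: 0.01, 0.02, 0.03, ... equally spaced
gen fe_draw = rnormal()    // unrestricted id-specific effects
list id fe_linear fe_draw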
        One way of allowing fe_x and fe_y to be correlated is to let them have common terms. In the example below, the correlation coefficient of fe_y and fe_x would be 0.2 by construction.

        Code:
        clear
        
        cap program drop arlsim
        
        program define arlsim
            drop _all
            set obs 182
            gen id = _n
            gen fe_u = 0.5*rnormal()
            gen fe_y = rnormal() + fe_u    // FE of y
            gen fe_x = rnormal() + fe_u    // FE of x
            expand 10
            bys id: gen t = _n
            
            gen u = rnormal()
            gen v = rnormal()
            gen y = -0.05 + fe_y + u if t == 1
            gen x = 0.04 + fe_x + v if t == 1
            
            forvalues i = 2(1)10 {
                bys id (t): replace y = 0.9*y[_n-1] - 0.2*x[_n-1] - 0.05*t + fe_y + u if t == `i'
                bys id (t): replace x = 0.5*x[_n-1] - 0.2*y[_n-1] + 0.04*t + fe_x + v if t == `i'
            }
            
            xtset id t
            xtreg y L.y L.x t, fe
        end
        
        simulate _b[L1.x], reps(2000): arlsim
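If you later want a target correlation other than the 0.2 implied by this construction, one standard trick (a hedged sketch, not part of the code above; rho and z are assumed names) is to weight the common and idiosyncratic components so that each FE has unit variance and Corr(fe_y, fe_x) = rho:

Code:
clear
local rho = 0.5                   // target Corr(fe_y, fe_x); vary as needed
set obs 182
gen id = _n
gen z = rnormal()                 // common component shared by both FEs
gen fe_y = sqrt(`rho')*z + sqrt(1 - `rho')*rnormal()
gen fe_x = sqrt(`rho')*z + sqrt(1 - `rho')*rnormal()
corr fe_y fe_x                    // sample correlation, roughly rho

With only 182 subjects the sample correlation will be noisy around the target; the construction fixes the population correlation, not the realized one.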



        • #5
          Dear Fei,

          Thank you so much again for your extremely useful input.

          "some unnecessary restrictions have been imposed" - I see the point now, regarding the FE. The approach you propose is indeed more general.

          Regarding the correlation between the fe - your suggestion makes perfect sense. Thank you. Can I ask for two additional clarifications?

1. In your example, do you mean that the AVERAGE correlation coefficient of fe_y and fe_x ACROSS REPEATED SAMPLES is 0.2 by construction?

2. In my replication exercise, I am actually interested in setting (and then varying) the correlation between x[_n-1] and fe_y (rather than between fe_x and fe_y). How can I accomplish this?

          Thanks again

          Best,

          Luca



          • #6
            Luca, I computed the correlation based on its formula. You may empirically regard it as the cross-sectional correlation of fe_x and fe_y among subjects.

            Code:
            Corr(fe_x, fe_y) = Cov(fe_x, fe_y)/(Var(fe_x)*Var(fe_y))^0.5  = 0.25/1.25 = 0.2
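A quick empirical check of this formula (a sketch; the large sample size is only there to reduce simulation noise):

Code:
* verify the 0.2 correlation implied by the construction in #4
clear
set obs 1000000
gen fe_u = 0.5*rnormal()
gen fe_y = rnormal() + fe_u
gen fe_x = rnormal() + fe_u
corr fe_y fe_x          // should be close to 0.2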
            Your second question is more like a theoretical question. Start with the initial value of X.

            Code:
            X_0 = 0
            Y_0 = 0
            
            X_1 = 0.04 + fe_x + v_1
            Y_1 = -0.05 + fe_y + u_1
            
X_2 = 0.5*X_1 - 0.2*Y_1 + 0.04*2 + fe_x + v_2 = 0.11 + 1.5*fe_x - 0.2*fe_y + 0.5*v_1 - 0.2*u_1 + v_2
            ...
If we assume u and v are independent across individuals and over time, then you may find that X_1 is correlated with fe_y only when fe_x and fe_y are correlated. In the expression for X_2, both fe_x and fe_y appear. So X_2 is correlated with fe_y by construction, and this part of the correlation depends on the coefficient of fe_y (-0.2) and the distribution of fe_y. If you additionally allow fe_x and fe_y to be correlated, there is a second source of correlation between X_2 and fe_y.

Eventually, X_(t-1) for any t >= 2 can be correlated with fe_y if you allow fe_x and fe_y to be correlated. BUT the correlation varies with t, and depends on the coefficients of your true model, the distribution of fe_y, and the assumed correlation of fe_x and fe_y. Finding how the correlation of fe_x and fe_y is passed through to the correlation of X_(t-1) and fe_y requires the theoretical expression for X_(t-1).
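To see how the correlation varies by t, one can trace it in a simulated panel (a sketch built on the DGP from #4, not part of the original posts; lx is an assumed helper variable holding X_(t-1)):

Code:
clear
set obs 100000                      // many subjects, to reduce simulation noise
gen id = _n
gen fe_u = 0.5*rnormal()
gen fe_y = rnormal() + fe_u         // FE of y
gen fe_x = rnormal() + fe_u         // FE of x, correlated with fe_y
expand 10
bys id: gen t = _n
gen u = rnormal()
gen v = rnormal()
gen y = -0.05 + fe_y + u if t == 1
gen x = 0.04 + fe_x + v if t == 1
forvalues i = 2/10 {
    bys id (t): replace y = 0.9*y[_n-1] - 0.2*x[_n-1] - 0.05*t + fe_y + u if t == `i'
    bys id (t): replace x = 0.5*x[_n-1] - 0.2*y[_n-1] + 0.04*t + fe_x + v if t == `i'
}
bys id (t): gen lx = x[_n-1]        // X_(t-1)
forvalues i = 2/10 {
    quietly corr lx fe_y if t == `i'
    display "t = `i': Corr(X_(t-1), fe_y) = " %6.3f r(rho)
}

The printed correlations differ across t, illustrating the point above that the pass-through from Corr(fe_x, fe_y) to Corr(X_(t-1), fe_y) is t-specific.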
