Create my own panel data by specifying the underlying DGP

Jannic Cutura

Join Date: Apr 2025

Posts: 125
#1


14 Mar 2017, 09:59

Dear All,
I am trying to generate my own panel data. I know how to create purely cross-sectional data:

Code:

clear
set obs 10000
gen x1 = rnormal(1,2)
gen x2 = rnormal(0,3)
gen eps = rnormal(0,4)
gen y = 3*x1 + 2*x2 + eps
reg y x1 x2

Basically, I specify the DGP (data-generating process) and can then use regression commands to recover the true coefficients. I would like to do the same thing with panel data. Googling only led me to the sample command, which draws a subsample from an existing dataset, but I cannot find anything that helps me create (random) data based on a specified DGP.

Can anyone point me to the correct commands?

Thanks in advance!

Best,

Jeph Herrin

Join Date: Apr 2014

Posts: 332
#2

14 Mar 2017, 10:15

You aren't very clear about what sort of attributes you want your panel data to have, but suppose you want countries observed over a panel of years. In addition, you want a country-level effect distributed N(0,1) across countries. Then something like:

Code:

set seed 20170314
set obs 100
gen id_country = _n
gen gamma = rnormal(0,1) // panel level effect
expand 20
bys id_country: gen id_year = _n
gen x1 = rnormal(1,2)
gen x2 = rnormal(0,3)
gen eps = rnormal(0,4)
gen y = 3*x1 + 2*x2 + eps + gamma
mixed y x1 x2 || id_country:

will set up panel data where -gamma- is a random effect across countries, constant within countries.

hth,
Jeph
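
A possible variation on the same idea, offered only as a sketch: if the panel-level effect should be correlated with a regressor (a fixed-effects style DGP rather than a random-effects one), it can be added to that regressor before the outcome is built. The variable names (id, t, alpha), the seed, and the choice to shift x1 by alpha are assumptions made for illustration, not part of the replies in this thread.

```stata
clear
set seed 12345                      // arbitrary seed, for reproducibility only
set obs 100                         // 100 panel units
gen id = _n
gen alpha = rnormal(0,1)            // unit-level effect, constant within unit
expand 20                           // 20 periods per unit
bysort id: gen t = _n
gen x1 = rnormal(1,2) + alpha       // assumption: x1 is correlated with alpha
gen x2 = rnormal(0,3)
gen eps = rnormal(0,4)
gen y = 3*x1 + 2*x2 + alpha + eps   // same slopes as in the original question
xtset id t
xtreg y x1 x2, fe                   // within estimator sweeps out alpha
xtreg y x1 x2, re                   // assumes alpha is uncorrelated with x1, x2
```

Under these assumptions the fe estimate of the x1 coefficient should stay close to 3, while the re estimate tends to drift upward because alpha enters both x1 and y.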

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17671

14 Mar 2017, 10:25

Jannic:
perhaps something along the following lines?

Code:

. set obs 10
number of observations (_N) was 0, now 10

. gen x1 = rnormal(1,2)

. gen x2= rnormal(0,3)

. gen u = rnormal(0,2)

. gen eps = rnormal(0,4)

. g id=1

. expand 4
(30 observations created)

. replace id=2 in 11/20
(10 real changes made)

. replace id=3 in 21/30
(10 real changes made)

. replace id=4 in 31/40
(10 real changes made)

. bysort id: g year=_n

. gen y = 3*x1+2*x2+eps+u

. xtset id year
       panel variable:  id (strongly balanced)
        time variable:  year, 1 to 10
                delta:  1 unit

. xtreg y x1 x2, fe

Fixed-effects (within) regression               Number of obs     =         40
Group variable: id                              Number of groups  =          4

R-sq:                                           Obs per group:
     within  = 0.8188                                         min =         10
     between = 0.2409                                         avg =       10.0
     overall = 0.7755                                         max =         10

                                                F(2,34)           =      76.82
corr(u_i, Xb)  = -0.0472                        Prob > F          =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   3.260157    .270717    12.04   0.000     2.709994     3.81032
          x2 |   1.288454   .2803894     4.60   0.000     .7186338    1.858273
       _cons |  -.7372457   .7182126    -1.03   0.312    -2.196829    .7223379
-------------+----------------------------------------------------------------
     sigma_u |  2.2398954
     sigma_e |  3.6437468
         rho |  .27424977   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(3, 34) = 3.71                       Prob > F = 0.0207

. reg y x1 x2 i.id

      Source |       SS           df       MS      Number of obs   =        40
-------------+----------------------------------   F(5, 34)        =     33.56
       Model |  2227.92613         5  445.585226   Prob > F        =    0.0000
    Residual |   451.41429        34  13.2768909   R-squared       =    0.8315
-------------+----------------------------------   Adj R-squared   =    0.8067
       Total |  2679.34042        39  68.7010364   Root MSE        =    3.6437

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   3.260157    .270717    12.04   0.000     2.709994     3.81032
          x2 |   1.288454   .2803894     4.60   0.000     .7186338    1.858273
             |
          id |
          2  |  -3.165829    1.63639    -1.93   0.061    -6.491373    .1597147
          3  |   1.676938    1.64513     1.02   0.315    -1.666367    5.020244
          4  |   1.488891   1.671284     0.89   0.379    -1.907567    4.885349
             |
       _cons |  -.7372457    1.22947    -0.60   0.553    -3.235829    1.761338
------------------------------------------------------------------------------

Due to the limited number of clusters, I deliberately omitted clustered standard errors in pooled OLS.

PS: Crossed in cyberspace with Jeph's reply, which took a different road.

Kind regards,
Carlo
(StataNow 18.5)


Nick Cox

Join Date: Mar 2014
Posts: 35405

14 Mar 2017, 10:37

For synthetic examples, I often use egen's seq() to create panel identifiers and time variables, but its slightly bizarre syntax may take some getting used to.

Code:

. clear

. set obs 200
number of observations (_N) was 0, now 200

. egen id = seq(), block(20)

. egen t = seq(), to(20)

. l id t in 1/40, sepby(id)

     +---------+
     | id    t |
     |---------|
  1. |  1    1 |
  2. |  1    2 |
  3. |  1    3 |
  4. |  1    4 |
  5. |  1    5 |
  6. |  1    6 |
  7. |  1    7 |
  8. |  1    8 |
  9. |  1    9 |
 10. |  1   10 |
 11. |  1   11 |
 12. |  1   12 |
 13. |  1   13 |
 14. |  1   14 |
 15. |  1   15 |
 16. |  1   16 |
 17. |  1   17 |
 18. |  1   18 |
 19. |  1   19 |
 20. |  1   20 |
     |---------|
 21. |  2    1 |
 22. |  2    2 |
 23. |  2    3 |
 24. |  2    4 |
 25. |  2    5 |
 26. |  2    6 |
 27. |  2    7 |
 28. |  2    8 |
 29. |  2    9 |
 30. |  2   10 |
 31. |  2   11 |
 32. |  2   12 |
 33. |  2   13 |
 34. |  2   14 |
 35. |  2   15 |
 36. |  2   16 |
 37. |  2   17 |
 38. |  2   18 |
 39. |  2   19 |
 40. |  2   20 |
     +---------+
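
As a rough sketch of how these identifiers could feed into a simulated panel, the block below builds id and t with egen's seq() and then layers a DGP on top. The effect name u, the seed, and the distribution parameters are assumptions borrowed loosely from the earlier posts, not something prescribed here.

```stata
clear
set seed 2017                              // arbitrary seed
set obs 200
egen id = seq(), block(20)                 // 10 panels, 20 rows each
egen t  = seq(), to(20)                    // time index 1..20 within each panel
bysort id (t): gen u = rnormal(0,2) if _n == 1
by id: replace u = u[1]                    // copy the draw so u is constant within id
gen x1 = rnormal(1,2)
gen x2 = rnormal(0,3)
gen eps = rnormal(0,4)
gen y = 3*x1 + 2*x2 + u + eps
xtset id t
xtreg y x1 x2, fe
```

Drawing u once per panel and copying it down is what makes it a panel-level effect; drawing it separately for every row would simply add to the idiosyncratic error.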


Jannic Cutura

Join Date: Apr 2025

Posts: 125
#5

15 Mar 2017, 01:53

Dear All,
thanks for your help. Using these commands, I should be able to do what I had in mind.

Basically, I am trying to understand how difference-in-differences estimation behaves when the date of treatment implementation is blurry.

Best,
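
One way such an experiment might be set up, as a hedged sketch only: give each unit an actual start date that deviates from the nominal date, then estimate the difference-in-differences model using the nominal date. Everything specific here is an assumption for illustration: the variable names, the nominal date of t = 11, the shift of -2 to +2 periods, and the true effect of 2.

```stata
clear
set seed 20170315                           // arbitrary seed
set obs 200
gen id = _n
gen treated = id <= 100                     // first 100 units form the treated group
gen shift = floor(runiform()*5) - 2         // hypothetical blur: start is off by -2..+2 periods
expand 20
bysort id: gen t = _n
gen post_nominal = t >= 11                  // treatment date the analyst believes in
gen post_actual  = t >= 11 + shift          // date exposure actually starts for each unit
gen d = treated * post_actual               // true treatment indicator
gen y = 1 + 0.5*treated + 0.1*t + 2*d + rnormal(0,1)
regress y i.treated##i.post_nominal, vce(cluster id)
```

With this setup the interaction coefficient should come out attenuated relative to the true effect of 2, because some nominal pre-periods are already treated and some nominal post-periods are not yet treated. Replacing post_nominal with post_actual in the regression gives a benchmark for comparison.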
