  • Create my own panel data by specifying the underlying DGP

    Dear All,
    I am trying to generate my own panel data. I know how to create purely cross sectional data:

    Code:
    clear
    set obs 10000
    gen x1 = rnormal(1,2)
    gen x2= rnormal(0,3)
    gen eps = rnormal(0,4)
    gen y = 3*x1+2*x2+eps
    reg y x1 x2
    Basically, I specify the DGP and can use regression commands to recover the true coefficients. I would like to do the same thing with panel data. Googling only led me to the sample command, which draws a subsample from a given dataset, but I cannot find anything that helps me create (random) data based on a specified DGP.

    Can anyone point me to the correct commands?

    Thanks in advance!

    Best,

  • #2
    You aren't very clear about what sort of attributes you want your panel data to have, but suppose you want countries observed over a panel of years. In addition, you want the country-level effect to be distributed N(0,1) across countries. Then something like:

    Code:
    clear                    // start from an empty dataset so -set obs- cannot fail
    set seed 20170314
    set obs 100
    gen id_country=_n
    gen gamma=rnormal(0,1)   // panel level effect
    expand 20
    bys id_country : gen id_year=_n
    gen x1=rnormal(1,2)
    gen x2=rnormal(0,3)
    gen eps = rnormal(0,4)
    gen y=3*x1+2*x2+eps + gamma
    mixed y x1 x2 || id_country:
    will set up panel data where -gamma- is a random effect across countries, constant within countries.
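
    As a rough check on the setup above, the sketch below rebuilds the same data, asserts that -gamma- really is constant within each country, and then declares the panel structure and fits a random-effects model. The -xtset- and -xtreg, re- steps are an addition for illustration, not part of the original reply, and the estimates will only approximate the true coefficients of 3 and 2.

    ```stata
    clear
    set seed 20170314
    set obs 100
    gen id_country = _n
    gen gamma = rnormal(0,1)            // one draw per country
    expand 20                           // 20 rows per country
    bysort id_country: gen id_year = _n
    gen x1  = rnormal(1,2)              // drawn after -expand-, so these vary by row
    gen x2  = rnormal(0,3)
    gen eps = rnormal(0,4)
    gen y   = 3*x1 + 2*x2 + eps + gamma

    * gamma was drawn before -expand-, so it must be identical within a country
    bysort id_country (id_year): assert gamma == gamma[1]

    * declare the panel and fit a random-effects model (illustrative addition)
    xtset id_country id_year
    xtreg y x1 x2, re
    ```

    Drawing -gamma- before -expand- and the regressors after it is what separates the panel-level component from the observation-level ones.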

    hth,
    Jeph



    • #3
      Jannic:
      perhaps something along the following lines?
      Code:
      . set obs 10
      number of observations (_N) was 0, now 10
      
      . gen x1 = rnormal(1,2)
      
      . gen x2= rnormal(0,3)
      
      . gen u = rnormal(0,2)
      
      . gen eps = rnormal(0,4)
      
      . g id=1
      
      . expand 4
      (30 observations created)
      
      . replace id=2 in 11/20
      (10 real changes made)
      
      . replace id=3 in 21/30
      (10 real changes made)
      
      . replace id=4 in 31/40
      (10 real changes made)
      
      . bysort id: g year=_n
      
      . gen y = 3*x1+2*x2+eps+u
      
      . xtset id year
             panel variable:  id (strongly balanced)
              time variable:  year, 1 to 10
                      delta:  1 unit
      
      . xtreg y x1 x2, fe
      
      Fixed-effects (within) regression               Number of obs     =         40
      Group variable: id                              Number of groups  =          4
      
      R-sq:                                           Obs per group:
           within  = 0.8188                                         min =         10
           between = 0.2409                                         avg =       10.0
           overall = 0.7755                                         max =         10
      
                                                      F(2,34)           =      76.82
      corr(u_i, Xb)  = -0.0472                        Prob > F          =     0.0000
      
      ------------------------------------------------------------------------------
                 y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                x1 |   3.260157    .270717    12.04   0.000     2.709994     3.81032
                x2 |   1.288454   .2803894     4.60   0.000     .7186338    1.858273
             _cons |  -.7372457   .7182126    -1.03   0.312    -2.196829    .7223379
      -------------+----------------------------------------------------------------
           sigma_u |  2.2398954
           sigma_e |  3.6437468
               rho |  .27424977   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      F test that all u_i=0: F(3, 34) = 3.71                       Prob > F = 0.0207
      
      . reg y x1 x2 i.id
      
            Source |       SS           df       MS      Number of obs   =        40
      -------------+----------------------------------   F(5, 34)        =     33.56
             Model |  2227.92613         5  445.585226   Prob > F        =    0.0000
          Residual |   451.41429        34  13.2768909   R-squared       =    0.8315
      -------------+----------------------------------   Adj R-squared   =    0.8067
             Total |  2679.34042        39  68.7010364   Root MSE        =    3.6437
      
      ------------------------------------------------------------------------------
                 y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                x1 |   3.260157    .270717    12.04   0.000     2.709994     3.81032
                x2 |   1.288454   .2803894     4.60   0.000     .7186338    1.858273
                   |
                id |
                2  |  -3.165829    1.63639    -1.93   0.061    -6.491373    .1597147
                3  |   1.676938    1.64513     1.02   0.315    -1.666367    5.020244
                4  |   1.488891   1.671284     0.89   0.379    -1.907567    4.885349
                   |
             _cons |  -.7372457    1.22947    -0.60   0.553    -3.235829    1.761338
      ------------------------------------------------------------------------------
      Due to the limited number of clusters, I deliberately omitted clustered standard errors in the pooled OLS.

      PS: Crossed in cyberspace with Jeph's reply, who took a different road.
      Kind regards,
      Carlo
      (StataNow 18.5)



      • #4
        For synthetic examples, I often use egen's seq() to create panel identifiers and time variables, but its slightly bizarre syntax may take some getting used to.

        Code:
        . clear
        
        . set obs 200
        number of observations (_N) was 0, now 200
        
        . egen id = seq(), block(20)
        
        . egen t = seq(), to(20)
        
        . l id t in 1/40, sepby(id)
        
             +---------+
             | id    t |
             |---------|
          1. |  1    1 |
          2. |  1    2 |
          3. |  1    3 |
          4. |  1    4 |
          5. |  1    5 |
          6. |  1    6 |
          7. |  1    7 |
          8. |  1    8 |
          9. |  1    9 |
         10. |  1   10 |
         11. |  1   11 |
         12. |  1   12 |
         13. |  1   13 |
         14. |  1   14 |
         15. |  1   15 |
         16. |  1   16 |
         17. |  1   17 |
         18. |  1   18 |
         19. |  1   19 |
         20. |  1   20 |
             |---------|
         21. |  2    1 |
         22. |  2    2 |
         23. |  2    3 |
         24. |  2    4 |
         25. |  2    5 |
         26. |  2    6 |
         27. |  2    7 |
         28. |  2    8 |
         29. |  2    9 |
         30. |  2   10 |
         31. |  2   11 |
         32. |  2   12 |
         33. |  2   13 |
         34. |  2   14 |
         35. |  2   15 |
         36. |  2   16 |
         37. |  2   17 |
         38. |  2   18 |
         39. |  2   19 |
         40. |  2   20 |
             +---------+
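
        The identifiers built with seq() can then carry a full DGP. The following is only a sketch under assumptions: the seed, the name -u- for the panel-level effect, and its N(0,2) distribution are illustrative choices rather than anything prescribed in this thread; the coefficients 3 and 2 are taken from the original question.

        ```stata
        clear
        set seed 12345                      // arbitrary seed, for reproducibility only
        set obs 200
        egen id = seq(), block(20)          // 10 panels, 20 consecutive rows each
        egen t  = seq(), to(20)             // time index cycling 1..20

        * draw the panel-level effect once per id, then copy it down the panel
        bysort id (t): gen u = rnormal(0,2) if _n == 1
        bysort id (t): replace u = u[1]

        * observation-level regressors and noise
        gen x1  = rnormal(1,2)
        gen x2  = rnormal(0,3)
        gen eps = rnormal(0,4)
        gen y   = 3*x1 + 2*x2 + u + eps

        xtset id t
        xtreg y x1 x2, fe
        ```

        With only 10 panels the estimates can be noisy; raising the number of observations and keeping block(20) adds more panels.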



        • #5
          Dear All,
          thanks for your help. Using these commands I should be able to do what I had in mind.

          Basically, I am trying to understand how difference-in-differences estimation behaves when the date of treatment implementation is blurry.
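
          One way such an experiment might be set up is sketched below. Everything specific in it is an assumption made for illustration: 100 units over 20 periods, half of them treated, a true effect of 2, a nominal treatment date of t = 10, and actual start dates scattered uniformly over 8 to 12. The regression uses the nominal date, so comparing its coefficient with the true effect gives a sense of what the blurriness does.

          ```stata
          clear
          set seed 2017                          // arbitrary seed
          set obs 100
          gen id      = _n
          gen treated = id <= 50                 // first 50 units are treated (assumption)
          gen alpha   = rnormal(0,1)             // unit fixed effect
          gen start   = 8 + floor(5*runiform())  // actual start date, integer in 8..12
          expand 20
          bysort id: gen t = _n

          * outcome: the true effect of 2 switches on at each unit's actual start date
          gen y = 2*treated*(t >= start) + 0.1*t + alpha + rnormal(0,1)

          * DiD term built from the nominal date t = 10, not the actual one
          gen did = treated*(t >= 10)

          xtset id t
          xtreg y did i.t, fe vce(cluster id)
          ```

          Replacing the nominal date in -did- with -start- gives the benchmark in which the timing is known exactly.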

          Best,

