Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining cross-sectional and time series data into one panel

    Dear Stata users,

    I have cross-sectional data of funds in one dataset and time series data of these funds in another. I need to generate a combined panel dataset of these both (to run regressions on the time series variables while keeping some cross-sectional variables constant). The funds can be uniquely identified.

    I assume it requires a series of merge and reshape commands, but I wasn't able to figure out the solution myself.
    Below are short examples of the two datasets and the intended result (created manually to illustrate the desired outcome). The real datasets have around 100 cross-sectional variables and contain time series data for around 1.000 funds .

    1. Cross-sectional data

    Code:
    clear
    input byte(fund rating1 rating2) double(score1 score2) byte style
    1 1 2 .56    2 0
    2 1 2 .55  2.3 0
    3 4 3 .55 1.95 1
    4 5 3 .81 2.23 1
    5 5 5  .8  2.1 0
    end
    2. Time series data

    Code:
    clear
    input int Day byte t str4 period double fund1 int(fund2 fund3 fund4 fund5) double(return1 return2 return3 return4 return5) int index
    21550 1 "pre"    87 2000 100 341 320                   .                    .                    .                   .                    . 100
    21551 2 "pre"    88 2080  98 341 335 .011494252873563315  .040000000000000036 -.020000000000000018                   0              .046875 103
    21552 3 "post" 88.5 2054  97 289 341 .005681818181818121 -.012499999999999956 -.010204081632653073 -.15249266862170086  .017910447761193993 104
    21553 4 "post"   90 2180 103 295 329 .016949152542372836   .06134371957156759  .061855670103092786  .02076124567474058 -.035190615835777095 102
    end
    format %tddd-Mon-YY Day

    3. Desired result as panel


    Code:
    clear
    input int Day byte(t fund) double return int index str4 period byte(rating1 rating2) double(score1 score2) byte style
    21550 1 1                    . 100 "pre"  1 2 .56    2 0
    21550 1 2                    . 100 "pre"  1 2 .55  2.3 0
    21550 1 3                    . 100 "pre"  4 3 .55 1.95 1
    21550 1 4                    . 100 "pre"  5 3 .81 2.23 1
    21550 1 5                    . 100 "pre"  5 5  .8  2.1 0
    21551 2 1  .011494252873563315 103 "pre"  1 2 .56    2 0
    21551 2 2  .040000000000000036 103 "pre"  1 2 .55  2.3 0
    21551 2 3 -.020000000000000018 103 "pre"  4 3 .55 1.95 1
    21551 2 4                    0 103 "pre"  5 3 .81 2.23 1
    21551 2 5              .046875 103 "pre"  5 5  .8  2.1 0
    21552 3 1  .005681818181818121 104 "post" 1 2 .56    2 0
    21552 3 2 -.012499999999999956 104 "post" 1 2 .55  2.3 0
    21552 3 3 -.010204081632653073 104 "post" 4 3 .55 1.95 1
    21552 3 4  -.15249266862170086 104 "post" 5 3 .81 2.23 1
    21552 3 5  .017910447761193993 104 "post" 5 5  .8  2.1 0
    21553 4 1  .016949152542372836 102 "post" 1 2 .56    2 0
    21553 4 2   .06134371957156759 102 "post" 1 2 .55  2.3 0
    21553 4 3  .061855670103092786 102 "post" 4 3 .55 1.95 1
    21553 4 4   .02076124567474058 102 "post" 5 3 .81 2.23 1
    21553 4 5 -.035190615835777095 102 "post" 5 5  .8  2.1 0
    end
    format %tdnn/dd/CCYY Day
    Best regards,
    Daniel

  • #2
    You need to reshape long the dataset with returns. If it is too large, see tolong from SSC which is faster in reshaping.

    Code:
    clear
    input byte(fund rating1 rating2) double(score1 score2) byte style
    1 1 2 .56    2 0
    2 1 2 .55  2.3 0
    3 4 3 .55 1.95 1
    4 5 3 .81 2.23 1
    5 5 5  .8  2.1 0
    end
    
    rename fund id
    tempfile tomerge
    save `tomerge'
    
    
    clear
    input int Day byte t str4 period double fund1 int(fund2 fund3 fund4 fund5) double(return1 return2 return3 return4 return5) int index
    21550 1 "pre"    87 2000 100 341 320                   .                    .                    .                   .                    . 100
    21551 2 "pre"    88 2080  98 341 335 .011494252873563315  .040000000000000036 -.020000000000000018                   0              .046875 103
    21552 3 "post" 88.5 2054  97 289 341 .005681818181818121 -.012499999999999956 -.010204081632653073 -.15249266862170086  .017910447761193993 104
    21553 4 "post"   90 2180 103 295 329 .016949152542372836   .06134371957156759  .061855670103092786  .02076124567474058 -.035190615835777095 102
    end
    format %tddd-Mon-YY Day
    
    reshape long fund return, i(Day) j(id)
    merge m:1 id using `tomerge', nogen
    Res.:

    Code:
    . sort id t
    
    . l, sepby(id)
    
         +------------------------------------------------------------------------------------------------------+
         |      Day   id   t   period   fund       return   index   rating1   rating2   score1   score2   style |
         |------------------------------------------------------------------------------------------------------|
      1. | 1-Jan-19    1   1      pre     87            .     100         1         2      .56        2       0 |
      2. | 2-Jan-19    1   2      pre     88    .01149425     103         1         2      .56        2       0 |
      3. | 3-Jan-19    1   3     post   88.5    .00568182     104         1         2      .56        2       0 |
      4. | 4-Jan-19    1   4     post     90    .01694915     102         1         2      .56        2       0 |
         |------------------------------------------------------------------------------------------------------|
      5. | 1-Jan-19    2   1      pre   2000            .     100         1         2      .55      2.3       0 |
      6. | 2-Jan-19    2   2      pre   2080          .04     103         1         2      .55      2.3       0 |
      7. | 3-Jan-19    2   3     post   2054       -.0125     104         1         2      .55      2.3       0 |
      8. | 4-Jan-19    2   4     post   2180    .06134372     102         1         2      .55      2.3       0 |
         |------------------------------------------------------------------------------------------------------|
      9. | 1-Jan-19    3   1      pre    100            .     100         4         3      .55     1.95       1 |
     10. | 2-Jan-19    3   2      pre     98         -.02     103         4         3      .55     1.95       1 |
     11. | 3-Jan-19    3   3     post     97   -.01020408     104         4         3      .55     1.95       1 |
     12. | 4-Jan-19    3   4     post    103    .06185567     102         4         3      .55     1.95       1 |
         |------------------------------------------------------------------------------------------------------|
     13. | 1-Jan-19    4   1      pre    341            .     100         5         3      .81     2.23       1 |
     14. | 2-Jan-19    4   2      pre    341            0     103         5         3      .81     2.23       1 |
     15. | 3-Jan-19    4   3     post    289   -.15249267     104         5         3      .81     2.23       1 |
     16. | 4-Jan-19    4   4     post    295    .02076125     102         5         3      .81     2.23       1 |
         |------------------------------------------------------------------------------------------------------|
     17. | 1-Jan-19    5   1      pre    320            .     100         5         5       .8      2.1       0 |
     18. | 2-Jan-19    5   2      pre    335      .046875     103         5         5       .8      2.1       0 |
     19. | 3-Jan-19    5   3     post    341    .01791045     104         5         5       .8      2.1       0 |
     20. | 4-Jan-19    5   4     post    329   -.03519062     102         5         5       .8      2.1       0 |
         +------------------------------------------------------------------------------------------------------+
    
    .

    Comment


    • #3
      Dear Andrew,
      thanks a lot!
      This works great for my example above and I will implement it for the real dataset shortly. I very much appreciate your contribution to this forum and the very fast help provided here.
      best, Daniel

      Comment


      • #4
        Hi Andrew,

        just to follow up on the implementation with the large dataset: It works fine. It takes a moment to run but that is not an issue for me. Thanks again. Best, Daniel

        Comment

        Working...
        X