Combining cross-sectional and time series data into one panel

Daniel Gilcher

Join Date: Apr 2022
Posts: 9

Combining cross-sectional and time series data into one panel

13 Apr 2022, 00:39

Dear Stata users,

I have cross-sectional data of funds in one dataset and time series data of these funds in another. I need to generate a combined panel dataset of these both (to run regressions on the time series variables while keeping some cross-sectional variables constant). The funds can be uniquely identified.

I assume it requires a series of merge and reshape commands, but I wasn't able to figure out the solution myself.
Below are short examples of the two datasets and the intended result (created manually to illustrate the desired outcome). The real datasets have around 100 cross-sectional variables and contain time series data for around 1.000 funds .

1. Cross-sectional data

Code:

clear
input byte(fund rating1 rating2) double(score1 score2) byte style
1 1 2 .56    2 0
2 1 2 .55  2.3 0
3 4 3 .55 1.95 1
4 5 3 .81 2.23 1
5 5 5  .8  2.1 0
end

2. Time series data

Code:

clear
input int Day byte t str4 period double fund1 int(fund2 fund3 fund4 fund5) double(return1 return2 return3 return4 return5) int index
21550 1 "pre"    87 2000 100 341 320                   .                    .                    .                   .                    . 100
21551 2 "pre"    88 2080  98 341 335 .011494252873563315  .040000000000000036 -.020000000000000018                   0              .046875 103
21552 3 "post" 88.5 2054  97 289 341 .005681818181818121 -.012499999999999956 -.010204081632653073 -.15249266862170086  .017910447761193993 104
21553 4 "post"   90 2180 103 295 329 .016949152542372836   .06134371957156759  .061855670103092786  .02076124567474058 -.035190615835777095 102
end
format %tddd-Mon-YY Day

3. Desired result as panel

Code:

clear
input int Day byte(t fund) double return int index str4 period byte(rating1 rating2) double(score1 score2) byte style
21550 1 1                    . 100 "pre"  1 2 .56    2 0
21550 1 2                    . 100 "pre"  1 2 .55  2.3 0
21550 1 3                    . 100 "pre"  4 3 .55 1.95 1
21550 1 4                    . 100 "pre"  5 3 .81 2.23 1
21550 1 5                    . 100 "pre"  5 5  .8  2.1 0
21551 2 1  .011494252873563315 103 "pre"  1 2 .56    2 0
21551 2 2  .040000000000000036 103 "pre"  1 2 .55  2.3 0
21551 2 3 -.020000000000000018 103 "pre"  4 3 .55 1.95 1
21551 2 4                    0 103 "pre"  5 3 .81 2.23 1
21551 2 5              .046875 103 "pre"  5 5  .8  2.1 0
21552 3 1  .005681818181818121 104 "post" 1 2 .56    2 0
21552 3 2 -.012499999999999956 104 "post" 1 2 .55  2.3 0
21552 3 3 -.010204081632653073 104 "post" 4 3 .55 1.95 1
21552 3 4  -.15249266862170086 104 "post" 5 3 .81 2.23 1
21552 3 5  .017910447761193993 104 "post" 5 5  .8  2.1 0
21553 4 1  .016949152542372836 102 "post" 1 2 .56    2 0
21553 4 2   .06134371957156759 102 "post" 1 2 .55  2.3 0
21553 4 3  .061855670103092786 102 "post" 4 3 .55 1.95 1
21553 4 4   .02076124567474058 102 "post" 5 3 .81 2.23 1
21553 4 5 -.035190615835777095 102 "post" 5 5  .8  2.1 0
end
format %tdnn/dd/CCYY Day

Best regards,
Daniel

Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10089

13 Apr 2022, 02:39

You need to reshape long the dataset with returns. If it is too large, see tolong from SSC which is faster in reshaping.

Code:

clear
input byte(fund rating1 rating2) double(score1 score2) byte style
1 1 2 .56    2 0
2 1 2 .55  2.3 0
3 4 3 .55 1.95 1
4 5 3 .81 2.23 1
5 5 5  .8  2.1 0
end

rename fund id
tempfile tomerge
save `tomerge'


clear
input int Day byte t str4 period double fund1 int(fund2 fund3 fund4 fund5) double(return1 return2 return3 return4 return5) int index
21550 1 "pre"    87 2000 100 341 320                   .                    .                    .                   .                    . 100
21551 2 "pre"    88 2080  98 341 335 .011494252873563315  .040000000000000036 -.020000000000000018                   0              .046875 103
21552 3 "post" 88.5 2054  97 289 341 .005681818181818121 -.012499999999999956 -.010204081632653073 -.15249266862170086  .017910447761193993 104
21553 4 "post"   90 2180 103 295 329 .016949152542372836   .06134371957156759  .061855670103092786  .02076124567474058 -.035190615835777095 102
end
format %tddd-Mon-YY Day

reshape long fund return, i(Day) j(id)
merge m:1 id using `tomerge', nogen

Res.:

Code:

. sort id t

. l, sepby(id)

     +------------------------------------------------------------------------------------------------------+
     |      Day   id   t   period   fund       return   index   rating1   rating2   score1   score2   style |
     |------------------------------------------------------------------------------------------------------|
  1. | 1-Jan-19    1   1      pre     87            .     100         1         2      .56        2       0 |
  2. | 2-Jan-19    1   2      pre     88    .01149425     103         1         2      .56        2       0 |
  3. | 3-Jan-19    1   3     post   88.5    .00568182     104         1         2      .56        2       0 |
  4. | 4-Jan-19    1   4     post     90    .01694915     102         1         2      .56        2       0 |
     |------------------------------------------------------------------------------------------------------|
  5. | 1-Jan-19    2   1      pre   2000            .     100         1         2      .55      2.3       0 |
  6. | 2-Jan-19    2   2      pre   2080          .04     103         1         2      .55      2.3       0 |
  7. | 3-Jan-19    2   3     post   2054       -.0125     104         1         2      .55      2.3       0 |
  8. | 4-Jan-19    2   4     post   2180    .06134372     102         1         2      .55      2.3       0 |
     |------------------------------------------------------------------------------------------------------|
  9. | 1-Jan-19    3   1      pre    100            .     100         4         3      .55     1.95       1 |
 10. | 2-Jan-19    3   2      pre     98         -.02     103         4         3      .55     1.95       1 |
 11. | 3-Jan-19    3   3     post     97   -.01020408     104         4         3      .55     1.95       1 |
 12. | 4-Jan-19    3   4     post    103    .06185567     102         4         3      .55     1.95       1 |
     |------------------------------------------------------------------------------------------------------|
 13. | 1-Jan-19    4   1      pre    341            .     100         5         3      .81     2.23       1 |
 14. | 2-Jan-19    4   2      pre    341            0     103         5         3      .81     2.23       1 |
 15. | 3-Jan-19    4   3     post    289   -.15249267     104         5         3      .81     2.23       1 |
 16. | 4-Jan-19    4   4     post    295    .02076125     102         5         3      .81     2.23       1 |
     |------------------------------------------------------------------------------------------------------|
 17. | 1-Jan-19    5   1      pre    320            .     100         5         5       .8      2.1       0 |
 18. | 2-Jan-19    5   2      pre    335      .046875     103         5         5       .8      2.1       0 |
 19. | 3-Jan-19    5   3     post    341    .01791045     104         5         5       .8      2.1       0 |
 20. | 4-Jan-19    5   4     post    329   -.03519062     102         5         5       .8      2.1       0 |
     +------------------------------------------------------------------------------------------------------+

.

Comment

Daniel Gilcher

Join Date: Apr 2022

Posts: 9
#3

13 Apr 2022, 02:54

Dear Andrew,
thanks a lot!
This works great for my example above and I will implement it for the real dataset shortly. I very much appreciate your contribution to this forum and the very fast help provided here.
best, Daniel
Comment
Daniel Gilcher

Join Date: Apr 2022

Posts: 9
#4

24 Apr 2022, 23:23

Hi Andrew,

just to follow up on the implementation with the large dataset: It works fine. It takes a moment to run but that is not an issue for me. Thanks again. Best, Daniel
Comment

Announcement

Combining cross-sectional and time series data into one panel

Comment

Comment

Comment