Hi everyone,
I am stuck with the reshaping command of my dataset. My data has 7284 unique caseids, with 60 observations for each unique caseids (437 040 observations in total) with 135 variables. I am working with a case-crossover dataset where each unique_caseid has one case period (14 days) and three control periods (14 days each, for a total of 60 days per individual). My goal is to reshape the data from long to wide so that each unique_caseid is represented by a single row, with separate columns for each control_lag and day_num.
Right now, my dataset looks like this:
What I tried:
1. [reshape wide depression anxiety dep_case_control anx_case_control interview_date, i(unique_caseid daily_date) j(control_lag)][/Reshaping using control_lag]
This resulted in number of observations: 437,040 --> 262,224 and number of variables 130 --> 144
2.[bysort unique_caseid control_lag (daily_date): gen day_num = _n][/Generate sequential day numbers // assigning numbers 1 to 15 within each control_lag]
[reshape wide depression anxiety dep_case_control anx_case_control, i(unique_caseid control_lag) j(day_num)][/reshaping using day_num instead]
This resulted in 29,136 observations, with each unique_caseid having 4 rows for the case and three control periods. (Number of variables increased from 136 to 386)
My questions:
How can I properly reshape the dataset so that each unique_caseid is represented by one row, with separate columns for each combination of control_lag and day_num?
Should I include all time-varying variables in the reshape, or is there a better approach?
Do I need to use a second reshape command following one of those that I had tried?
And, is it feasible to do this in Stata?
Any advice would be greatly appreciated!
I am stuck with the reshaping command of my dataset. My data has 7284 unique caseids, with 60 observations for each unique caseids (437 040 observations in total) with 135 variables. I am working with a case-crossover dataset where each unique_caseid has one case period (14 days) and three control periods (14 days each, for a total of 60 days per individual). My goal is to reshape the data from long to wide so that each unique_caseid is represented by a single row, with separate columns for each control_lag and day_num.
Right now, my dataset looks like this:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str18 unique_caseid float(control_lag dep_case_control depression anx_case_control anxiety daily_date) double(wbt max_temp) " 1 9 01_MZ" 0 1 1 1 1 22843 15.0600004196167 21.829999923706055 " 1 9 01_MZ" 0 1 1 1 1 22844 16.149999618530273 22.8799991607666 " 1 9 01_MZ" 0 1 1 1 1 22845 17.049999237060547 24.549999237060547 " 1 9 01_MZ" 0 1 1 1 1 22846 16.770000457763672 24.8700008392334 " 1 9 01_MZ" 0 1 1 1 1 22847 16.270000457763672 24.360000610351563 " 1 9 01_MZ" 0 1 1 1 1 22848 16.31999969482422 23.81999969482422 " 1 9 01_MZ" 0 1 1 1 1 22849 16.530000686645508 24.18000030517578 " 1 9 01_MZ" 0 1 1 1 1 22850 16.850000381469727 24.860000610351563 " 1 9 01_MZ" 0 1 1 1 1 22851 16.829999923706055 23.8700008392334 " 1 9 01_MZ" 0 1 1 1 1 22852 14.539999961853027 24.139999389648438 " 1 9 01_MZ" 0 1 1 1 1 22853 15.789999961853027 23.110000610351563 " 1 9 01_MZ" 0 1 1 1 1 22854 14.380000114440918 23.290000915527344 " 1 9 01_MZ" 0 1 1 1 1 22855 13.989999771118164 22.739999771118164 " 1 9 01_MZ" 0 1 1 1 1 22856 15.069999694824219 24.34000015258789 " 1 9 01_MZ" 0 1 1 1 1 22857 16.15999984741211 25.139999389648438 " 1 9 01_MZ" 1 0 0 0 0 22836 15.600000381469727 23.459999084472656 " 1 9 01_MZ" 1 0 0 0 0 22837 15.859999656677246 22.670000076293945 " 1 9 01_MZ" 1 0 0 0 0 22838 15.920000076293945 23.309999465942383 " 1 9 01_MZ" 1 0 0 0 0 22839 15.979999542236328 23.719999313354492 " 1 9 01_MZ" 1 0 0 0 0 22840 15.960000038146973 24.030000686645508 " 1 9 01_MZ" 1 0 0 0 0 22841 16.18000030517578 24.420000076293945 " 1 9 01_MZ" 1 0 0 0 0 22842 14.609999656677246 21.959999084472656 " 1 9 01_MZ" 1 0 0 0 0 22843 15.0600004196167 21.829999923706055 " 1 9 01_MZ" 1 0 0 0 0 22844 16.149999618530273 22.8799991607666 " 1 9 01_MZ" 1 0 0 0 0 22845 17.049999237060547 24.549999237060547 " 1 9 01_MZ" 1 0 0 0 0 22846 16.770000457763672 24.8700008392334 " 1 9 01_MZ" 1 0 0 0 0 22847 16.270000457763672 24.360000610351563 " 1 9 01_MZ" 1 0 0 0 0 22848 16.31999969482422 23.81999969482422 " 1 9 01_MZ" 1 0 0 0 0 22849 16.530000686645508 24.18000030517578 " 1 9 01_MZ" 1 0 0 0 0 22850 16.850000381469727 24.860000610351563 " 1 9 01_MZ" 2 0 0 0 0 22829 15.75 23.59000015258789 " 1 9 01_MZ" 2 0 0 0 0 22830 16.040000915527344 23.700000762939453 " 1 9 01_MZ" 2 0 0 0 0 22831 15.520000457763672 22.969999313354492 " 1 9 01_MZ" 2 0 0 0 0 22832 16.020000457763672 23.829999923706055 " 1 9 01_MZ" 2 0 0 0 0 22833 16.100000381469727 23.90999984741211 " 1 9 01_MZ" 2 0 0 0 0 22834 15.899999618530273 22.959999084472656 " 1 9 01_MZ" 2 0 0 0 0 22835 14.9399995803833 23.34000015258789 " 1 9 01_MZ" 2 0 0 0 0 22836 15.600000381469727 23.459999084472656 " 1 9 01_MZ" 2 0 0 0 0 22837 15.859999656677246 22.670000076293945 " 1 9 01_MZ" 2 0 0 0 0 22838 15.920000076293945 23.309999465942383 " 1 9 01_MZ" 2 0 0 0 0 22839 15.979999542236328 23.719999313354492 " 1 9 01_MZ" 2 0 0 0 0 22840 15.960000038146973 24.030000686645508 " 1 9 01_MZ" 2 0 0 0 0 22841 16.18000030517578 24.420000076293945 " 1 9 01_MZ" 2 0 0 0 0 22842 14.609999656677246 21.959999084472656 " 1 9 01_MZ" 2 0 0 0 0 22843 15.0600004196167 21.829999923706055 " 1 9 01_MZ" 3 0 0 0 0 22822 15.880000114440918 22.1200008392334 " 1 9 01_MZ" 3 0 0 0 0 22823 15.640000343322754 22.31999969482422 " 1 9 01_MZ" 3 0 0 0 0 22824 15.739999771118164 22.260000228881836 " 1 9 01_MZ" 3 0 0 0 0 22825 16.75 22.829999923706055 " 1 9 01_MZ" 3 0 0 0 0 22826 16.780000686645508 22.729999542236328 " 1 9 01_MZ" 3 0 0 0 0 22827 16.65999984741211 22.40999984741211 " 1 9 01_MZ" 3 0 0 0 0 22828 16.520000457763672 23.469999313354492 " 1 9 01_MZ" 3 0 0 0 0 22829 15.75 23.59000015258789 " 1 9 01_MZ" 3 0 0 0 0 22830 16.040000915527344 23.700000762939453 " 1 9 01_MZ" 3 0 0 0 0 22831 15.520000457763672 22.969999313354492 " 1 9 01_MZ" 3 0 0 0 0 22832 16.020000457763672 23.829999923706055 " 1 9 01_MZ" 3 0 0 0 0 22833 16.100000381469727 23.90999984741211 " 1 9 01_MZ" 3 0 0 0 0 22834 15.899999618530273 22.959999084472656 " 1 9 01_MZ" 3 0 0 0 0 22835 14.9399995803833 23.34000015258789 " 1 9 01_MZ" 3 0 0 0 0 22836 15.600000381469727 23.459999084472656 " 1 11 4_NP" 0 0 0 0 0 22664 1 7.929999828338623 " 1 11 4_NP" 0 0 0 0 0 22665 .7799999713897705 4.28000020980835 " 1 11 4_NP" 0 0 0 0 0 22666 .25 5.019999980926514 " 1 11 4_NP" 0 0 0 0 0 22667 -.2199999988079071 5.590000152587891 " 1 11 4_NP" 0 0 0 0 0 22668 0 4.71999979019165 " 1 11 4_NP" 0 0 0 0 0 22669 -.5699999928474426 5.050000190734863 " 1 11 4_NP" 0 0 0 0 0 22670 -.1000000014901161 5.550000190734863 " 1 11 4_NP" 0 0 0 0 0 22671 -1.370000004768372 4.53000020980835 " 1 11 4_NP" 0 0 0 0 0 22672 -2.259999990463257 3.480000019073486 " 1 11 4_NP" 0 0 0 0 0 22673 -2.009999990463257 6.389999866485596 " 1 11 4_NP" 0 0 0 0 0 22674 -2.099999904632568 8.369999885559082 " 1 11 4_NP" 0 0 0 0 0 22675 -.6800000071525574 9.640000343322754 " 1 11 4_NP" 0 0 0 0 0 22676 0 7.170000076293945 " 1 11 4_NP" 0 0 0 0 0 22677 -.699999988079071 4.440000057220459 " 1 11 4_NP" 0 0 0 0 0 22678 -1.480000019073486 7.880000114440918 " 1 11 4_NP" 1 0 0 0 0 22657 .4799999892711639 7.099999904632568 " 1 11 4_NP" 1 0 0 0 0 22658 .1899999976158142 7.380000114440918 " 1 11 4_NP" 1 0 0 0 0 22659 -.2199999988079071 7.860000133514404 " 1 11 4_NP" 1 0 0 0 0 22660 -.6899999976158142 7.21999979019165 " 1 11 4_NP" 1 0 0 0 0 22661 -2.180000066757202 9.960000038146973 " 1 11 4_NP" 1 0 0 0 0 22662 -1.350000023841858 9.4399995803833 " 1 11 4_NP" 1 0 0 0 0 22663 .5400000214576721 7.639999866485596 " 1 11 4_NP" 1 0 0 0 0 22664 1 7.929999828338623 " 1 11 4_NP" 1 0 0 0 0 22665 .7799999713897705 4.28000020980835 " 1 11 4_NP" 1 0 0 0 0 22666 .25 5.019999980926514 " 1 11 4_NP" 1 0 0 0 0 22667 -.2199999988079071 5.590000152587891 " 1 11 4_NP" 1 0 0 0 0 22668 0 4.71999979019165 " 1 11 4_NP" 1 0 0 0 0 22669 -.5699999928474426 5.050000190734863 " 1 11 4_NP" 1 0 0 0 0 22670 -.1000000014901161 5.550000190734863 " 1 11 4_NP" 1 0 0 0 0 22671 -1.370000004768372 4.53000020980835 " 1 11 4_NP" 2 0 0 0 0 22650 1.559999942779541 9.119999885559082 " 1 11 4_NP" 2 0 0 0 0 22651 2.720000028610229 8.579999923706055 " 1 11 4_NP" 2 0 0 0 0 22652 2.700000047683716 9.710000038146973 " 1 11 4_NP" 2 0 0 0 0 22653 3.380000114440918 10.52000045776367 " 1 11 4_NP" 2 0 0 0 0 22654 2.529999971389771 10.13000011444092 " 1 11 4_NP" 2 0 0 0 0 22655 2.559999942779541 8.140000343322754 " 1 11 4_NP" 2 0 0 0 0 22656 .449999988079071 9.34000015258789 " 1 11 4_NP" 2 0 0 0 0 22657 .4799999892711639 7.099999904632568 " 1 11 4_NP" 2 0 0 0 0 22658 .1899999976158142 7.380000114440918 " 1 11 4_NP" 2 0 0 0 0 22659 -.2199999988079071 7.860000133514404 end format %td daily_date
1. [reshape wide depression anxiety dep_case_control anx_case_control interview_date, i(unique_caseid daily_date) j(control_lag)][/Reshaping using control_lag]
This resulted in number of observations: 437,040 --> 262,224 and number of variables 130 --> 144
2.[bysort unique_caseid control_lag (daily_date): gen day_num = _n][/Generate sequential day numbers // assigning numbers 1 to 15 within each control_lag]
[reshape wide depression anxiety dep_case_control anx_case_control, i(unique_caseid control_lag) j(day_num)][/reshaping using day_num instead]
This resulted in 29,136 observations, with each unique_caseid having 4 rows for the case and three control periods. (Number of variables increased from 136 to 386)
My questions:
How can I properly reshape the dataset so that each unique_caseid is represented by one row, with separate columns for each combination of control_lag and day_num?
Should I include all time-varying variables in the reshape, or is there a better approach?
Do I need to use a second reshape command following one of those that I had tried?
And, is it feasible to do this in Stata?
Any advice would be greatly appreciated!
Comment