Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Transpose dataset with variable dates

    Hi,

    I would like to transpose the following dataset, in a way that I change the format from long to wide (output with 3 variables: date, SP500_w and VIX_w). I would like to have dates recognized as Stata daily dates.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str8 DATATYPE double(__7_8_2020 __7_1_2020 _6_24_2020 _6_17_2020 _6_10_2020 __6_3_2020 _5_27_2020 _5_20_2020 _5_13_2020)
    "SP500_w" 6496.14 6383.76 6247.66 6375.54 6529.42 6389.67 6209.38 6075.76 5761.7
    "VIX_w"     28.08   28.62   33.84   33.47   27.57   25.66   27.62   27.99  35.28
    end

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str8 DATATYPE double(__7_8_2020 __7_1_2020 _6_24_2020 _6_17_2020 _6_10_2020 __6_3_2020 _5_27_2020 _5_20_2020 _5_13_2020)
    "SP500_w" 6496.14 6383.76 6247.66 6375.54 6529.42 6389.67 6209.38 6075.76 5761.7
    "VIX_w"     28.08   28.62   33.84   33.47   27.57   25.66   27.62   27.99  35.28
    end
    
    reshape long _, i(DATATYPE) j(date) string 
    reshape wide _, i(date) j(DATATYPE) string
    rename _* *
    gen ddate = daily(subinstr(date, "_", " ", .), "MDY")
    drop date
    rename ddate date
    format date %td
    
    list 
    
         +-----------------------------+
         | SP500_w   VIX_w        date |
         |-----------------------------|
      1. |  5761.7   35.28   13may2020 |
      2. | 6075.76   27.99   20may2020 |
      3. | 6209.38   27.62   27may2020 |
      4. | 6529.42   27.57   10jun2020 |
      5. | 6375.54   33.47   17jun2020 |
         |-----------------------------|
      6. | 6247.66   33.84   24jun2020 |
      7. | 6389.67   25.66   03jun2020 |
      8. | 6383.76   28.62   01jul2020 |
      9. | 6496.14   28.08   08jul2020 |
         +-----------------------------+
    I am fairly fluent with reshape but I wouldn't want you supposing that I can just write down code like that (which doesn't rule out the possibility that some people can do it).

    I work with examples and go one step at a time and at each stage list the data and make little mistakes that aren't mirrored here.

    Several little tricks and examples are bundled together at https://www.stata.com/support/faqs/d...-with-reshape/

    Comment


    • #3
      Elegant solution Nick, thank you so much.

      I'm actually facing further problem trying to adapt you code to transpose similar data. Here is the initial dataset:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str6 Code str8 DATATYPE double(__7_8_2020 __7_1_2020 _6_24_2020 _6_17_2020 _6_10_2020)
      "130062" "NOSH"    62241    62241    62241    62241    62241
      "130062" "RI"    1235.32  1235.79  1221.13   1246.2  1209.89
      "130062" "VA"   539505.4 665368.6   567506 698412.8 804628.6
      "130062" "VO"     2534.2   3245.3   2696.1   3427.6   3705.6
      "130088" "NOSH"   115630   115630   115630   115630   115630
      "130088" "RI"   10681.35 10450.57 10345.02  9829.89  9553.43
      "130088" "VA"   383969.9 711588.2  1082466   742027 774348.9
      "130088" "VO"     2867.9     5434   8415.6   6098.8   6354.5
      "130298" "NOSH"   106319   106319   106319   106319   106319
      "130298" "RI"    1664.62  1624.26  1565.75  1690.68  1781.63
      "130298" "VA"   359615.1 388041.2 374222.1 449372.5 556525.3
      "130298" "VO"     4563.9   5147.2   4877.2   5798.5   6668.2
      "130502" "NOSH"   140731   140731   140731   140731   140731
      "130502" "RI"    2408.26  2498.59  2356.59  2530.26  2664.13
      "130502" "VA"   643893.7   741344  1279203  1088418  1130438
      "130502" "VO"     4089.8   4827.5   8180.4   6681.7   6640.4
      "130591" "NOSH"   234777   234777   234777   234777   234777
      "130591" "RI"   13981.53 13881.43 13343.17 14334.36 14670.76
      "130591" "VA"   420396.1 889664.2 688763.2 591583.8 787584.4
      "130591" "VO"     4948.3  10909.4   8167.9   6860.1   8733.8
      "131844" "NOSH"    65417    65417    65417    65417    65417
      "131844" "RI"    3843.79  4054.68  3823.11  4080.42  4369.12
      "131844" "VA"   260664.1 599888.3   810109 710148.6 711030.9
      "131844" "VO"     1123.2   2619.5   3450.4   2880.8   2685.7
      "132808" "NOSH"   148864   148864   148864   148864   148864
      "132808" "RI"    1247.89  1294.24   1207.2  1304.91  1378.94
      "132808" "VA"   136387.5 177070.5 265120.3 238095.6 253315.6
      "132808" "VO"     3547.4   4758.2   6971.1   6069.5   6013.3
      "134057" "NOSH"   343416   343416   343416   343416   343416
      "134057" "RI"    3685.79     3847  3729.72  3955.53   3897.2
      "134057" "VA"   432393.3 711989.3  1646955  1027290  1005960
      "134057" "VO"     7233.5  11969.3  26701.2  16709.9  16140.1
      "13466Q" "NOSH"    78763    78763    78763    78763    78763
      "13466Q" "RI"    1608.03  1581.75  1488.23  1514.57  1545.47
      "13466Q" "VA"   697042.4  1138761  1102434  1247989  1849232
      "13466Q" "VO"     2485.1   4294.1   4203.2     4932   6614.3
      "13471D" "NOSH"   299106   299106   299106   299106   299106
      "13471D" "RI"    3755.47  3823.93   3721.8  4035.32  4210.14
      "13471D" "VA"    1095560  3614513  1625630  1770625  1589028
      "13471D" "VO"     6415.9  21891.3   9245.1   9812.8   8476.6
      "134784" "NOSH"   162116   162116   162116   162116   162116
      "134784" "RI"    1115.62  1164.55  1161.84  1191.86   1191.4
      "134784" "VA"   640644.9  1144186 835340.6 932536.8  1056524
      "134784" "VO"     4309.6   7628.9   5411.9   6174.5   6571.3
      "134843" "NOSH"   455310   455310   455310   455310   455310
      "134843" "RI"    1500.03  1483.17  1599.48  1746.33  1754.39
      "134843" "VA"   744345.5  1498403  1336062  1681192  2378160
      "134843" "VO"    12145.4  23904.2  19492.4  23960.1    30605
      "135448" "NOSH"   285722   285722   285722   285722   285722
      "135448" "RI"    2117.19  2293.02  2278.78  2337.06  2207.26
      "135448" "VA"   357973.8   609731 677992.2 663941.8 734588.9
      "135448" "VO"    10871.8  18188.6  18871.4  19792.2  20762.8
      "13967E" "NOSH"  1427464  1427464  1427464  1427464  1427464
      "13967E" "RI"     404.27      403   397.29   413.53   416.67
      "13967E" "VA"   962949.3  1571509  1511934  1732693  2283975
      "13967E" "VO"    18712.2  31033.3  28894.8  33521.2  43531.7
      "13967H" "NOSH"   128763   128763   128736   128736   128736
      "13967H" "RI"     695.67   681.74   664.04   686.65   692.78
      "13967H" "VA"   457232.6 793560.3  1005953 664288.4 740561.7
      "13967H" "VO"       2315   4048.3   5211.4   3487.5   3648.7
      "14015T" "NOSH"   617834   617834   617834   617834   617834
      "14015T" "RI"    1357.74  1358.33  1313.08   1375.5  1397.96
      "14015T" "VA"    1681775  3255523  2691214  2874547  2503024
      "14015T" "VO"    12280.2  24619.5  19660.3  20824.8  17846.1
      "141405" "NOSH"   144668   144668   144668   144668   144668
      "141405" "RI"   33016.46  32697.8 31155.94 29535.27 28524.49
      "141405" "VA"   437521.6 881680.9 752459.8 628094.3 552660.6
      "141405" "VO"     2265.7   4698.7   4234.8     3806   3394.4
      "141545" "NOSH"   190289   190289   190289   190289   190289
      "141545" "RI"     822.39   829.54   833.83   875.73   932.66
      "141545" "VA"   267035.3 653242.6 352065.8 469674.1 521545.3
      "141545" "VO"     4540.3  11435.5   5824.4   7633.1   7487.9
      "14352P" "NOSH"   663705   663705   663705   663705   637000
      "14352P" "RI"    1902.71  1865.34  1754.51  1783.71  1820.48
      "14352P" "VA"    1626901  3329760  2717709  2549899  2274901
      "14352P" "VO"     7475.4    15548    13350    12697  10963.8
      "14361V" "NOSH"   206802   206802   206802   206802   206802
      "14361V" "RI"     430.77    432.3   427.35   462.96   485.44
      "14361V" "VA"   718086.6  1467717 887172.9  1008142  1080796
      "14361V" "VO"     6040.4  12590.8   7201.8     7985   8001.1
      "14698C" "NOSH"   273868   273868   273868   273868   273868
      "14698C" "RI"     311.74   299.01   294.39   319.03   330.13
      "14698C" "VA"   252438.7 395968.6 430797.3 360771.6 443029.9
      "14698C" "VO"     6057.1   9868.7  10343.1   8528.3   9512.4
      "14737P" "NOSH"   252116   252116   252116   252116   252116
      "14737P" "RI"    1490.21  1525.06  1460.53  1508.26  1637.79
      "14737P" "VA"    1024766  1857121  1658660  2208064  2212400
      "14737P" "VO"     3840.6   7138.6   6268.2   8298.1   7612.9
      "14806F" "NOSH"    69101    69101    69101    69101    69101
      "14806F" "RI"    1036.85  1078.64  1086.62  1127.57  1064.91
      "14806F" "VA"   477624.3 585249.3 845558.4 713409.4 567354.9
      "14806F" "VO"     3468.5   4132.1   5705.2   5101.4   3930.1
      "14861V" "NOSH"   395000   395000   395000   395000   395000
      "14861V" "RI"     330.61    320.6   324.79   347.21    369.3
      "14861V" "VA"   557128.1  1216360  1266214  1091709  1634402
      "14861V" "VO"     9232.6  20643.1  20185.9  17161.3  23527.9
      "14863U" "NOSH"   579129   579129   579129   579129   579129
      "14863U" "RI"    4576.04  4563.48  4317.21  4432.32  4493.72
      "14863U" "VA"   742887.7  2512052  1508385  1356909  1307599
      "14863U" "VO"      11209  40853.8  23776.6  21769.3  20036.1
      end
      Then I follow the pattern you provided and check there are no duplicates for the next step (none detected):
      Code:
      reshape long _, i(Code DATATYPE) j(date) string 
      drop if Code==""
      gsort Code  date DATATYPE
      by Code  date DATATYPE :  gen dup = cond(_N==1,0,_n)
      browse if dup>0
      drop dup
      I then run the following, which outputs an error. I am probably not adapting you code properly. Do you know where the problem is from?

      Code:
      reshape wide _, i(date) j(Code DATATYPE) string
      Code:
      . reshape wide _, i(date) j(Code DATATYPE) string
      values of variable Code not unique within date
          Your data are currently long.  You are performing a reshape wide.  You specified i(date) and j(Code).  There are observations within i(date) with the same value of j(Code).  In the long data, variables i() and j() together must uniquely
          identify the observations.
      
               long                                wide
              +---------------+                   +------------------+
              | i   j   a   b |                   | i   a1 a2  b1 b2 |
              |---------------| <--- reshape ---> |------------------|
              | 1   1   1   2 |                   | 1   1   3   2  4 |
              | 1   2   3   4 |                   | 2   5   7   6  8 |
              | 2   1   5   6 |                   +------------------+
              | 2   2   7   8 |
              +---------------+
          Type reshape error for a list of the problem variables.
      r(9);
      
      end of do-file

      Comment


      • #4
        As you suspect, the reshape wide is incorrect. j() is what identifies distinct variables in what you want and i() is what identifies distinct observations.


        Code:
        reshape long _, i(Code DATATYPE) j(date) string
        isid Code date DATATYPE
        
        reshape wide _, i(date Code) j(DATATYPE) string
        
        rename _* *
        gen ddate = daily(subinstr(date, "_", " ", .), "MDY")
        drop date
        rename ddate date
        format date %td
        
        list
        Last edited by Nick Cox; 16 Jul 2020, 09:41.

        Comment


        • #5
          Many many thanks Nick Cox

          Comment

          Working...
          X