Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Stata Mixed Date Format in a single Column

    I have a very hard problem to solve.

    Background:

    In excel you must write dates before 1900 a certain way and after 1900 a certain way. My dataset which tracks the price of the S&P500(I am not a financial service worker, just a student trying to learn stata before my class on econometrics) has dates before 1900, and after 1900 all the way to 2000. Due to this, the data has different date formats.


    It's a very large dataset, but using dataex here it is.

    For before 1900 dates

    "1871-01-01" 4.44 .26 .4 12.46 5.32 89 5.21 8.02 .
    "1871-02-01" 4.5 .26 .4 12.84 5.32 87.53 5.06 7.78 .
    "1871-03-01" 4.61 .26 .4 13.03 5.33 88.36 4.98 7.67 .
    "1871-04-01" 4.74 .26 .4 12.56 5.33 94.29 5.17 7.96 .
    "1871-05-01" 4.86 .26 .4 12.27 5.33 98.93 5.29 8.14 .
    "1871-06-01" 4.82 .26 .4 12.08 5.34 99.66 5.38 8.27 .

    For after 1900 dates

    date sp500 dividend earnings consumerpriceindex longinterestrate realprice realdividend realearnings pe10
    12/1/1916 9.8 .56 1.53 11.6 4.21 211.07 12.06 32.95 11.41
    1/1/1917 9.57 .57 1.51 11.7 4.23 204.35 12.19 32.22 10.99
    2/1/1917 9.03 .58 1.49 12 4.26 188 12.11 30.98 10.06
    3/1/1917 9.31 .59 1.47 12 4.29 193.83 12.34 30.56 10.33
    4/1/1917 9.17 .6 1.45 12.6 4.32 181.82 11.96 28.69 9.64
    5/1/1917 8.86 .61 1.43 12.8 4.34 172.93 11.99 27.83 9.14
    6/1/1917 9.04 .63 1.41 13 4.37 173.73 12.01 27 9.15
    7/1/1917 8.79 .64 1.38 12.8 4.4 171.57 12.41 27.01 9
    8/1/1917 8.53 .65 1.36 13 4.43 163.93 12.43 26.19 8.57
    9/1/1917 8.12 .66 1.34 13.3 4.46 152.53 12.35 25.23 7.95
    10/1/1917 7.68 .67 1.32 13.5 4.49 142.13 12.37 24.47 7.39
    11/1/1917 7.04 .68 1.3 13.5 4.51 130.28 12.57 24.08 6.75
    12/1/1917 6.8 .69 1.28 13.7 4.54 124.01 12.58 23.34 6.41

    How would you make sure that the dates from before 1900, and after 1900 are in one format ex. MMDDYYYY.
    Last edited by Gary Yu; 08 Apr 2021, 21:13.

  • #2
    Welcome to Statalist. Thanks for the data example. For future posts, please familiarize yourself with presenting data examples using the dataex command (see FAQ Advice #12). Here is a way that looks at the beginning of the string and specifies how to create the date variable.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str29 date float(sp500 dividend earnings cpi intrate realprice realdividend realearnings pe10)
    "12/1/1916"   9.8 .56 1.53  11.6 4.21 211.07 12.06 32.95 11.41
    "1/1/1917"   9.57 .57 1.51  11.7 4.23 204.35 12.19 32.22 10.99
    "2/1/1917"   9.03 .58 1.49    12 4.26    188 12.11 30.98 10.06
    "3/1/1917"   9.31 .59 1.47    12 4.29 193.83 12.34 30.56 10.33
    "4/1/1917"   9.17  .6 1.45  12.6 4.32 181.82 11.96 28.69  9.64
    "5/1/1917"   8.86 .61 1.43  12.8 4.34 172.93 11.99 27.83  9.14
    "6/1/1917"   9.04 .63 1.41    13 4.37 173.73 12.01    27  9.15
    "7/1/1917"   8.79 .64 1.38  12.8  4.4 171.57 12.41 27.01     9
    "8/1/1917"   8.53 .65 1.36    13 4.43 163.93 12.43 26.19  8.57
    "9/1/1917"   8.12 .66 1.34  13.3 4.46 152.53 12.35 25.23  7.95
    "10/1/1917"  7.68 .67 1.32  13.5 4.49 142.13 12.37 24.47  7.39
    "11/1/1917"  7.04 .68  1.3  13.5 4.51 130.28 12.57 24.08  6.75
    "12/1/1917"   6.8 .69 1.28  13.7 4.54 124.01 12.58 23.34  6.41
    "1871-02-01"  4.5 .26   .4 12.84 5.32  87.53  5.06  7.78     .
    "1871-03-01" 4.61 .26   .4 13.03 5.33  88.36  4.98  7.67     .
    "1871-04-01" 4.74 .26   .4 12.56 5.33  94.29  5.17  7.96     .
    "1871-05-01" 4.86 .26   .4 12.27 5.33  98.93  5.29  8.14     .
    "1871-06-01" 4.82 .26   .4 12.08 5.34  99.66  5.38  8.27     .
    end
    
    gen Date= cond(ustrregexm(date, "^\d{4}-"), date(date, "YMD"), date(date, "MDY"))
    format Date %td
    Res.:

    Code:
    . l date Date, sep(0)
    
         +------------------------+
         |       date        Date |
         |------------------------|
      1. |  12/1/1916   01dec1916 |
      2. |   1/1/1917   01jan1917 |
      3. |   2/1/1917   01feb1917 |
      4. |   3/1/1917   01mar1917 |
      5. |   4/1/1917   01apr1917 |
      6. |   5/1/1917   01may1917 |
      7. |   6/1/1917   01jun1917 |
      8. |   7/1/1917   01jul1917 |
      9. |   8/1/1917   01aug1917 |
     10. |   9/1/1917   01sep1917 |
     11. |  10/1/1917   01oct1917 |
     12. |  11/1/1917   01nov1917 |
     13. |  12/1/1917   01dec1917 |
     14. | 1871-02-01   01feb1871 |
     15. | 1871-03-01   01mar1871 |
     16. | 1871-04-01   01apr1871 |
     17. | 1871-05-01   01may1871 |
     18. | 1871-06-01   01jun1871 |
         +------------------------+
    Last edited by Andrew Musau; 09 Apr 2021, 03:17.

    Comment


    • #3
      #2 is very helpful, but these all look like monthly dates masquerading as daily dates to me. If so the older dates are YMD not YDM

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str29 date float(sp500 dividend earnings cpi intrate realprice realdividend realearnings pe10)
      "12/1/1916"   9.8 .56 1.53  11.6 4.21 211.07 12.06 32.95 11.41
      "1/1/1917"   9.57 .57 1.51  11.7 4.23 204.35 12.19 32.22 10.99
      "2/1/1917"   9.03 .58 1.49    12 4.26    188 12.11 30.98 10.06
      "3/1/1917"   9.31 .59 1.47    12 4.29 193.83 12.34 30.56 10.33
      "4/1/1917"   9.17  .6 1.45  12.6 4.32 181.82 11.96 28.69  9.64
      "5/1/1917"   8.86 .61 1.43  12.8 4.34 172.93 11.99 27.83  9.14
      "6/1/1917"   9.04 .63 1.41    13 4.37 173.73 12.01    27  9.15
      "7/1/1917"   8.79 .64 1.38  12.8  4.4 171.57 12.41 27.01     9
      "8/1/1917"   8.53 .65 1.36    13 4.43 163.93 12.43 26.19  8.57
      "9/1/1917"   8.12 .66 1.34  13.3 4.46 152.53 12.35 25.23  7.95
      "10/1/1917"  7.68 .67 1.32  13.5 4.49 142.13 12.37 24.47  7.39
      "11/1/1917"  7.04 .68  1.3  13.5 4.51 130.28 12.57 24.08  6.75
      "12/1/1917"   6.8 .69 1.28  13.7 4.54 124.01 12.58 23.34  6.41
      "1871-02-01"  4.5 .26   .4 12.84 5.32  87.53  5.06  7.78     .
      "1871-03-01" 4.61 .26   .4 13.03 5.33  88.36  4.98  7.67     .
      "1871-04-01" 4.74 .26   .4 12.56 5.33  94.29  5.17  7.96     .
      "1871-05-01" 4.86 .26   .4 12.27 5.33  98.93  5.29  8.14     .
      "1871-06-01" 4.82 .26   .4 12.08 5.34  99.66  5.38  8.27     .
      end
      
      gen Date= cond(ustrregexm(date, "^\d{4}-"), daily(date, "YMD"), daily(date, "MDY"))
      replace Date = mofd(Date) 
      format Date %tm 
      sort Date 
      list Date , noobs 
      
        +---------+
        |    Date |
        |---------|
        |  1871m2 |
        |  1871m3 |
        |  1871m4 |
        |  1871m5 |
        |  1871m6 |
        |---------|
        | 1916m12 |
        |  1917m1 |
        |  1917m2 |
        |  1917m3 |
        |  1917m4 |
        |---------|
        |  1917m5 |
        |  1917m6 |
        |  1917m7 |
        |  1917m8 |
        |  1917m9 |
        |---------|
        | 1917m10 |
        | 1917m11 |
        | 1917m12 |
        +---------+

      Comment

      Working...
      X