Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with Substring Command

    Hi,

    I have a dataset with 2 million observations that has 9000 dates string coded as 20010000 (year,month,date), and I would like to change the dates to 2001 07 01; and some are string coded 20050600 and I would like to change those dates to 2005 06 15.

    I have been trying to extract the years separately from the month and day etc using :

    gen year = substr (date ,1,2,3,4)
    with the intent to use the replace comand and then join the year, month and date but I keep getting the r(198) message

    I am using Stata 13.1

    What is the correct way to do this?
    Thanks in advance

  • #2
    The substr() function(!) takes exactly three arguments, so substr (date ,1,2,3,4) is invalid syntax. You probably wanted substr (date,1, 4).

    I am pretty sure there are cleverer solutions, but this should work for you

    Code:
    g fixed_date = substr(date, 1, 4) + ///
    cond(substr(date, 5, 2) != "00", substr(date, 5, 2), "07") + ///
    cond(substr(date, 7, 2) != "00", substr(date, 7, 2), ///
    cond(substr(date, 5, 2) != "00", "15", "01"))

    Best
    Daniel

    Comment


    • #3
      I think you would benefit from creating a proper date variable (numeric value: number of days since 01jan1960). You can format it in several useful ways, and you can use it for calculations. Much of the following does the same as Daniel's example, but in a less compact, and, I think, more transparent style:

      Code:
      clear
      input str8 olddate
      "20010000"
      "20050600"
      "20140819"
      end
      
      gen year = real(substr(olddate,1,4))
      gen month = real(substr(olddate,5,2))
      gen day = real(substr(olddate,7,2))
      replace day = 1 if month==0
      replace month = 7 if month==0
      replace day = 15 if day==0
      gen newdate = mdy(month,day,year)
      format newdate %tdCCYY.NN.DD
        . list, clean
              olddate   year   month   day      newdate 
        1.   20010000   2001       7     1   2001.07.01 
        2.   20050600   2005       6    15   2005.06.15 
        3.   20140819   2014       8    19   2014.08.19

      Comment


      • #4
        Thank you very much to both of you!

        Comment

        Working...
        X