Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Changing dates from "dd/mm/yyyy 00:00" to "yyyy mm"

    Dear all,

    I am working with a large dataset of nearly 3 million observations on house prices in the UK.
    The data has date in the following format as an example in excel: "28/01/2020 00:00".
    And when importing to Stata, the format changes to the following format as an example: "28jan2020 00:00".

    But, I need to sort the data into the following format: "2020 m1" or "jan2020" etc.

    I have already tried to use the following code from a previous post
    Code:
    gen monthly_date = mofd(Date)
    format monthly_date %tm
    but when running it, the command "tab monthly_date" gives no observations.

    The link to the post which may be relevant is here.
    https://www.statalist.org/forums/for...o-monthly-data

    If anyone has any suggestions on how to go about this I'd greatly appreciate it.

    I am using Stata 15.1 for Windows 10.

    Many thanks in advance,
    James.
    Last edited by James Clarke; 02 Jun 2021, 05:47.

  • #2
    My guess -- please note the verb, as there is no data example in #1 -- is that Stata converted your Excel date-times to a clock variable, in which case you need more than mofd() to extract a monthly date variable.

    Here is some technique. If it doesn't solve your problem, please read https://www.statalist.org/forums/help#stata and use dataex -- which, as explained there, is especially important with date variables.

    Code:
    . clear
    
    . set obs 1
    Number of observations (_N) was 0, now 1.
    
    . gen double Date = clock("28jan2020 00:00", "DMY hm")
    
    . format Date %tc
    
    . l
    
         +--------------------+
         |               Date |
         |--------------------|
      1. | 28jan2020 00:00:00 |
         +--------------------+
    
    . gen wanted = mofd(dofc(Date))
    
    . format wanted %tm
    
    . l
    
         +-----------------------------+
         |               Date   wanted |
         |-----------------------------|
      1. | 28jan2020 00:00:00   2020m1 |
         +-----------------------------+

    Comment


    • #3
      Dear Nick,

      Thank you for your reply.

      I tried using the code you have suggested and I believe the problem still persists.

      Below is a copy of the first 100 observations using the command dataex for my variable named "dateoftransfer"

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double dateoftransfer
      1.8540576e+12
        1.85112e+12
       1.856736e+12
      1.8565632e+12
      1.8475776e+12
      1.8569952e+12
      1.8549216e+12
       1.856736e+12
      1.8406656e+12
      1.8572544e+12
      1.8494784e+12
      1.8561312e+12
      1.8555264e+12
      1.8564768e+12
      1.8578592e+12
      1.8569952e+12
      1.8555264e+12
      1.8583776e+12
      1.8555264e+12
      1.8585504e+12
      1.8573408e+12
      1.8603648e+12
      1.8564768e+12
      1.8565632e+12
      1.8555264e+12
      1.8560448e+12
       1.856736e+12
      1.8583776e+12
      1.8585504e+12
      1.8561312e+12
      1.8561312e+12
      1.8569952e+12
      1.8555264e+12
      1.8551808e+12
      1.8547488e+12
       1.858896e+12
         1.8576e+12
       1.857168e+12
      1.8577728e+12
      1.8531072e+12
      1.8569952e+12
      1.8533664e+12
      1.8565632e+12
      1.8555264e+12
      1.8565632e+12
      1.8561312e+12
       1.852416e+12
      1.8573408e+12
      1.8573408e+12
      1.8523296e+12
       1.856736e+12
      1.8555264e+12
      1.8578592e+12
      1.8565632e+12
      1.8560448e+12
      1.8569952e+12
       1.858464e+12
      1.8555264e+12
      1.8566496e+12
      1.8566496e+12
      1.8563904e+12
      1.8561312e+12
      1.8565632e+12
      1.8564768e+12
      1.8499104e+12
      1.8561312e+12
       1.856736e+12
      1.8549216e+12
      1.8573408e+12
        1.85544e+12
      1.8494784e+12
      1.8418752e+12
      1.8427392e+12
      1.8573408e+12
      1.8561312e+12
      1.8557856e+12
      1.8561312e+12
      1.8505152e+12
      1.8566496e+12
       1.854144e+12
      1.8569952e+12
         1.8576e+12
      1.8577728e+12
      1.8553536e+12
      1.8565632e+12
      1.8549216e+12
       1.856736e+12
      1.8548352e+12
      1.8505152e+12
      1.8561312e+12
      1.8566496e+12
      1.8565632e+12
      1.8564768e+12
      1.8546624e+12
      1.8566496e+12
      1.8577728e+12
      1.8406656e+12
      1.8561312e+12
      1.8539712e+12
      1.8552672e+12
      end
      format %tcnn/dd/ccYY_hh:MM dateoftransfer
      Apologies I didn't include a data example last time. I hope this gives more clarification to my problem.

      Best regards,
      James.

      Comment


      • #4
        Thanks for the example, which supports my guess. I can't see your grounds for saying that the problem persists, as you don't show any results.


        Taking your first value as an example

        Code:
        . di %tm mofd(dofc(1.8540576e+12))
        2018m10
        it is a date-time corresponding to a monthly date October 2018.

        Comment


        • #5
          Thank you for your reply.

          Using the code you suggested I get this output:

          Code:
          . gen double Date = clock("dateoftransfer", "DMY hm")
          (2,860,540 missing values generated)
          
          . format Date %tc
          
          . gen wanted = mofd(dofc(Date))
          (2,860,540 missing values generated)
          
          . format wanted %tm
          
          . di %tm mofd(dofc(dateoftransfer))
          2018m10
          Yet I still need to generate a variable (say, Date2) such that each observation in dateoftransfer corresponds 1:1 with a value in Date2. Where Date2 gives each observation a unique output in the same format as 2018m10 above.

          The following displays my issue that the observations in dateoftransfer have not been assigned to the new variable.

          Code:
          . tab wanted
          no observations
          
          . tab Date
          no observations
          Again, I appreciate your generous responses.

          Best regards,
          James.

          Comment


          • #6
            Those results are easily explained. First off, dateoftransfer is already a clock variable; there is nothing to convert there.

            Second off, presenting the literal string dateoftransfer in double quotes necessarily results in all missing. It's not date information any more than a literal string "interesting problem" would be. Perhaps you had some logic like this: clock() needs to see a string argument, so I will give it one. But "dateoftransfer" in double quotation marks would not be the way to convert a numeric variable to string, even if you needed to do that; But you don't.

            Third off, as your new variable is all missing, nothing different can be extracted from it.

            All you need is something like the syntax already given in #2.

            Code:
            gen wanted = mofd(dofc(dateoftransfer)) 
            format wanted %tm

            Comment


            • #7
              Dear Nick,

              Thank you so much for your time and responses.
              The code suggested in #6 works perfectly for what I am trying to achieve.
              I will bear in mind these tips you have given in the future.

              Thanks again,
              James.

              Comment

              Working...
              X