Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to easily create categories from String Date Variable (Month/ Day/ Year/ Time)?

    I have a string variable for time that has the timestamp. For example, one entry would look like: "4/25/2022 17:26". I have over 1,000 observations ranging from over 2 years ago to now. I want to create one category per month since the start of my data (25 categories). I need to categorize the dates (like time period A, time period B., time period C..) so I thought I would first make the string variable a continuous/ numeric variable so that I can do an if...then statement, creating a new, categorical variable, where I can efficiently say if a date is within a certain range it would go to the new categorical variable for the time period.

    I have used the clock() function but don't know how to categorize it after that.

    I was also trying to use the "strpos" function but was not successful.

    I also know I might be planning this all wrong, any suggestions?

  • #2
    I'm not sure I understand what you want to do. But you have already made a start in the right direction by creating a clock variable from the string variable. You don't show example data or code, so I don't know what you call that clock variable. I'll assume it's called timestamp. If you run -gen mdate = mofd(dofc(timestamp))- and -format mdate %tm- the new variable mdate will now give the month to which each timestamp belongs. That sounds like it might be what you want. Or if not, perhaps you can take it from there.

    If you need more specific advice, please use the -dataex- command to show example data when posting back, and also show the exact code you have tried so far. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      An essentially similar question was posted and answered at https://stackoverflow.com/questions/...month-day-year

      Cross-posting can waste people's time if not flagged. We ask that you tell us about cross-posting. See https://www.statalist.org/forums/help#crossposting

      For example, the reason why strpos() does not work with a numeric variable has already been explained on Stack Overflow (SO),

      Your original post on SO didn't explain what categorical variable you wanted. That has been added in an update.

      The monthly date can be extracted from the first "word" of your time stamp, ignoring the time of day. Here is an example. Naturally you may need or prefer different variable names.


      Code:
      clear
      set obs 1 
      gen timestamp = "4/25/2022 17:26"
      
      gen wanted = mofd(daily(word(timestamp, 1), "MDY")) 
      format wanted %tm 
      
      list 
      
          +--------------------------+
           |       timestamp   wanted |
           |--------------------------|
        1. | 4/25/2022 17:26   2022m4 |
           +--------------------------+
      It's terminology but I wouldn't call monthly date a categorical variable: it's an integer count with an arbitrary base.

      If this isn't what you want, please back up, read the FAQ Advice, especially https://www.statalist.org/forums/help#stata and give enough information to make clear what you do want.

      Comment


      • #4
        Code:
        . // well that is a weird number
        . di clock("4/25/2022 17:26",  "MD19Y hm" )
        1.967e+12
        
        .
        . // we can display it as a date and time, and now it works better
        . di %tc  clock("4/25/2022 17:26",  "MD19Y hm" )
        25apr2022 17:26:00
        
        .
        . // we can get just the date part
        . di %td dofc(clock("4/25/2022 17:26",  "MD19Y hm" ))
        25apr2022
        
        .
        . // now we can get the year, month, and day
        . di  year(dofc(clock("4/25/2022 17:26",  "MD19Y hm" )))
        2022
        
        . di  month(dofc(clock("4/25/2022 17:26",  "MD19Y hm" )))
        4
        
        . di  day(dofc(clock("4/25/2022 17:26",  "MD19Y hm" )))
        25
        
        .
        . // so combine all that to create our desired variable
        .
        . // some example data
        . clear
        
        . input str15 strdate
        
                     strdate
          1. "4/15/2022 17:26"
          2. "4/16/2022 17:26"
          3. "4/17/2022 17:26"
          4. "5/15/2022 17:26"
          5. "5/16/2022 17:26"
          6. "5/17/2022 17:26"
          7. "6/15/2022 17:26"
          8. end
        
        .
        . // create the stata variables
        . // notice the double for the date-time variable
        . // these numbers get really large, so you need the extra precision
        . gen double st_date_time = clock(strdate, "MD19Y hm")
        
        . gen st_date = dofc(st_date_time)
        
        . gen month = month(st_date)
        
        .
        . // admire the result
        . format st_date_time %tc
        
        . format st_date %td
        
        . list
        
             +----------------------------------------------------------+
             |         strdate         st_date_time     st_date   month |
             |----------------------------------------------------------|
          1. | 4/15/2022 17:26   15apr2022 17:26:00   15apr2022       4 |
          2. | 4/16/2022 17:26   16apr2022 17:26:00   16apr2022       4 |
          3. | 4/17/2022 17:26   17apr2022 17:26:00   17apr2022       4 |
          4. | 5/15/2022 17:26   15may2022 17:26:00   15may2022       5 |
          5. | 5/16/2022 17:26   16may2022 17:26:00   16may2022       5 |
             |----------------------------------------------------------|
          6. | 5/17/2022 17:26   17may2022 17:26:00   17may2022       5 |
          7. | 6/15/2022 17:26   15jun2022 17:26:00   15jun2022       6 |
             +----------------------------------------------------------+
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        Comment


        • #5
          Thank you so much! The format made a huge difference. I was looking at other examples that used a different format and it wasn't working but with the -format mdate %tm- I now have them all categorized by year and month which is what I wanted!

          Comment


          • #6
            This response makes me nervous. Remember that the format is only cosmetic. Formating a variable only changes the way it is displayed on screen, it does not change the actual values.
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------

            Comment


            • #7
              I think "it wasn't working" in #5 just means "I did not understand what it did". If you had a monthly date that was say 747, then most people need a monthly date display format to understand that means April 2022.

              Comment

              Working...
              X