Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Time variable: how to change from string to numeric

    Dear stata community,

    I am using STATA 18.

    I have 20 time variables in HH:MM: SS format with no date. Except for 6, others are strings.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str8 wakeup str11(start end) str8(starttime1 stoptime1 starttime2 stoptime2)
    ""        "11:15:19 AM" "1:03:12 PM" ""         ""         ""        ""        
    "7:00:00" "2:29:37 PM"  "3:39:30 PM" "7:00:00"  "21:00:00" ""        ""        
    "7:05:00" "4:55:34 PM"  "7:37:57 PM" "10:00:00" "18:05:00" ""        ""        
    "6:00:00" "2:41:35 PM"  "9:50:39 PM" "6:45:00"  "9:30:00"  "9:00:00" "10:00:00"
    ""        "11:49:16 AM" "1:49:32 PM" ""         ""         ""        ""        
    ""        "12:55:35 PM" "1:12:24 PM" ""         ""         ""        ""        
    ""        "5:08:28 PM"  "5:17:44 PM" ""         ""         ""        ""        
    ""        "2:59:45 PM"  "3:27:39 PM" ""         ""         ""        ""        
    "8:00:00" "1:53:49 PM"  "5:32:27 PM" "9:00:00"  "17:00:00" ""        ""        
    end

    I want to calculate the duration by taking the difference between stop and start times. So, I need to format those variables as time/numeric to take the time duration. However, due to the type mismatch, I cannot do that. I have tried the clock function, but the result produces a time format as YMDhms, which is wrong because I have only hms.

    *****The time command I am using-
    generate double starttime_1=clock(starttime1,"hms") \\ Gives wrong milliseconds
    format starttime_1 %tc

    ***My goal is the following-

    forvalues x = 1/16 {
    gen duration_`x'= (stoptime`x' - starttime`x')/60000
    }

    Please guide me to make the string variables as time/numeric variables?

    Thanks.

  • #2
    However, due to the type mismatch, I cannot do that.
    There is no type mismatch. Your starttime* and stoptime* variables are all proper string representations of times that the -clock()- function can convert to Stata internal format datetime variables.

    I have tried the clock function, but the result produces a time format as YMDhms
    No, the -clock()- function doesn't produce any format at all: it just produces a number. For example, the starttime1 value "7:00:00" will produce starttime_1 = 25200000. The YMDhms format you refer to comes from your -format- statement. You have just used the default %tc format, so you get YMDhms as a result. But that's just not the right choice for what you want to see. You should use, instead, -format starttime_1 %tcHH:MM:SS.sss-. Then you will see it displayed as hours, minutes, seconds, and milliseconds.

    I do not understand your comment "Gives wrong milliseconds." None of the example times you show has any milliseconds, and so the milliseconds part will always be 0. Maybe in your full data set there are some times that have milliseconds included. I am quite sure that Stata will get those right as long as you remember to create the variable as a double.

    By the way, your loop will calculate the durations in minutes. Are you sure that's what you want? The durations from starttime1 to stoptime1 are several hours long--denominating them in minutes is going to lead to some large, unfamiliar numbers. Up to you.

    Assuming you are using the current Stata version (18), instead of that calculation, you can do:
    Code:
    gen duration_`x' = clockdiff(starttime_`x', stoptime_`x', "min")
    which you are less likely to mistype omitting a zero or adding an extra zero. And if you do change your mind about the units, you can change this code just be replacing "min" with "hour", or whatever. See -help clockdiff()-.

    Comment


    • #3
      Hi Clyde,

      Many thanks for your quick response and clarification on my problem. So, there are two steps to solving this issue, right? 1. Change the string variables to time by using the—clock ()—function and format as %tcHH:MM:SS. 2. Get the duration by using the gen command with the -clockdiff()- function.

      If correct, I have another follow-up question on step 1 about creating a loop. I am using the command below to change the string variables but keep getting the message"1 invalid name". Is it because there are 4 'byte' variables? I tried without those byte variables but still got the same message.

      forvalues x = 1/16 {
      gen double starttime_ `x'=clock(starttime `x',"hms")
      format starttime_ `x' %tcHH:MM:SS
      }

      Thanks for reminding me about the min and hour issue. Yes, I should use hours.

      I appreciate your help. Thanks.

      Comment


      • #4
        If correct, I have another follow-up question on step 1 about creating a loop. I am using the command below to change the string variables but keep getting the message"1 invalid name".
        This is because you have a blank space between starttime or starttime_ and `x' in these commands. Get rid of all those blank spaces and everything will be fine.
        Code:
        forvalues x = 1/16 {
            gen double starttime_`x'=clock(starttime`x',"hms")
            format starttime_`x' %tcHH:MM:SS
        }

        Comment


        • #5
          It worked. You saved me today! I am very grateful for pointing out this error. Thank you!

          Comment


          • #6
            Hi Clyde,

            I am sorry to bother you again, but I have two time variables with a different format than HH:MM:SS. For example, one observation has 17:03:00.000+03:00, so the previous command didn't work for it. Could you please help separate the timezone (+03:00) from the time so I can take the difference between the stop and start time? Thank you.


            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str18(starttime16 stoptime16)
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            "06:30:00.000+03:00" "07:00:00.000+03:00"
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            ""                   ""                  
            end

            Comment


            • #7
              This data was collected from Kenya (GMT+3) if this helps.

              Comment


              • #8
                Code:
                foreach x in starttime stoptime {
                    split `x'16, parse("+") gen(part)
                    gen double `x'_16 = clock(part1, "hms")
                    format `x'_16 %tcHH:MM:SS.sss
                    drop part1
                    gen double `x'_gmt_16 = clock(part2, "hm")
                    format `x'_gmt_16 %tcHH:MM
                    rename part2 start_gmt`x'
                }
                
                gen double time_diff_16 = (clockdiff_frac(starttime_16 + starttime_gmt_16, ///
                    stoptime_16 + stoptime_gmt_16, "hour"))
                By the way, notice I used clockdiff_frac(), not clockdiff() to calculate the time difference here. In fact, I should have done that with all the others as well. For minutes it doesn't really matter which you use because these times are all whole numbers of minutes. But in the current example, the actual time difference is a half hour. -clockdiff()- will report that simply as 0 hours because it truncates to the next lowest integer value. clockdiff_frac() does not drop parts of time units but reports it as a full real number.

                Comment


                • #9
                  Clyde Schechter

                  Many thanks for your quick answer. I could convert time 16 to HH:MM:SS format. However, I have two concerns.

                  1. The data is from 2023. Why do I get 1960?
                  2. There are some time differences or durations that seem wrong. For example, isn't the difference between 21:32:00 (start time) and 05:32:00(stop time) 8 hours instead of -16 hours? However, the difference is calculated as 5.32 AM (start time) and 9:32 PM(stop time). I suspect other time variables also have this problem because I see many values with a negative sign.

                  After running the command you gave me, I copied and pasted these variables from the data editor.

                  starttime_16 starttime_gmt_16 start_gmtstoptime stoptime_16 stoptime_gmt_16 duration_16
                  01jan1960 21:32:00 01jan1960 03:00:00 03:00 01jan1960 05:32:00 01jan1960 03:00:00 -16




                  Thanks again for your time and help.

                  Comment


                  • #10
                    Originally posted by Nabittun Nahar View Post
                    I have two concerns.
                    Both of these arise because you have only time information in the string variables that you've been showing all along. You need the date information, too, if you want to accomplish these two objectives.

                    Do you have the date information in other variables in your dataset or elsewhere that correspond to each of these time variables? If not, then your simply outa luck.

                    Comment


                    • #11
                      Originally posted by Joseph Coveney View Post
                      your simply outa luck.
                      you're

                      Comment


                      • #12
                        Thanks so much, Joseph. Unfortunately, I do not have the date information.

                        Comment


                        • #13
                          Hi Joseph,
                          We have an idea to get the correct time difference by adding 12 hours to the stoptime when the stoptime is the next day. Please correct me if I am wrong.

                          The command is below-


                          /* Adding 12 hours (12*60*60000 ms) to get the correct time difference*/
                          forvalues x = 1/11 {
                          gen double SStoptime_`x'=stoptime_`x'+12*60*60000 if stoptime_`x'<starttime_`x'
                          }

                          Thank you.

                          Comment

                          Working...
                          X