time variable for hours

Stephen Ch

Join Date: Apr 2022

Posts: 67
#1


13 Feb 2024, 15:45

Hi all,

I have a time variable that is stored as a string and looks like this:

time
"2023-12-29 08"
"2023-12-29 11"
"2023-12-29 23"

which corresponds to year-month-day hour. This is panel data.

I would like to declare -xtset id time-, but I do not know how to properly convert the string time variable into a datetime variable.

Any suggestions?

Thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#2

13 Feb 2024, 15:59

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str15 time
`""2023-12-29 08""'
`""2023-12-29 11""'
`""2023-12-29 23""'
end

gen double wanted = clock(time, "YMDh")
assert missing(time) == missing(wanted)
format wanted %tcCCYYMonDD_HH:MM:SS

In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
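As a small sketch of what that looks like in practice: the panel identifier id and the toy values below are assumptions for illustration (the original post only shows time). Running -dataex- on the relevant variables prints a block that can be copied straight into a post. On a large data set you can add an -in- qualifier, e.g. -dataex id time in 1/20-, to limit what is shown.

```stata
* Hypothetical toy data: id is an assumed panel identifier
clear
input byte id str13 time
1 "2023-12-29 08"
1 "2023-12-29 11"
2 "2023-12-29 23"
end

* Print the data in -dataex- format; copy the output into the post
* between code delimiters
dataex id time
```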
Stephen Ch

Join Date: Apr 2022

Posts: 67
#3

14 Feb 2024, 09:38

Hi Clyde,

Thank you for the suggested solution! In fact, -dataex- is already part of my installation, since I use Stata 17.

The code does work for conversion.

However, I run into another issue when declaring the data as a panel:

Code:

xtset id wanted
repeated time values within panel
r(451);

I do understand why this is the case, because for each ID, I have unique timestamps by hours. So, on the same day, the hours differ, so wouldn't this not cause the repeated time values within panel?

I am a bit confused.

I was thinking of doing this, but wouldn't the following command make me lose any subsequent hourly observations of a given day?

Code:

duplicates drop id wanted, force

Any advice would be greatly appreciated!

Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#4

14 Feb 2024, 11:25

I do understand why this is the case, because for each ID, I have unique timestamps by hours. So, on the same day, the hours differ, so wouldn't this not cause the repeated time values within panel?

I'm having difficulty understanding these sentences. If you have distinct timestamps by hours, the variable wanted will also be distinct, and you should not encounter this error. Since you did get this error, the conclusion is that you do not in fact have all distinct timestamps.
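To illustrate the point, here is a minimal sketch with made-up data (the id values and the duplicated 08:00 record are assumptions, not taken from the poster's data). A single repeated id-wanted pair is enough for -xtset- to stop with r(451), even though every other timestamp is distinct:

```stata
* Hypothetical toy panel in which id 1 has its 08:00 reading recorded twice
clear
input byte id str13 time
1 "2023-12-29 08"
1 "2023-12-29 08"
1 "2023-12-29 11"
2 "2023-12-29 23"
end
gen double wanted = clock(time, "YMDh")
format wanted %tcCCYYMonDD_HH:MM:SS

* xtset fails with r(451) because (id, wanted) is not unique
capture noisily xtset id wanted
display "return code: " _rc

* Summarize how many observations are surplus copies of an id-wanted pair
duplicates report id wanted
```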

I was thinking of doing [duplicates drop id wanted, force], but wouldn't the following command make me lose any subsequent hour observations of a given day?

No. If you did this (which you should not; see below), Stata would look at each group of observations that share the same values of id and wanted and delete all but one observation in each group. It would not affect later timestamps (unless they, too, appear in duplicate).
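A quick way to see this behavior for yourself is a toy example like the one below; the data are hypothetical. Only the extra copy of the 08:00 record is removed, and the 11:00 and 23:00 observations for the same day survive:

```stata
* Hypothetical toy panel: one duplicated 08:00 record plus later hours
clear
input byte id str13 time
1 "2023-12-29 08"
1 "2023-12-29 08"
1 "2023-12-29 11"
1 "2023-12-29 23"
end
gen double wanted = clock(time, "YMDh")
format wanted %tcCCYYMonDD_HH:MM:SS

* Keeps one observation per id-wanted group; the 11:00 and 23:00
* observations are untouched because they are not duplicated
duplicates drop id wanted, force
list id wanted
```

Note that this toy case is harmless only because the two 08:00 records are identical; with real data you would not know that without checking first.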

You may believe sincerely, even strongly, that you do not have any duplicate values of id and wanted. But I have been on Statalist for just under 30 years now, and this kind of question comes up pretty much every day. In all that time, I have never once seen a situation where it turned out that Stata was wrong in saying that you do have duplicate values.

Using -duplicates drop id wanted, force- is a bad idea, because you don't know what you're going to be dropping. The fact that you have duplicates you don't think should be there means that either there is something wrong with your data or there is something wrong with your understanding of your data. Either way, you have to straighten out that underlying problem before you start blindly discarding part of your data. (Brief digression: using -force- options is, in general, unwise unless you have verified that you know exactly what will happen and that what will happen is correct for your purposes.)

The first thing you should do is find the offending observations and inspect them:

Code:

duplicates tag id wanted, gen(flag)
sort id wanted
browse if flag

Then inspect what Stata shows you. There are two different possibilities here.

It may be that all of the observations in every id-wanted group are exact duplicates on all variables. In that case, you could run -duplicates drop- and be rid of them with no loss of information. However, I don't recommend doing that right away either. You expect that these duplicates shouldn't exist. The fact that they do suggests that there was an error in the data management leading up to this data set. So rather than just deleting the surplus observations, you really should review the code that created this data and find and fix the error(s) that caused these extra observations to creep into the data set. They might be in the data management code, or they might be problems in the source data you were given as a starting point. And, by the way, don't be shocked if you find other errors along the way. Better to fix them all now.
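One way to check the first possibility, sketched here with hypothetical data (the variable reading is an assumption standing in for whatever else is in the data set): tag duplicates on id and wanted, tag duplicates on all variables, and assert that the two tags agree. If the assertion holds, every id-wanted group consists of exact copies, and plain -duplicates drop- (no varlist, so no -force- is needed) removes only redundant rows. This does not replace tracing where the copies came from.

```stata
* Hypothetical toy data: reading is an assumed extra variable; the
* duplicated 08:00 rows agree on every variable
clear
input byte id str13 time float reading
1 "2023-12-29 08" 5.1
1 "2023-12-29 08" 5.1
1 "2023-12-29 11" 4.7
end
gen double wanted = clock(time, "YMDh")
format wanted %tcCCYYMonDD_HH:MM:SS

* flag: number of other observations with the same id and wanted
duplicates tag id wanted, gen(flag)
* exact: number of other observations identical on ALL variables
* (flag is constant within an id-wanted group, so it does not affect this)
duplicates tag, gen(exact)

* Every id-wanted group is a set of exact copies only if the two tags agree
assert flag == exact

* With no varlist, -duplicates drop- compares all variables; no -force-
drop flag exact
duplicates drop
```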

You may find that in some or all of the id wanted groups, the observations, although agreeing on id and wanted, disagree on the values of other variables. This is a more serious problem because now you have contradictory data. And you can't fix that with a simple tool like -duplicates drop- because first you have to find out which of the observations, if any, is actually the correct one. Here you need a really thorough investigation into the genesis of this data to root out the errors. Since there are so many possibilities, I can't even begin to enumerate them here.
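For the second possibility, a sketch along the same lines can at least isolate the contradictory groups for inspection. Again the data and the variable reading are hypothetical. An observation whose count of id-wanted duplicates exceeds its count of all-variable duplicates belongs to a group whose members are not all identical:

```stata
* Hypothetical toy data: the two 08:00 records for id 1 disagree on
* reading (an assumed variable), so neither can be dropped blindly
clear
input byte id str13 time float reading
1 "2023-12-29 08" 5.1
1 "2023-12-29 08" 6.3
1 "2023-12-29 11" 4.7
2 "2023-12-29 23" 3.9
2 "2023-12-29 23" 3.9
end
gen double wanted = clock(time, "YMDh")
format wanted %tcCCYYMonDD_HH:MM:SS

duplicates tag id wanted, gen(flag)
duplicates tag, gen(exact)

* flag > exact marks every observation in an id-wanted group whose
* members are not all identical: these are the contradictory groups
gen byte conflict = flag > exact
sort id wanted
list id wanted reading if conflict, sepby(id wanted)
```

Here the id 2 pair is flagged by -duplicates tag id wanted- but not marked as a conflict, because its two records are exact copies.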
Stephen Ch

Join Date: Apr 2022

Posts: 67
#5

14 Feb 2024, 11:59

Hi Clyde,

This is extremely helpful, and I appreciate your insight.

I did run the flagging step before doing anything else, and there were only a handful of duplicates, all arising from the data-collection process. So it was safe to use -duplicates drop id wanted, force- in this case.
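One detail worth checking once the duplicates are gone: by default -xtset- treats one unit of the time variable as one period, and for a %tc variable that unit is one millisecond, so lag and lead operators would find no neighbors in hourly data. A minimal sketch with hypothetical data, declaring a one-hour step with the -delta()- option:

```stata
* Hypothetical toy panel, already free of duplicate id-wanted pairs
clear
input byte id str13 time
1 "2023-12-29 08"
1 "2023-12-29 09"
1 "2023-12-29 11"
2 "2023-12-29 23"
end
gen double wanted = clock(time, "YMDh")
format wanted %tcCCYYMonDD_HH:MM:SS

* Declare the panel with a one-hour step between periods
xtset id wanted, delta(1 hour)

* L.wanted is the previous hour's value within the panel; it is
* missing at 11:00 because no 10:00 observation exists
gen double prev_hour = L.wanted
format prev_hour %tcCCYYMonDD_HH:MM:SS
list id wanted prev_hour, sepby(id)
```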

But I do agree with you that anything involving -force- or dropping observations must be carried out with extreme caution.

Thanks!