Time series and error code 451

Leslie Edwards

Join Date: Feb 2022
Posts: 23


11 Feb 2022, 13:33

Hello - I'm trying to declare my data as time series for 30 study participants, with data recordings every 15 seconds to 2 minutes, but I continually get error code 451. Thank you in advance for your advice on this.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float IDnum double(Session dateT dateT2) float(doy hr min)
1 3      1898207739000      1898207739000 55 23 55
1 3 1898207708999.9998 1898207708999.9998 55 23 55
1 3 1898207799999.9998 1898207799999.9998 55 23 56
1 3 1898207770000.0002 1898207770000.0002 55 23 56
1 3 1898207829999.9998 1898207829999.9998 55 23 57
1 3 1898207861000.0002 1898207861000.0002 55 23 57
1 3      1898208288000      1898208288000 56  0  4
1 3 1898208256999.9998 1898208256999.9998 56  0  4
1 3 1898208622999.9998 1898208622999.9998 56  0 10
end
format %tc dateT
format %tC dateT2

Error code r(451) appears after each of the three tsset attempts shown below. After the third one, this warning appeared as well: "warning: Variable dateT2 had been formatted %tC (a period), and you asked for a clocktime period. Are you sure that is what you want? Format has been changed. dateT2 is now formatted %tc."

Code:

tsset ID_Sess dateT, clocktime delta(1 seconds)

Code:

tsset ID_Sess dateT2, clocktime

Code:

format %tC dateT2

tsset ID_Sess dateT2, clocktime


William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

11 Feb 2022, 14:53

I do not know how you created your dateT and dateT2 variables, but I think you got them wrong. Clock times are supposed to be an integer number of milliseconds since 1jan1960 midnight, but your example data shows fractional portions of milliseconds. This is not a good thing.

If you

Code:

replace dateT = round(dateT)

you will make the problem vanish, but without knowing more about the source of your data, it's difficult to know if the results will in fact be correct.
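To see how many clock values are affected before rounding anything, a check along these lines may help. This is only a sketch: it uses three of the dateT values from post #1, and the variable name frac is made up for illustration.

```stata
clear
input double dateT
1898207739000
1898207708999.9998
1898207770000.0002
end
format %tc dateT

* flag clock values that are not a whole number of milliseconds
* (frac is a hypothetical name, not a variable from the original data)
generate byte frac = dateT != round(dateT)
count if frac

* round only the flagged values to the nearest millisecond
replace dateT = round(dateT) if frac
assert dateT == round(dateT)
```

If the count is large, that points back at whatever step created the variable rather than at a handful of stray values.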

Leslie Edwards

Join Date: Feb 2022
Posts: 23

11 Feb 2022, 15:38

William, thank you for your advice. This is panel data, and I am not sure why the dateT values displayed the way they did. dateT and dateT2 are the same variable with DMY hms structure. Both are double type with format %tc. I've pasted the table below.

ID       entered  Date       Session  AFreq       TimeActivity  dateT               PM25
Kat 001  Yes      04oct2019  1        30 seconds  Car           04oct2019 17:02:33  19
Kat 001  Yes      04oct2019  1        30 seconds  Car           04oct2019 17:04:34  21
Kat 001  Yes      04oct2019  1        30 seconds  Car           04oct2019 17:06:35  21
Kat 001  Yes      04oct2019  1        30 seconds  Car           04oct2019 17:08:35  19
Kat 001  Yes      04oct2019  1        30 seconds  Car           04oct2019 17:10:35  20
Kat 001  Yes      04oct2019  1        30 seconds  Car           04oct2019 17:12:36  21
Kat 001  Yes      04oct2019  1        30 seconds  Car           04oct2019 17:14:37  19

When I use the code that you shared no changes were made.

Code:

     replace dateT2 = round(dateT2)
(0 real changes made)

And when trying the tsset command again this is the message

Code:

     tsset ID_Sess dateT2, clocktime
repeated time values within panel
r(451);

Also, ID_Sess is a variable that combines ID number and Session number. The same time (10:01:30 on Jan 2, 2020) is expected to repeat across different ID_Sess values but not within the same ID_Sess. This is air pollution sampling data with a sample collected every 1 minute or every 15 seconds. Thanks so much!


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

11 Feb 2022, 17:15

"repeated time values within panel" means exactly what it says. In order to -tsset panelvar timevar- it must be the case that there is only one observation at any given time for a given panel. Now, you say that

it is expected that the same time (10:01:30 on Jan 2, 2020) may repeat for different ID_Sess but not within the same ID_Sess.

The error message says, however, that it has found some value(s) of ID_Sess where the same time repeats. I have never known Stata to be wrong about this kind of thing. The inescapable conclusion is that your data are not what you think they are. That is, in fact, very common. Even data sets from highly-reputed curators often contain errors.

So the first step is to find those unexpected observations:

Code:

duplicates tag ID_Sess dateT2, gen(flag)
browse if flag

Then you have to figure out what to do about them. You have already asserted that there should not be any such observations, so clearly your data is erroneous and needs to be fixed. If the observations that duplicate ID_Sess and dateT2 also agree on all other variables, then you can eliminate them easily with just -duplicates drop-. But don't be too quick to do that. Why are those surplus observations there? How did they get there? It suggests some failure in the data management that created your data set. If you, or someone in your shop, created it, then you need to review the data management process to find out where those crept in, and find the mistake that led to them. While doing that, look for other mistakes: where one mistake is found, others may lurk nearby still unseen. Better to find and fix them now than get tripped up by them later, or, worse, get wrong results and not even realize you have a problem until someone uses the results for something and it blows up! Once you have completed your code review, regenerate the data set, verify that there are no longer surplus observations, and proceed. If the data set was provided by somebody else, you might just go ahead and drop the duplicates, but you should bring the issue to their attention and ask them to look into it.

There is the possibility, as well, that when you look at these deviant observations they will not agree on the other variables. In that case, you will need to figure out how to select which observation to keep and which ones to remove (or devise a plan to suitably combine the conflicting data into a resolved single variable.)
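As a rough illustration of telling exact copies apart from conflicting records, here is a sketch on made-up toy data. The values, and the variable names flag and conflict, are hypothetical, and PM25 stands in for "all other variables".

```stata
clear
input float ID_Sess double dateT2 float PM25
1 1000 19
1 1000 19
1 2000 21
2 1000 20
2 1000 25
end

* how many surplus observations share the same panel and time?
duplicates report ID_Sess dateT2

* flag = number of other observations with the same ID_Sess and dateT2
duplicates tag ID_Sess dateT2, generate(flag)

* within each ID_Sess-dateT2 pair, does PM25 take more than one value?
bysort ID_Sess dateT2 (PM25): generate byte conflict = PM25[1] != PM25[_N]
list if flag, sepby(ID_Sess dateT2)

* drops only observations that are identical on every variable
duplicates drop
```

In this toy example the exact copy in panel 1 is dropped, but the two conflicting readings in panel 2 remain, so -tsset- would still fail until a decision is made about them.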
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

12 Feb 2022, 08:13

Let me point out the importance of giving the actual error message (as was done in post #3) rather than just the error code (as was done in post #1). I use

Code:

help rc 451

to find out what return code 451 signifies, and am given the default error message

Code:

-------------------------------------------------------------------------------
search for rc 451                                         (manual: [R] search)
-------------------------------------------------------------------------------

Search of official help files, FAQs, Examples, and Stata Journals

[P]     error . . . . . . . . . . . . . . . . . . . . . . . .  Return code 451
        invalid values for time variable
        For instance, you specified mytime as a time variable, and mytime
        contains noninteger values.

which in fact is not what is shown in post #3, which narrows down the problem substantially more to being a duplicated time value.

To Clyde's explanation in post #4, let me add that we have no idea how you constructed ID_Sess from IDs like "Kat 001" and Session values like "1". Unless you used something like

Code:

egen ID_Sess = group(ID Sess)

it is possible that in fact distinct combinations of ID and Session yielded the same value of ID_Sess, so that what you have are not repeated times within some Session for an ID, but rather duplicated ID_Sess values for two different IDs and/or Sessions - which then leads to duplicated time values.
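One way to test for that, sketched on toy data in which a collision has been planted deliberately (the ID_Sess values and the names grp and collide are made up for illustration):

```stata
clear
input str7 ID float(Session ID_Sess)
"Kat 001" 1 11
"Kat 001" 2 12
"Kat 011" 1 11
end

* one group number per distinct ID-Session combination
egen grp = group(ID Session)

* ID_Sess is safe only if each of its values maps to a single combination
bysort ID_Sess (grp): generate byte collide = grp[1] != grp[_N]
list ID Session ID_Sess grp if collide
```

If nothing is listed on the real data, ID_Sess identifies ID and Session uniquely and the repeated times are genuine duplicates within a session.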
Leslie Edwards

Join Date: Feb 2022

Posts: 23
#6

15 Feb 2022, 11:48

Thanks so much to everyone for the advice on declaring time series data with tsset. This is the code that I used to solve the problem. I used the string variable dateT (format DMY hms) to generate a new variable named second and used second as the time variable in tsset.

Code:

generate second= (dateT-tc(16sep2019 00:00:00))
sort dateT
recast double second
format %20.0g second
tsset ID_Sess second

William Lisowski

Join Date: Dec 2014
Posts: 10150

15 Feb 2022, 12:10

That code contains an error. Once you have created second as a float variable, losing some precision, it is not possible to regain that precision by recasting it to double. The code should have been

Code:

generate double second= (dateT-tc(16sep2019 00:00:00))
sort dateT
format %20.0g second
tsset ID_Sess second

Here is an example that demonstrates the problem.

Code:

. set obs 1
Number of observations (_N) was 0, now 1.

. generate double dateT = tc(04oct2019 17:02:33)

. format %tc dateT

.
. generate second= (dateT-tc(16sep2019 00:00:00))

. generate float fsecond= (dateT-tc(16sep2019 00:00:00))

. generate double dsecond= (dateT-tc(16sep2019 00:00:00))

. format %20.0g second fsecond dsecond

. list

     +-----------------------------------------------------------+
     |              dateT       second      fsecond      dsecond |
     |-----------------------------------------------------------|
  1. | 04oct2019 17:02:33   1616552960   1616552960   1616553000 |
     +-----------------------------------------------------------+

. recast double second

. list

     +-----------------------------------------------------------+
     |              dateT       second      fsecond      dsecond |
     |-----------------------------------------------------------|
  1. | 04oct2019 17:02:33   1616552960   1616552960   1616553000 |
     +-----------------------------------------------------------+

.

As you see, recast double cannot magically restore the 40 milliseconds that were lost.

If I were doing this, I'd divide the correct calculation of second by 1000 to give actual seconds rather than the milliseconds that Stata stores for clock times.
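A minimal sketch of that, using the single clock value from the demonstration above. The ID_Sess value of 1 is made up so that -tsset- has a panel variable to work with.

```stata
clear
set obs 1
generate double dateT = tc(04oct2019 17:02:33)
format %tc dateT

* hypothetical panel identifier, only so tsset can run on this toy data
generate ID_Sess = 1

* elapsed seconds (not milliseconds) since the reference time, as a double
generate double second = (dateT - tc(16sep2019 00:00:00))/1000
format %12.0f second

tsset ID_Sess second
list
```

Creating second as a double from the start keeps the calculation exact, and dividing by 1000 means a one-unit step in the time variable is one second.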
