from milliseconds to minutes or hours, time series

glenn_magerman

Join Date: Feb 2015

Posts: 20
#1

from milliseconds to minutes or hours, time series

08 Sep 2018, 03:31

I'm trying to get OHLCV exchange rate data from SIF in milliseconds to minutes in Stata for panel analysis.

The structure of the dataset is: exchange, base, quote, fromunix, tounix, open, high, low, close, volume. base and quote together identify the pair traded. fromunix and tounix are the open and close time of the candles in Unix Time Code (UTC) in milliseconds. In the end I create a large panel, but for the purpose at hand, will focus on 1 time series only.

The first step is to convert time stamps from UTC to Stata SIF:

Code:

foreach x in from to { replace `x'unix = round(`x'unix, 60*1000) // round to nearest minute gen double `x' = `x'unix + mdyhms(1,1,1970,0,0,0) // from UTC to SIF in ms format `x' %tc }

If needed, I can use tsfill to impute gaps in the OHLCV data with whatever imputation method is deemed appropriate.

Then, I want to construct new candlestick information at higher levels of aggregation (e.g. hours, days) by collapsing:

Code:

collapse (first) from open (max) high (min) low (last) close, by(_timeframe_)

and then xt-setting the data again at higher levels.

It's easy to deal with going to daily (or weekly/monthly) info by using the datetime commands in Stata. I might be missing something in the Stata documentation on datetime and xtset, but aggregating to time intervals smaller than days seems not directly possible.

My current approach is first generating a new variable:

Code:

gen double time = floor(from/60*60*1000))

and then collapse by that new variable. Since I keep the from variable from before, I can easily use the xtset commands like before:

Code:

xtset pair from, delta(1 hour)

However, using this time variable is quite clumsy, as it does not have to line up with clock hours, and so the beginning and end candles can be less than 1 hour. Is there an easier way to collapse this type of data?
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

08 Sep 2018, 11:26

Your post is full of abbreviations and jargon that I don't understand, so I'm not sure I really understand what you want to do here. But it sounds like you might want to look at -help datetime functions-, and in particular the -hours()-, -minutes()-, and -seconds()- functions.
Comment
glenn_magerman

Join Date: Feb 2015

Posts: 20
#3

11 Sep 2018, 09:18

Thank you for the response!
some clarifications on abbreviations:
- OHLCV refers to Open, High, Low, Close, Volume variables that are recorded for a particular time interval in financial analysis. Candles are a way to represent all this OHLC (excluding volume) in a graphical way, but also refer to time interval lengths.
- SIF refers to Stata Internal Form
Regarding the functions hours(), minutes(), and seconds(), these perform the same function as ms/3,600,000 (as in the generated variable time above).

I would like to have a time interval in tsset, that is referring to a clock hour (e.g. 09jan1960, 23:00) and that is "smooth" in the sense that Stata time series functions see sequential hours as lag/current/lead (i.e. using L., F., D. functions etc).
This is partially working and I can use 1 hour time intervals to use the time series functions.
However, it is possible that the defined hour using the method above does not necessarily line up with a clock hour. E.g. in my data, some hours start at 09jan1960, 23:30) instead of 09jan1960, 23:00. Time stamps easily "snap to grid" using days, weeks, months etc. using dofc(), dofw() etc, but I could not find similar functions for within-day time divisions.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 29956

11 Sep 2018, 10:58

So maybe you mean something like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double dttm
1.7799825e+12
1750676640000
1760969820000
1782297780000
1782297840000
1782651420000
1.7621703e+12
1783272480000
1783272720000
1783272960000
1781256720000
1781257080000
1781271420000
1781276940000
1765389660000
1765389780000
1799342820000
1799342940000
1767622620000
1803894180000
1803894540000
1.7790384e+12
1779041640000
1764013980000
1.7675484e+12
1769974680000
1776178020000
1756314840000
1756315440000
1765976580000
1.7944017e+12
end
format %tc dttm

gen double hour_snap_dttm = msofhours(1)*floor(dttm/msofhours(1))
format hour_snap_dttm %tc
list, noobs clean

There are also a -msofminutes()- and a msofseconds()- function that can be used here as suggestive notation instead of "magic numbers" like 3,600,000.

But you are right, there are no functions directly analogous to -dofc()-, etc. targeting hours, minutes, or seconds.

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35433
#5

11 Sep 2018, 10:59

I haven't read this all, but my eye fell on a a bug in #1. It may not bite you but the code isn't going in a very helpful direction.

To illustrate, let's focus on a time 12 hours after Stata's time origin. That's 43.2 million ms after the origin as a display with appropriate format shows us. To show we understand what Stata's doing, we can check by converting this time to hours and then minutes after the origin.

Code:

. scalar from = clock("1jan1960 12:00:00", "DMY hms") . di %tc from 01jan1960 12:00:00 . di %20.0f from 43200000 . di %20.0f from/(60*60*1000) 12 . di %20.0f from/(60*1000) 720

In contrast consider the code in #1: dividing by 60 and then multiplying by 60 (those two operations cancel each other) and then multiplying by 1000 gives you a date-time number 1000 times larger than Stata's date-time. It's the number of microseconds since the time origin.

Code:

. di %20.0f from/60*60*1000 43200000000

Otherwise put, parentheses are needed to override Stata's precedence rules (which are just standard rules for arithmetic).
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#6

11 Sep 2018, 11:07

Nice point, Nick. I didn't see that. That error is another reason to use the built-in -msofseconds(), msofminutes(), msofhours()- functions instead of writing out the factors.
Comment
glenn_magerman

Join Date: Feb 2015

Posts: 20
#7

11 Sep 2018, 11:56

Thanks both,
very much appreciated.
On Clyde's suggestion: this is perfect, thank you. And glad to get confirmed no counterparts for dofc() on smaller time frames. Any idea why this might be missing in Stata?
To Nick: thanks for pointing that out, very sharp eye as always! It was actually a typo in this post, (there are 2 brackets closing, I forgot to add the first open bracket after the division sign). In my original code, that typo is not there.

Last edited by glenn_magerman; 11 Sep 2018, 12:50.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#8

11 Sep 2018, 12:43

I don't know. I would surmise that there isn't much demand for it, but only a StataCorp developer could really tell you why.
Comment

Announcement

from milliseconds to minutes or hours, time series

Comment

Comment

Comment

Comment

Comment

Comment

Comment