Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • from milliseconds to minutes or hours, time series

    I'm trying to get OHLCV exchange rate data from SIF in milliseconds to minutes in Stata for panel analysis.

    The structure of the dataset is: exchange, base, quote, fromunix, tounix, open, high, low, close, volume. base and quote together identify the pair traded. fromunix and tounix are the open and close time of the candles in Unix Time Code (UTC) in milliseconds. In the end I create a large panel, but for the purpose at hand, will focus on 1 time series only.

    The first step is to convert time stamps from UTC to Stata SIF:
    Code:
     
     foreach x in from to {       replace `x'unix = round(`x'unix, 60*1000)           // round to nearest minute     gen double `x' = `x'unix + mdyhms(1,1,1970,0,0,0)   // from UTC to SIF in ms     format `x' %tc }
    If needed, I can use tsfill to impute gaps in the OHLCV data with whatever imputation method is deemed appropriate.

    Then, I want to construct new candlestick information at higher levels of aggregation (e.g. hours, days) by collapsing:
    Code:
     
     collapse (first) from open (max) high (min) low (last) close, by(_timeframe_)
    and then xt-setting the data again at higher levels.

    It's easy to deal with going to daily (or weekly/monthly) info by using the datetime commands in Stata. I might be missing something in the Stata documentation on datetime and xtset, but aggregating to time intervals smaller than days seems not directly possible.

    My current approach is first generating a new variable:
    Code:
     
     gen double time = floor(from/60*60*1000))
    and then collapse by that new variable. Since I keep the from variable from before, I can easily use the xtset commands like before:
    Code:
     
     xtset pair from, delta(1 hour)
    However, using this time variable is quite clumsy, as it does not have to line up with clock hours, and so the beginning and end candles can be less than 1 hour. Is there an easier way to collapse this type of data?

  • #2
    Your post is full of abbreviations and jargon that I don't understand, so I'm not sure I really understand what you want to do here. But it sounds like you might want to look at -help datetime functions-, and in particular the -hours()-, -minutes()-, and -seconds()- functions.

    Comment


    • #3
      Thank you for the response!
      some clarifications on abbreviations:
      - OHLCV refers to Open, High, Low, Close, Volume variables that are recorded for a particular time interval in financial analysis. Candles are a way to represent all this OHLC (excluding volume) in a graphical way, but also refer to time interval lengths.
      - SIF refers to Stata Internal Form
      Regarding the functions hours(), minutes(), and seconds(), these perform the same function as ms/3,600,000 (as in the generated variable time above).

      I would like to have a time interval in tsset, that is referring to a clock hour (e.g. 09jan1960, 23:00) and that is "smooth" in the sense that Stata time series functions see sequential hours as lag/current/lead (i.e. using L., F., D. functions etc).
      This is partially working and I can use 1 hour time intervals to use the time series functions.
      However, it is possible that the defined hour using the method above does not necessarily line up with a clock hour. E.g. in my data, some hours start at 09jan1960, 23:30) instead of 09jan1960, 23:00. Time stamps easily "snap to grid" using days, weeks, months etc. using dofc(), dofw() etc, but I could not find similar functions for within-day time divisions.

      Comment


      • #4
        So maybe you mean something like this:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double dttm
        1.7799825e+12
        1750676640000
        1760969820000
        1782297780000
        1782297840000
        1782651420000
        1.7621703e+12
        1783272480000
        1783272720000
        1783272960000
        1781256720000
        1781257080000
        1781271420000
        1781276940000
        1765389660000
        1765389780000
        1799342820000
        1799342940000
        1767622620000
        1803894180000
        1803894540000
        1.7790384e+12
        1779041640000
        1764013980000
        1.7675484e+12
        1769974680000
        1776178020000
        1756314840000
        1756315440000
        1765976580000
        1.7944017e+12
        end
        format %tc dttm
        
        gen double hour_snap_dttm = msofhours(1)*floor(dttm/msofhours(1))
        format hour_snap_dttm %tc
        list, noobs clean
        There are also a -msofminutes()- and a msofseconds()- function that can be used here as suggestive notation instead of "magic numbers" like 3,600,000.

        But you are right, there are no functions directly analogous to -dofc()-, etc. targeting hours, minutes, or seconds.

        Comment


        • #5
          I haven't read this all, but my eye fell on a a bug in #1. It may not bite you but the code isn't going in a very helpful direction.

          To illustrate, let's focus on a time 12 hours after Stata's time origin. That's 43.2 million ms after the origin as a display with appropriate format shows us. To show we understand what Stata's doing, we can check by converting this time to hours and then minutes after the origin.

          Code:
          . scalar from = clock("1jan1960 12:00:00", "DMY hms")
          
          . di %tc from
          01jan1960 12:00:00
          
          . di %20.0f from
                      43200000
          
          . di %20.0f from/(60*60*1000)
                            12
          
          . di %20.0f from/(60*1000)
                           720
          In contrast consider the code in #1: dividing by 60 and then multiplying by 60 (those two operations cancel each other) and then multiplying by 1000 gives you a date-time number 1000 times larger than Stata's date-time. It's the number of microseconds since the time origin.

          Code:
           
          . di %20.0f from/60*60*1000
                   43200000000
          Otherwise put, parentheses are needed to override Stata's precedence rules (which are just standard rules for arithmetic).

          Comment


          • #6
            Nice point, Nick. I didn't see that. That error is another reason to use the built-in -msofseconds(), msofminutes(), msofhours()- functions instead of writing out the factors.

            Comment


            • #7
              Thanks both,
              very much appreciated.
              On Clyde's suggestion: this is perfect, thank you. And glad to get confirmed no counterparts for dofc() on smaller time frames. Any idea why this might be missing in Stata?
              To Nick: thanks for pointing that out, very sharp eye as always! It was actually a typo in this post, (there are 2 brackets closing, I forgot to add the first open bracket after the division sign). In my original code, that typo is not there.
              Last edited by glenn_magerman; 11 Sep 2018, 12:50.

              Comment


              • #8
                I don't know. I would surmise that there isn't much demand for it, but only a StataCorp developer could really tell you why.

                Comment

                Working...
                X