I'm trying to get OHLCV exchange rate data from SIF in milliseconds to minutes in Stata for panel analysis.
The structure of the dataset is: exchange, base, quote, fromunix, tounix, open, high, low, close, volume. base and quote together identify the pair traded. fromunix and tounix are the open and close time of the candles in Unix Time Code (UTC) in milliseconds. In the end I create a large panel, but for the purpose at hand, will focus on 1 time series only.
The first step is to convert time stamps from UTC to Stata SIF:
If needed, I can use tsfill to impute gaps in the OHLCV data with whatever imputation method is deemed appropriate.
Then, I want to construct new candlestick information at higher levels of aggregation (e.g. hours, days) by collapsing:
and then xt-setting the data again at higher levels.
It's easy to deal with going to daily (or weekly/monthly) info by using the datetime commands in Stata. I might be missing something in the Stata documentation on datetime and xtset, but aggregating to time intervals smaller than days seems not directly possible.
My current approach is first generating a new variable:
and then collapse by that new variable. Since I keep the from variable from before, I can easily use the xtset commands like before:
However, using this time variable is quite clumsy, as it does not have to line up with clock hours, and so the beginning and end candles can be less than 1 hour. Is there an easier way to collapse this type of data?
The structure of the dataset is: exchange, base, quote, fromunix, tounix, open, high, low, close, volume. base and quote together identify the pair traded. fromunix and tounix are the open and close time of the candles in Unix Time Code (UTC) in milliseconds. In the end I create a large panel, but for the purpose at hand, will focus on 1 time series only.
The first step is to convert time stamps from UTC to Stata SIF:
Code:
foreach x in from to { replace `x'unix = round(`x'unix, 60*1000) // round to nearest minute gen double `x' = `x'unix + mdyhms(1,1,1970,0,0,0) // from UTC to SIF in ms format `x' %tc }
Then, I want to construct new candlestick information at higher levels of aggregation (e.g. hours, days) by collapsing:
Code:
collapse (first) from open (max) high (min) low (last) close, by(_timeframe_)
It's easy to deal with going to daily (or weekly/monthly) info by using the datetime commands in Stata. I might be missing something in the Stata documentation on datetime and xtset, but aggregating to time intervals smaller than days seems not directly possible.
My current approach is first generating a new variable:
Code:
gen double time = floor(from/60*60*1000))
Code:
xtset pair from, delta(1 hour)
Comment