Thanks to Kit Baum's relentless work on uploading new packages to the SSC, Stata now has auto-ARIMA! It is based on the same algorithm as auto.arima in R but uses different unit root tests.
There are two commands.
The time-series command
arimaauto is, de facto, an "augmented", Mata-written sister program to Kit Baum's ARMA-limited arimasel, with mutually consistent output. It allows for ARIMA(p,d,q) and multiplicative seasonal ARIMA(p,d,q)(P,D,Q) models, selects the best model based on the LLF, AIC or SIC, and returns its estimates at the same time. However, unlike arimasel, the selection is by default performed with the help of the Hyndman-Khandakar algorithm, first implemented in the auto.arima function (part of the "forecast" package) in R.
Stata-adjusted Hyndman-Khandakar algorithm:
The model selection algorithm described in Hyndman and Khandakar (2008) combines a modified Canova-Hansen seasonal unit root test (with an empirical formula for calculating its critical values) with the KPSS unit root test, aiming to avoid the (alleged) overdifferencing caused by tests that assume a unit root under their null hypothesis, such as hegy and [TS] dfuller. Since the Canova-Hansen test was unavailable in Stata 17 and implementing it would have been a feat of its own, the algorithm was "inverted" to work with the more powerful GLS-based hegy and [TS] dfgls unit root tests, with a correction by the KPSS unit root test to prevent the mentioned overdifferencing, i.e. a large #d in ARIMA(p,d,q) and ARIMA(p,d,q)(P,D,Q) models. The user can disable GLS in hegy and pass additional options to all three tests; see the help file.
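For intuition only, the "inverted" differencing logic might be sketched as follows. This is an illustrative Python sketch, not part of arimaauto; choose_d, has_unit_root and is_stationary are hypothetical helper names standing in for the actual hegy/dfgls and kpss tests:

```python
def choose_d(series, has_unit_root, is_stationary, max_d=2):
    """Pick the order of differencing d: keep differencing while a unit-root
    test (null: unit root, as in dfgls/hegy) still indicates a unit root,
    but let a stationarity test (null: stationarity, as in KPSS) veto
    further differencing to avoid overdifferencing (a large d)."""
    d = 0
    while d < max_d:
        if not has_unit_root(series):   # dfgls-style test: series looks stationary
            break
        if is_stationary(series):       # KPSS-style veto: do not overdifference
            break
        # take the first difference and try again
        series = [b - a for a, b in zip(series, series[1:])]
        d += 1
    return d
```

The point of the sketch is only the interplay of the two nulls: the test assuming a unit root drives the differencing up, while the test assuming stationarity caps it.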
arimaauto has two modes
Bulk estimation:
Bulk estimation (activated with the nostepwise option; see the help file) is based on a large model space generated from combinations of vectors of p, q, P and Q with values in the range <0, limit>. For example, the default non-seasonal "bulk" model space already includes 36 models, and the seasonal one 324 models (!). Therefore, caution and the use of the maxmodels(#) option are advisable.
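To see where those counts come from: with the defaults p, q ≤ 5 and P, Q ≤ 2, the orders alone give (5+1)×(5+1) = 36 non-seasonal and 36×(2+1)×(2+1) = 324 seasonal combinations. A quick Python sketch (illustrative only, not part of the package):

```python
from itertools import product

def bulk_model_space(pmax=5, qmax=5, Pmax=2, Qmax=2, seasonal=False):
    """Enumerate (p, q) or (p, q, P, Q) order combinations in <0, limit>."""
    nonseasonal = list(product(range(pmax + 1), range(qmax + 1)))
    if not seasonal:
        return nonseasonal
    return [(p, q, P, Q)
            for (p, q) in nonseasonal
            for (P, Q) in product(range(Pmax + 1), range(Qmax + 1))]

print(len(bulk_model_space()))               # 36 non-seasonal models
print(len(bulk_model_space(seasonal=True)))  # 324 seasonal models
```

Each combination then has to be fitted by [TS] arima, which is why the space grows expensive so quickly.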
NB Some models may take a long time to converge, or the optimizer may even get stuck on flat regions with repeated "(backed up)" messages (if trace(2) was specified). The user is advised to press the Break key in such cases.
PS To match the arimasel command, the user should not forget to raise the inverse characteristic root limit to 1 with the option invroot(1).
Stepwise traversing:
Both the Stata-adjusted and the original Hyndman-Khandakar algorithms consist of two steps, the second of which is iterated.
Step 1: Four initial models are considered as the model space unless options arima(#p,#d,#q) and/or sarima(#P,#D,#Q,#s) are specified:
• ARIMA(2,d,2) if #s = 0 and ARIMA(2,d,2)(1,D,1) if #s ≥ 4
• ARIMA(0,d,0) if #s = 0 and ARIMA(0,d,0)(0,D,0) if #s ≥ 4
• ARIMA(1,d,0) if #s = 0 and ARIMA(1,d,0)(1,D,0) if #s ≥ 4
• ARIMA(0,d,1) if #s = 0 and ARIMA(0,d,1)(0,D,1) if #s ≥ 4
Otherwise, the algorithm starts with the possible combinations of the specified and default terms, or with a single model. If d + D ≤ 1, the model(s) are fitted with a constant; otherwise the constant is omitted.
Step 2: Out of the model space, the model with the largest LLF, smallest AIC or smallest SIC (depending on what is set in the options) is selected and called the "current" model, of which thirteen variations are considered:
• where one of p, q, P and Q varies by ±1 from the "current" model;
• where p and q both vary by ±1 from the "current" model;
• where P and Q both vary by ±1 from the "current" model;
• where the constant is excluded/included if present/absent in the "current" model.
This step is iterated until no better "current" model can be found.
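Schematically, the thirteen variations of a seasonal "current" model can be enumerated as follows (an illustrative Python sketch; the actual Mata implementation in arimaauto may differ, e.g. in how it enforces the limits):

```python
def variations(p, q, P, Q, has_constant, seasonal=True):
    """Neighbourhood of a "current" ARIMA(p,d,q)(P,D,Q) model: single-term
    +/-1 moves, joint (p,q) and (P,Q) moves, and a constant toggle."""
    out = []
    indices = range(4) if seasonal else range(2)
    for i in indices:                        # one of p, q, P, Q varies by +/-1
        for step in (-1, 1):
            cand = [p, q, P, Q]
            cand[i] += step
            if cand[i] >= 0:                 # orders cannot go negative
                out.append((tuple(cand), has_constant))
    for step in (-1, 1):                     # p and q both vary by +/-1
        if p + step >= 0 and q + step >= 0:
            out.append(((p + step, q + step, P, Q), has_constant))
    if seasonal:
        for step in (-1, 1):                 # P and Q both vary by +/-1
            if P + step >= 0 and Q + step >= 0:
                out.append(((p, q, P + step, Q + step), has_constant))
    out.append(((p, q, P, Q), not has_constant))  # toggle the constant
    return out
```

For a "current" ARIMA(2,d,2)(1,D,1) model with a constant this yields exactly thirteen candidates (8 + 2 + 2 + 1); at a boundary (e.g. q = 0) fewer moves are admissible.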
Default limits:
The default limits of the Hyndman-Khandakar algorithm are p ≤ 5, q ≤ 5, P ≤ 2, Q ≤ 2, every characteristic root ≥ 1.001 in absolute value, and an error-free fit of the model; these can be changed via arimaauto's options.
NB To use arimaauto, you will need to install the hegy command from the Stata Journal archive and Kit Baum's kpss command either from the SJ or from the SSC (arimaauto will prompt you if they are missing).
Code:
. ssc install arimaauto
. arimaauto
. help arimaauto
The panel command
xtarimau is a panel wrapper for arimaauto which allows the user to run arimaauto, together with pre-estimation and post-estimation command(s), for each time series in a panel and to export the estimates. xtarimau can be used as an estimation command if a panel proves too heterogeneous after a unit root test and after comparing statistics for the individual time series (i.e. each time series is too different for the panel to be modelled as a whole). xtarimau can also be used as an inter- and extrapolation tool for the xtmipolateu command (passing predict, predictnl, forecast, and irf commands to each time series).
Code:
. ssc install xtarimau
. help xtarimau
Mata class ARIMAauto
The commands are based on this Mata class which can be used separately.
Given the popularity of auto-ARIMA, I'd be glad if users posted any mistakes or bugs they find in this thread.
I'll try to incorporate all changes ASAP.