  • Auto-ARIMA is now available in Stata! The new arimaauto and xtarimau commands

    Thanks to Kit Baum's relentless work on uploading new packages to the SSC, Stata now has auto-ARIMA! It is based on the same algorithm as auto.arima in R but uses different unit root tests.
    There are two commands.

    The time-series command

    arimaauto is de facto an "augmented" Mata-written sister program to Kit Baum's ARMA-limited arimasel with mutually consistent output, allowing for ARIMA(p,d,q) and multiplicative seasonal ARIMA(p,d,q)(P,D,Q) models, selecting the best model based on the LLF, AIC or SIC, and returning its estimates at the same time. However, unlike arimasel, the selection is by default performed with the help of the Hyndman-Khandakar algorithm, first implemented in the auto.arima function (part of the "forecast" package) in R.

    Stata-adjusted Hyndman-Khandakar algorithm:

    The model selection algorithm described in Hyndman and Khandakar (2008) is based on a combination of a modified Canova-Hansen seasonal unit root test (with an empirical formula for calculation of its critical values) and of the KPSS unit root test, aimed at avoiding (alleged) overdifferencing caused by tests which assume a unit root in their null hypothesis, such as hegy and [TS] dfuller. Since the Canova-Hansen test was unavailable in Stata 17 and its implementation would have been a feat of its own, the algorithm was "inverted" to work with the more powerful GLS-based hegy and [TS] dfgls unit root tests, with a correction by the KPSS unit root test to prevent the mentioned overdifferencing, aka large #d in ARIMA(p,d,q) and ARIMA(p,d,q)(P,D,Q) models. The user can disable GLS in hegy and pass additional options to all three tests; see the help file.

    arimaauto has two modes

    Bulk estimation:

    The bulk estimation (activated with the nostepwise option; see the help file) is based on a large model space generated from combinations of vectors of p, q, P and Q with values lying in the range <0, limit>. For example, the default non-seasonal "bulk" model space already includes 36 models and the seasonal one 324 models (!). Therefore, caution and the use of the option maxmodels(#) are advisable.
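
    The two counts can be checked with a few lines of Stata. This is only an illustrative sketch that assumes the default limits p, q ≤ 5 and P, Q ≤ 2; the macro names n_ns and n_s are made up for the example, and the code is not how arimaauto builds its model space internally.

    ```stata
    * Count the (p, q) pairs with p and q in 0..5 (assumed default limits)
    local n_ns 0
    forvalues p = 0/5 {
        forvalues q = 0/5 {
            local ++n_ns
        }
    }
    * Count the (P, Q) pairs with P and Q in 0..2 (assumed default limits)
    local n_s 0
    forvalues P = 0/2 {
        forvalues Q = 0/2 {
            local ++n_s
        }
    }
    display "non-seasonal models: " `n_ns'
    display "seasonal models:     " `n_ns' * `n_s'
    ```

    Each additional unit in a limit multiplies the model space, which is why maxmodels(#) matters in bulk mode.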

    NB Some models may take a long time to converge or the optimizer may even become stuck on flat regions with repeated "(backed up)" messages (if trace(2) was specified). The user is advised to press the break key in such cases.

    PS To match the arimasel command, the user should not forget to increase the inverse characteristic root limit to 1 with the help of the option invroot(1).

    Stepwise traversing:

    Both the Stata-adjusted and the original Hyndman-Khandakar algorithm consist of two steps, the second of which is iterated.

    Step 1: Four initial models are considered as the model space unless options arima(#p,#d,#q) and/or sarima(#P,#D,#Q,#s) are specified:

    ARIMA(2,d,2) if #s = 0 and ARIMA(2,d,2)(1,D,1) if #s ≥ 4
    ARIMA(0,d,0) if #s = 0 and ARIMA(0,d,0)(0,D,0) if #s ≥ 4
    ARIMA(1,d,0) if #s = 0 and ARIMA(1,d,0)(1,D,0) if #s ≥ 4
    ARIMA(0,d,1) if #s = 0 and ARIMA(0,d,1)(0,D,1) if #s ≥ 4

    Otherwise, the algorithm starts with the possible combinations of the "specified" and default terms or with a single model. If d + D ≤ 1, the models are fitted with a constant; otherwise the constant is omitted.
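
    To make Step 1 concrete, the four non-seasonal starting models can be fitted by hand with the official arima command and compared by AIC. This is a rough sketch, not arimaauto's own code: d = 1 is simply assumed here (arimaauto derives d from the unit root tests), the macro names best_ic and best_pq are made up, and models that fail to fit are skipped.

    ```stata
    sysuse gnp96, clear
    tsset date
    * d is assumed for illustration; arimaauto chooses it via unit root tests
    local d 1
    local best_ic = .
    local best_pq ""
    foreach pq in "2 2" "0 0" "1 0" "0 1" {
        local p : word 1 of `pq'
        local q : word 2 of `pq'
        capture arima gnp96, arima(`p',`d',`q')
        * skip starting models that cannot be estimated
        if _rc continue
        quietly estat ic
        matrix S = r(S)
        * column 5 of r(S) holds the AIC
        if S[1,5] < `best_ic' {
            local best_ic = S[1,5]
            local best_pq "`p' `q'"
        }
    }
    display "lowest-AIC starting model: ARIMA(" word("`best_pq'",1) ",`d'," word("`best_pq'",2) ")"
    ```

    Since d + D ≤ 1 in this sketch, every model keeps arima's default constant.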

    Step 2: Out of the model space, the model with the biggest LLF, smallest AIC or smallest SIC (based on what is set in options) is selected and is called the "current" model, of which thirteen variations are considered:

    • where one of p, q, P and Q varies by ±1 from the "current" model;
    • where p and q both vary by ±1 from the "current" model;
    • where P and Q both vary by ±1 from the "current" model;
    • where the constant is excluded/included if present/absent in the "current" model.

    This step is iterated until no better "current" model can be found.
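
    The count of thirteen can be reproduced with a short sketch. It assumes a "current" model of ARIMA(2,d,2)(1,D,1), the default limits p, q ≤ 5 and P, Q ≤ 2, and that "both vary" means both orders move in the same direction; the macro names are made up for illustration and the code is not taken from arimaauto.

    ```stata
    * "current" model (assumed for the example)
    local p 2
    local q 2
    local P 1
    local Q 1
    local n 0
    foreach s in -1 1 {
        * each string holds the shifts applied to p, q, P and Q
        foreach v in "`s' 0 0 0" "0 `s' 0 0" "0 0 `s' 0" "0 0 0 `s'" "`s' `s' 0 0" "0 0 `s' `s'" {
            local np = `p' + real(word("`v'", 1))
            local nq = `q' + real(word("`v'", 2))
            local nP = `P' + real(word("`v'", 3))
            local nQ = `Q' + real(word("`v'", 4))
            * keep only candidates inside the assumed limits
            if inrange(`np',0,5) & inrange(`nq',0,5) & inrange(`nP',0,2) & inrange(`nQ',0,2) {
                display "ARIMA(`np',d,`nq')(`nP',D,`nQ')"
                local ++n
            }
        }
    }
    * plus one variation that toggles the constant
    local ++n
    display "variations considered: `n'"
    ```

    When the "current" model sits on a limit (for example p = 0 or P = 2), fewer than thirteen candidates survive the range check.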

    Default limits:

    The default limits of the Hyndman-Khandakar algorithm are p ≤ 5, q ≤ 5, P ≤ 2, Q ≤ 2, every characteristic root ≥ 1.001 (in absolute value), and an error-free fit of the model, which can be changed with the help of arimaauto's options.
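
    A characteristic root of at least 1.001 in absolute value is the same condition as an inverse root of at most 1/1.001. As a hedged illustration using official Stata commands only (not arimaauto itself), the inverse roots of a fitted model can be inspected with estat aroots; the ARIMA(1,1,1) specification is an arbitrary choice for the example.

    ```stata
    sysuse gnp96, clear
    tsset date
    * arbitrary example model, fitted with the official arima command
    arima gnp96, arima(1,1,1)
    * list the inverse roots of the AR and MA polynomials without the graph
    estat aroots, nograph
    * largest inverse-root modulus compatible with a root limit of 1.001
    display 1/1.001
    ```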


    NB To use arimaauto, you will need to install the hegy command from the Stata Journal's archive and Kit Baum's kpss command either from the SJ or from the SSC (arimaauto will prompt you if they are missing).

    Code:
    . ssc install arimaauto
    . arimaauto
    . help arimaauto
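
    If you prefer to check the two dependencies yourself before running arimaauto, something along these lines should work. It is a sketch that relies only on official commands; hegy is looked up with search rather than installed automatically because it comes from the Stata Journal archive.

    ```stata
    * install kpss from the SSC if it is not found on the ado-path
    capture which kpss
    if _rc ssc install kpss
    * hegy lives in the Stata Journal archive: search for it and install it from the results
    capture which hegy
    if _rc search hegy
    ```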

    The panel command

    xtarimau is a panel wrapper for arimaauto that lets you run arimaauto together with pre-estimation and post-estimation command(s) for each time series in a panel and export the estimates. xtarimau can be used as an estimation command if a panel proves to be too heterogeneous after a unit root test and after comparing statistics for individual time series (i.e. the series are too different to be treated as a whole). xtarimau can also be used as an inter- and extrapolation tool for the xtmipolateu command (passing predict, predictnl, forecast, and irf commands to each time series).

    Code:
    . ssc install xtarimau
    . help xtarimau

    Mata class ARIMAauto

    The commands are based on this Mata class which can be used separately.


    Given the popularity of auto-ARIMA, I'd be glad if users posted any mistakes or bugs they find into this thread.
    I'll try to incorporate all changes ASAP.
    Last edited by Ilya Bolotov; 05 Feb 2022, 15:47.

  • #2
    @Ilya Bolotov

    Thank you for publishing this -- I've been waiting for a Stata implementation of auto arima. In the helpfile and the Mata class ARIMAauto thread you mention

    PSS All estimations in ARIMAAuto are performed under version 13.
    However, I am getting the following error

    Code:
    . sysuse gnp96.dta, clear
    
    .  arimaauto gnp96
    (ARIMAAuto() in larimaauto, compiled by Stata 17.0, is too new to be run by this version of Stata and so was ignored)
                     <istmt>:  3499  ARIMAAuto() not found
    I am using Stata 16.1. Is there something I'm doing wrong here?

    Code:
    . which kpss
    /Users/justinniakamal/Library/Application Support/Stata/ado/plus/k/kpss.ado
    *! version 1.2.2     25jun2006     C F Baum
    
    . which arimaauto
    ./arimaauto.ado
    *! version 1.0.0  31jan2022
    *! requires st0453.pkg and sts15_2.pkg from net
    
    . which hegy
    /Users/justinniakamal/Library/Application Support/Stata/ado/plus/h/hegy.ado
    
    about
    
    Stata/SE 16.1 for Mac (Intel 64-bit)
    Revision 07 Dec 2021
    Copyright 1985-2019 StataCorp LLC

    Comment


    • #3
      Dear Justin,

      Many thanks for your feedback! It seems that the Mata compiler ignores the Stata version indicated in my mata file.
      I re-compiled the mlib file, this time specifying version 13.0 as a prefix to the compiler command.

      Could you please download the mlib file (just click on the link) and put it into your personal folder?
      You can find the personal folder with
      Code:
      . sysdir
      https://github.com/econcz/stata-arim....mlib?raw=true
      Let's make a trial run before contacting Kit Baum to update the archives.

      If the error keeps popping up, please, download the mata file itself and run it.
      https://raw.githubusercontent.com/ec...arimaauto.mata
      It will automatically create or rewrite the mlib file and everything should work.
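
      For readers who have not built a Mata library before, creating or rewriting an mlib file looks roughly like the sketch below. The library name ldemo and the function demo_double() are made up for illustration; the actual mata file may do this differently.

      ```stata
      * make sure the PERSONAL directory exists (it is not created by default)
      capture mkdir "`c(sysdir_personal)'"
      capture mata: mata drop demo_double()
      mata:
      // hypothetical function, compiled by whatever Stata release runs this file
      real scalar demo_double(real scalar x)
      {
          return(2*x)
      }
      end
      * create (or overwrite) the library, add the function, and refresh the index
      mata: mata mlib create ldemo, dir(PERSONAL) replace
      mata: mata mlib add ldemo demo_double()
      mata: mata mlib index
      ```

      Running such a file locally compiles the code with the local Stata release, so the resulting library matches the installation it was built on.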

      Please let me know the result, so we can fix this bug for all potential users.
      Cheers.

      Comment


      • #4
        Hi Ilya,

        I added mlib to my personal folder but received the same error message. I can confirm after running the mata file I am now able to run the examples in the help file. Thank you for your help and for writing an auto arima command in Stata!
        Last edited by Justin Niakamal; 06 Feb 2022, 21:59.

        Comment


        • #5
          Hi both,
          Unfortunately, in this context, you cannot use the "version" command in the way that you would like. Stata 16 will always compile Mata code as Stata 16, such that it can only ever be understood by versions 16 or greater. I've been caught out by the same issue myself in the past. My own solution was to include in my package two versions of the code: (1) as compiled Mata code for more recent Stata versions; and (2) as Mata code within an ado-file which can be compiled on-the-fly by older Stata versions. Others may have alternative suggestions.
          Thanks,
          David.

          ETA: of course, my option (2) above is, indirectly, the same approach that Ilya and Justin used to resolve the issue in this thread: instead of using the supplied, compiled Mata code, Justin compiled the code himself using his own (older) version of Stata.
          Last edited by David Fisher; 07 Feb 2022, 06:18.

          Comment


          • #6
            Originally posted by David Fisher
            Hi both,
            Unfortunately, in this context, you cannot use the "version" command in the way that you would like. Stata 16 will always compile Mata code as Stata 16, such that it can only ever be understood by versions 16 or greater. I've been caught out by the same issue myself in the past. My own solution was to include in my package two versions of the code: (1) as compiled Mata code for more recent Stata versions; and (2) as Mata code within an ado-file which can be compiled on-the-fly by older Stata versions. Others may have alternative suggestions.
            Thanks,
            David.

            ETA: of course, my option (2) above is, indirectly, the same approach that Ilya and Justin used to resolve the issue in this thread: instead of using the supplied, compiled Mata code, Justin compiled the code himself using his own (older) version of Stata.
            Dear David,

            Thank you for your valuable remark. How do you simultaneously specify Mata code for older Stata versions in your ado files?
            Does one simply add
            Code:
            clear mata
            and then re-define the classes and functions? Or is there something else one should know?

            Any advice will be much appreciated!
            Ilya

            Comment


            • #7
              Firstly: as I understand it, Mata code included at the end of a do/ado-file is "local" to that file. So there should be no need for clear mata; you simply include the Mata code at the end (in a similar way that you might write a Stata subroutine), and then the Mata functions are available to be called from within the Stata code in that same do/ado-file.

              My own solution was to use a switch based on the internal macro c(stata_version), as follows (where the text between the asterisks is, of course, pseudo-code):

              Code:
              // If v16.1+, use pre-compiled Mata library; otherwise use on-the-fly Mata code
              local v161 = 0
              if "`c(stata_version)'"!="" {
                  if c(stata_version) >= 16.1 local v161 = 1
              }
              if `v161' {
                 * If Stata v16.1 + :  run a subroutine within the same ado-file, which refers to the compiled Mata code in the relevant "mlib" file *
              }
              else {
                 * If older Stata:  run a different ado-file, which contains Mata code at the end which can be referred to and compiled on-the-fly *
              }
              But my use-case was fairly straightforward: I just needed to have a set of simple Mata functions to hand when carrying out particular analyses. I had no need for complex systems, classes etc. As such, I could easily have done without the compiled "mlib" code completely, and just used the "in-line" code for all users regardless of their Stata version -- I don't think anyone would notice or care!
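
              For completeness, a minimal sketch of the "in-line" approach follows. The program name hellomata and the Mata function hm_sum() are made up for illustration; in a real ado-file you would leave out the two capture ... drop lines and the test call at the bottom.

              ```stata
              capture program drop hellomata
              program define hellomata, rclass
                  version 13
                  // call the Mata function defined further down in the same file
                  mata: hm_sum((1, 2, 3))
                  return scalar total = `total'
              end

              capture mata: mata drop hm_sum()
              version 13
              mata:
              void hm_sum(real rowvector x)
              {
                  // hand the result back to the calling Stata program as local `total'
                  st_local("total", strofreal(sum(x)))
              }
              end

              * test call
              hellomata
              display r(total)
              ```

              Because the Mata code is compiled by whichever Stata loads the file, the same source should work on any release that supports the Mata features it uses.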

              There are many other threads on StataList which deal with similar scenarios, including discussions of more complicated use-cases. For example: https://www.statalist.org/forums/for...rent-ado-files

              I hope that helps!
              BW,
              David.
              Last edited by David Fisher; 08 Feb 2022, 04:55.

              Comment


              • #8
                Dear Ilya and everyone

                I have recently been trying to run a project using arimaauto, and a package seems to be missing when I try to install it. I am using Stata 15.

                Code:
                ssc install arimaauto
                checking arimaauto consistency and verifying not already installed...
                installation complete.
                 . arimaauto
                please type:
                . net install st0453.pkg 
                r(601);
                and when I inputted the
                Code:
                net install
                command, it showed that the file is missing.
                Code:
                 net install st0453.pkg
                file http://fmwww.bc.edu/repec/bocode/a/st0453.pkg not found
                could not load st0453.pkg from http://fmwww.bc.edu/repec/bocode/a/
                r(601);
                The same happened to
                Code:
                ssc install xarimau
                where it says
                Code:
                ssc install: "xarimau" not found at SSC, type search xarimau
                (To find all packages at SSC that start with x, type ssc describe x)
                r(601);
                Is there any way to fix this please?

                Cheng

                Comment


                • #9
                  Dear Cheng,

                  Code:
                  net install st0453.pkg
                  is some kind of bug within the net install command. I run into it occasionally as well. Simply search for the "hegy" test using the search command and install it manually (it must be hegy, not hegy4).

                  Code:
                  ssc install xarimau
                  It should be xtarimau...

                  Hopefully it helps :-)

                  Comment


                  • #10
                    I received the following error: <istmt>: 3499 ARIMAAuto() not found
                    r(3499);


                    I tried to add the mata file directly to my personal folder though I'm not totally sure I did that correctly. There was no personal folder listed in the location that was specified by sysdir. I added a folder and placed it in there.

                    I then opened the file in Stata, ran it, and received the following error:


                    ------------------------------------------------- mata (type end to exit) --------------------------------------------
                    : mata set matastrict on

                    :
                    : `CC' ARIMAAuto extends AssociativeArray /* class */
                    class AssociativeArray undefined
                    (747 lines skipped)


                    I'm probably making some simple mistake.

                    Last edited by ShawnG DuBravac; 05 Jun 2022, 20:56.

                    Comment


                    • #11
                      Originally posted by ShawnG DuBravac
                      I received the following error: <istmt>: 3499 ARIMAAuto() not found
                      r(3499);


                      I tried to add the mata file directly to my personal folder though I'm not totally sure I did that correctly. There was no personal folder listed in the location that was specified by sysdir. I added a folder and placed it in there.

                      I then opened the file in Stata, ran it, and received the following error:


                      ------------------------------------------------- mata (type end to exit) --------------------------------------------
                      : mata set matastrict on

                      :
                      : `CC' ARIMAAuto extends AssociativeArray /* class */
                      class AssociativeArray undefined
                      (747 lines skipped)


                      I'm probably making some simple mistake.
                      Dear Shawn,

                      Which Stata version are you using?

                      Comment


                      • #12
                        Originally posted by Ilya Bolotov

                        Dear Shawn,

                        Which Stata version are you using?
                        13.1

                        Comment


                        • #13
                          Dear Ilya,

                          I'm using Stata 15.1 and I received the following error:

                          Code:
                          . sysuse gnp96.dta, clear
                          
                          . arimaauto gnp96
                                  ARIMAAuto::put():  3301  subscript invalid
                                           <istmt>:     -  function returned error
                          Is there anything I can do to fix the problem?

                          Thank you
                          Last edited by hee sun; 05 Jul 2022, 21:09.

                          Comment


                          • #14
                            Hi Ilya Bolotov ! Thank you for bringing auto.arima from R into Stata. I am using Stata 17.0 and I encountered an error when I use the "if" condition to run arimaauto on a train set. Here is the example using the "data_uk.dta" provided with the hegy package.

                            Code:
                            . sysuse data_uk.dta
                            // Create a training set indicator
                            
                            . gen insample = (time<=tm(2010m12))
                            
                            // The regular arimaauto works fine without the "if" condition
                            
                            arimaauto luk
                            
                            // Below I use the "if" condition to run autoarima on the training set
                            
                            . arimaauto luk if insample==1
                            Error: convergence failure or at least one eigenvalue is at least .999001
                            
                            // I receive the  above error .
                            I tried running arimaauto on other datasets too and received the same error. However, the R command `auto.arima()` can be run on a training set.

                            Looking forward to your help with this issue.

                            Thanks
                            Last edited by Akash Issar; 12 Jul 2022, 10:32.

                            Comment


                            • #15
                              Hi Ilya,
                              I am using Version 16.1, and am getting the following error message when running arimaauto. I was under the impression that arimaauto was run using Version 13, is there a way to run arimaauto using 16.1 at all?
                              Thanks!

                              Code:
                              . arimaauto smk
                              (ARIMAAuto::new() in larimaauto, compiled by Stata 17.0, is too new to be run by this version of Stata and so was ignored)
                              ARIMAAuto(): 3499 member ARIMAAuto::new() not found
                              <istmt>: - function returned error

                              Comment
