How to save the residuals from a Dickey-Fuller test (dfuller)?

John Whymer

Join Date: May 2018

Posts: 12
#1


27 Jul 2018, 07:20

I'm not sure how to find and save the residuals from a Dickey-Fuller test, because I cannot find any information on whether or where these residuals are stored.
Am I wrong to suppose that it works as it does after a linear regression, so that I can write:

Code:

dfuller RealGDP, lags(4)
predict resid, residuals

If my assumption is right, the residuals should now be stored in "resid". When I list "resid", I do indeed get a time series that looks like residuals.

But I want to be sure, since I cannot find anything in the Stata manual entry for "dfuller" that says whether dfuller stores the residuals.

Thanks a lot for any clarification!

John
Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#2

27 Jul 2018, 07:38

The dfuller command is not an estimation command, so the predict postestimation command does not work after it. Normally, you would receive the error message

Code:

last estimates not found
r(301);

If that is not the case and predict actually generates a new variable, then these must be the residuals from an estimation that you ran before the dfuller command.

You can use the regress option of the dfuller command to display the underlying regression estimates. This does not leave the estimation results in memory, however, so predict still does not work. It does show you what the regression model looks like, so you can easily re-estimate the model with the regress command and then use predict.
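As a minimal sketch of that re-estimation, the lines below run the augmented Dickey-Fuller regression by hand and then call predict. Assumptions not taken from the question: the lutkepohl2 example dataset and its variable ln_consump stand in for RealGDP, three lagged differences are used purely for illustration, and adf_resid is a hypothetical variable name.

```stata
* Assumption: example data stand in for the poster's RealGDP series
webuse lutkepohl2, clear

* ADF regression with a constant and 3 lagged differences,
* i.e. the model behind: dfuller ln_consump, lags(3)
regress D.ln_consump L.ln_consump L(1/3)D.ln_consump

* predict works here because regress leaves its results in e()
predict double adf_resid, residuals

* The Dickey-Fuller statistic is the t-ratio on the lagged level
display "Z(t) = " %6.3f _b[L.ln_consump]/_se[L.ln_consump]
```

Only the test statistic carries over from this regression; the p-value that regress prints for the lagged level is based on the t distribution and is not the Dickey-Fuller p-value.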

https://www.kripfganz.de/stata/
John Whymer

Join Date: May 2018

Posts: 12
#3

27 Jul 2018, 07:48

Thanks a lot for the clarification, Sebastian!
It's a pity that the "dfuller" residuals are not stored, since the ADF test requires serially uncorrelated residuals, so one should check the residuals somehow.
Would a lag selection with "varsoc" make sense?
John Whymer

Join Date: May 2018

Posts: 12
#4

27 Jul 2018, 09:52

I have now created a kind of workaround: I used "viewsource" to look at the original "dfuller" code and then added a very small modification (marked in my code below) that saves the ADF regression residuals in a variable called "ADFresiduals", which is available after the new command, "dfuller_modified", has run. The new command is called as "dfuller_modified YourTimeSeriesName". The code has to be saved under the name "dfuller_modified.ado" in a directory on Stata's ado-path, for example "C:\Program Files (x86)\Stata15\ado\base\d" (where the official Stata code is typically stored). The code is:

Code:

*! version 1.3.0  10oct2017
*** Modified by John Whymer to store ADF residuals in "ADFresiduals" ****
program define dfuller_modified, rclass
    version 6.0, missing
    syntax varname(ts) [if] [in] [, TRend noCONstant /*
        */ DRift Lags(int -1) REGress /*
        */ CERTIFY ]

    /* certify is an undocumented option that keeps the dickey-fuller
       regression results lying around for certification purposes */

    if "`drift'" != "" {
        if "`constan'" != "" {
            noi di as error "cannot specify drift if constant is excluded"
            exit 198
        }
        if "`trend'" != "" {
            noi di as error "cannot specify drift if time trend is included"
            exit 198
        }
    }
    if "`constan'" != "" & "`trend'" == "" {
        local case 1
    }
    else if "`constan'" == "" & "`trend'" == "" & "`drift'" == "" {
        local case 2
    }
    else if "`constan'" == "" & "`trend'" == "" & "`drift'" != "" {
        local case 3
    }
    else if "`constan'" == "" & "`trend'" != "" {
        local case 4
    }
    else {
        noi di in red "cannot choose trend if constant is excluded"
        exit 198
    }

    marksample touse
    _ts tvar panelvar `if' `in', sort onepanel
    markout `touse' `tvar'

    local samp "if `touse'==1"

    tempname usrest
    // may not be e-class() stuff lying around, so capture this
    if "`certify'" == "" {
        version 10: _estimates hold `usrest', copy restore nullok
    }
    quietly {
        if `lags' < 0 {
            local lags 0
        }
        local mac
        if `case' == 2 {
            local mac "c"
        }
        if `case' == 4 {
            local mac "ct"
        }
        if "`trend'" != "" {
            summ `tvar' `samp', meanonly
            local min = r(min)
            tempvar tt
            gen long `tt' = `tvar'-r(min)
        }
        if `lags' == 0 {
            reg D.`varlist' L.`varlist' `tt' `samp', `constan'
        }
        else {
            reg D.`varlist' L.`varlist' DL(1/`lags').`varlist' /*
                */ `tt' `samp', `constan'
            local aug "Augmented "
        }
        local T = e(N)
        local n = e(N) - e(df_r)
        local Zt = _b[L.`varlist'] / _se[L.`varlist']
        if "`mac'" != "" {
            MacP `mac' `Zt'
            local ztp = `r(p)'
        }
        if `case' == 3 {
            local ztp = 1 - ttail(e(df_r), `Zt')
        }
        GetCrit `case' `T' `varname'
    }

    ***************************************************************
    * Modification to save residuals in ADFresiduals by John Whymer:
    ***************************************************************
    predict ADFresid, residuals
    generate ADFresiduals = ADFresid
    ***************************************************************

    noi di in gr _n "`aug'Dickey-Fuller test for unit root" /*
        */ _col(52) "Number of obs = " in ye %9.0g `T'
    if `case' == 3 {
        di _n in smcl as text _col(32) /*
            */ "{hline 11} Z(t) has t-distribution {hline 11}"
    }
    else {
        di _n in smcl in gr _col(32) /*
            */ "{hline 10} Interpolated Dickey-Fuller {hline 9}"
    }
    di in gr _col(19) "Test" /*
        */ _col(32) "1% Critical" /*
        */ _col(50) "5% Critical" /*
        */ _col(67) "10% Critical"
    di in gr _col(16) "Statistic" /*
        */ _col(36) "Value" /*
        */ _col(54) "Value" /*
        */ _col(72) "Value"
    di in gr in smcl "{hline 78}"
    di in gr " Z(t)" /*
        */ _col(15) in ye %10.3f `Zt' /*
        */ _col(33) %10.3f `r(Zt1)' /*
        */ _col(51) %10.3f `r(Zt5)' /*
        */ _col(69) %10.3f `r(Zt10)'
    ret scalar cv10 = `r(Zt10)'
    ret scalar cv5 = `r(Zt5)'
    ret scalar cv1 = `r(Zt1)'
    if `case' == 3 {
        di as text in smcl "{hline 78}"
        di as text "p-value for Z(t) = " as res %6.4f `ztp'
        ret scalar p = `ztp'
    }
    else if "`ztp'" != "" {
        di in gr in smcl "{hline 78}"
        di in gr "MacKinnon approximate p-value for Z(t) = " /*
            */ in ye %6.4f `ztp'
        ret scalar p = `ztp'
    }
    if "`regress'" != "" {
        di
        if "`tt'" != "" {
            DispReg `tt' `lags' `varlist'
        }
        else {
            regress, nohead
        }
    }
    ret scalar Zt = `Zt'
    ret scalar N = `T'
    ret scalar lags = `lags'
end

program define GetCrit, rclass
    args case N varlist

    /* Take care of case 3 first, since easiest */
    if `case' == 3 {
        local zt1 = invttail(e(df_r), 0.99)
        local zt5 = invttail(e(df_r), 0.95)
        local zt10 = invttail(e(df_r), 0.90)
        return scalar Zt1 = `zt1'
        return scalar Zt5 = `zt5'
        return scalar Zt10 = `zt10'
        exit
    }

    tempname zt
    if `case' == 1 {
        mat `zt' = ( -2.66,-2.62,-2.60,-2.58,-2.58,-2.58\ /*
            */ -1.95,-1.95,-1.95,-1.95,-1.95,-1.95\ /*
            */ -1.60,-1.61,-1.61,-1.62,-1.62,-1.62)
    }
    else if `case' == 2 {
        mat `zt' = ( -3.75,-3.58,-3.51,-3.46,-3.44,-3.43\ /*
            */ -3.00,-2.93,-2.89,-2.88,-2.87,-2.86\ /*
            */ -2.63,-2.60,-2.58,-2.57,-2.57,-2.57)
    }
    else {
        mat `zt' = ( -4.38,-4.15,-4.04,-3.99,-3.98,-3.96\ /*
            */ -3.60,-3.50,-3.45,-3.43,-3.42,-3.41\ /*
            */ -3.24,-3.18,-3.15,-3.13,-3.13,-3.12)
    }

    if `N' <= 25 {
        local zt1 = `zt'[1,1]
        local zt5 = `zt'[2,1]
        local zt10 = `zt'[3,1]
    }
    else if `N' <= 50 {
        local zt1 = `zt'[1,1] + (`N'-25)/25 * (`zt'[1,2]-`zt'[1,1])
        local zt5 = `zt'[2,1] + (`N'-25)/25 * (`zt'[2,2]-`zt'[2,1])
        local zt10 = `zt'[3,1] + (`N'-25)/25 * (`zt'[3,2]-`zt'[3,1])
    }
    else if `N' <= 100 {
        local zt1 = `zt'[1,2] + (`N'-50)/50 * (`zt'[1,3]-`zt'[1,2])
        local zt5 = `zt'[2,2] + (`N'-50)/50 * (`zt'[2,3]-`zt'[2,2])
        local zt10 = `zt'[3,2] + (`N'-50)/50 * (`zt'[3,3]-`zt'[3,2])
    }
    else if `N' <= 250 {
        local zt1 = `zt'[1,3] + (`N'-100)/150 * (`zt'[1,4]-`zt'[1,3])
        local zt5 = `zt'[2,3] + (`N'-100)/150 * (`zt'[2,4]-`zt'[2,3])
        local zt10 = `zt'[3,3] + (`N'-100)/150 * (`zt'[3,4]-`zt'[3,3])
    }
    else if `N' <= 500 {
        local zt1 = `zt'[1,4] + (`N'-250)/250 * (`zt'[1,5]-`zt'[1,4])
        local zt5 = `zt'[2,4] + (`N'-250)/250 * (`zt'[2,5]-`zt'[2,4])
        local zt10 = `zt'[3,4] + (`N'-250)/250 * (`zt'[3,5]-`zt'[3,4])
    }
    else {
        local zt1 = `zt'[1,6]
        local zt5 = `zt'[2,6]
        local zt10 = `zt'[3,6]
    }
    return scalar Zt1 = `zt1'
    return scalar Zt5 = `zt5'
    return scalar Zt10 = `zt10'
end

program define MacP, rclass
    args type tau
    local stype = lower("`type'")
    if "`stype'"=="c" {
        local type 0
    }
    else {
        local type 1
    }
    local g3=0
    local min=.
    local max=.
    if `type'==0 { /* no trend but constant in ADF regression */
        if `tau'>-1.61 {
            local min = -9999
            local max = 2.74
            local g0 = 1.7339
            local g1 = 0.93202
            local g2 = -0.12745
            local g3 = -0.010368
        }
        else {
            local min = -18.83
            local g0 = 2.1659
            local g1 = 1.4412
            local g2 = 0.038269
            local g3 = 0
        }
    } /* type==0 */
    else if `type'==1 { /* linear trend and constant in ADF reg.*/
        if `tau'>-2.89 {
            local min = -9999
            local max = 0.70
            local g0 = 2.5261
            local g1 = 0.61654
            local g2 = -0.37956
            local g3 = -0.060285
        }
        else {
            local min = -16.18
            local g0 = 3.2512
            local g1 = 1.6047
            local g2 = 0.049588
            local g3 = 0
        }
    } /* type==1 */
    local h = `g0' + `g1'*`tau' + `g2'*(`tau')^2 + `g3'*(`tau')^3
    local p = cond(`tau'<`min',0,cond(`tau'>`max',1,normprob(`h')))
    return scalar p = `p'
end

program define DispReg
    args tt lags dvar
    di in smcl in gr "{hline 13}{c TT}{hline 64}"
    di in smcl in gr abbrev("`e(depvar)'",12) _col(14) "{c |}" /*
        */ _col(21) "Coef." _col(29) "Std. Err." _col(44) "t" /*
        */ _col(49) "P>|t|" _col(59) "[95% Conf. Interval]"
    di in smcl in gr "{hline 13}{c +}{hline 64}"
    di in smcl in gr %12s abbrev("`dvar'",12) _col(14) "{c |}"
    local vv "L1.`dvar'"
    local bv "_b[`vv']"
    local sv "_se[`vv']"
    di in smcl in gr _col(10) "L1. {c |}" in ye /*
        */ _col(17) %9.0g `bv' /*
        */ _col(28) %9.0g `sv' /*
        */ _col(38) %8.2f `bv'/`sv' /*
        */ _col(48) %6.3f tprob(e(df_r),`bv'/`sv') /*
        */ _col(58) %9.0g `bv' - invt(`e(df_r)',$S_level/100)*`sv' /*
        */ _col(70) %9.0g `bv' + invt(`e(df_r)',$S_level/100)*`sv'
    local vv "LD.`dvar'"
    local bv "_b[`vv']"
    local sv "_se[`vv']"
    if `lags' >= 1 {
        di in smcl in gr _col(10) "LD. {c |}" in ye /*
            */ _col(17) %9.0g `bv' /*
            */ _col(28) %9.0g `sv' /*
            */ _col(38) %8.2f `bv'/`sv' /*
            */ _col(48) %6.3f tprob(e(df_r),`bv'/`sv') /*
            */ _col(58) %9.0g `bv' - invt(`e(df_r)',/*
            */ $S_level/100)*`sv' /*
            */ _col(70) %9.0g `bv' + invt(`e(df_r)',/*
            */ $S_level/100)*`sv'
    }
    local i 2
    while `i' <= `lags' {
        local vv "L`i'D.`dvar'"
        local bv "_b[`vv']"
        local sv "_se[`vv']"
        di in smcl in gr %12s "L`i'D." " {c |}" in ye /*
            */ _col(17) %9.0g `bv' /*
            */ _col(28) %9.0g `sv' /*
            */ _col(38) %8.2f `bv'/`sv' /*
            */ _col(48) %6.3f tprob(e(df_r),`bv'/`sv') /*
            */ _col(58) %9.0g `bv' - invt(`e(df_r)',/*
            */ $S_level/100)*`sv' /*
            */ _col(70) %9.0g `bv' + invt(`e(df_r)',/*
            */ $S_level/100)*`sv'
        local i = `i'+1
    }
    local vv "`tt'"
    local bv "_b[`vv']"
    local sv "_se[`vv']"
    di in smcl in gr %12s "_trend" _col(14) "{c |}" in ye /*
        */ _col(17) %9.0g `bv' /*
        */ _col(28) %9.0g `sv' /*
        */ _col(38) %8.2f `bv'/`sv' /*
        */ _col(48) %6.3f tprob(e(df_r),`bv'/`sv') /*
        */ _col(58) %9.0g `bv' - invt(`e(df_r)',$S_level/100)*`sv' /*
        */ _col(70) %9.0g `bv' + invt(`e(df_r)',$S_level/100)*`sv'
    local vv "_cons"
    local bv "_b[`vv']"
    local sv "_se[`vv']"
    di in smcl in gr %12s "_cons" _col(14) "{c |}" in ye /*
        */ _col(17) %9.0g `bv' /*
        */ _col(28) %9.0g `sv' /*
        */ _col(38) %8.2f `bv'/`sv' /*
        */ _col(48) %6.3f tprob(e(df_r),`bv'/`sv') /*
        */ _col(58) %9.0g `bv' - invt(`e(df_r)',$S_level/100)*`sv' /*
        */ _col(70) %9.0g `bv' + invt(`e(df_r)',$S_level/100)*`sv'
    di in smcl in gr "{hline 13}{c BT}{hline 64}"
end
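A lighter variant of the same idea, offered only as a sketch: because dfuller runs an ordinary regress internally, a short wrapper can rerun that regression and keep the residuals without copying the official ado-file. The program name adf_resid and its generate() option are hypothetical, only the default case with a constant and no trend is handled, and the lutkepohl2 example data are an assumption used to make the example runnable.

```stata
capture program drop adf_resid
program define adf_resid
    version 12
    * generate() names the new residual variable; lags() as in dfuller
    syntax varname(ts) [if] [in], GENerate(name) [Lags(integer 0)]
    confirm new variable `generate'
    marksample touse

    * Same regression that dfuller runs in its default (constant) case
    if `lags' > 0 {
        quietly regress D.`varlist' L.`varlist' L(1/`lags')D.`varlist' if `touse'
    }
    else {
        quietly regress D.`varlist' L.`varlist' if `touse'
    }

    * Residuals for the estimation sample only
    quietly predict double `generate' if e(sample), residuals
end

* Usage on example data (assumption: stands in for the poster's series)
webuse lutkepohl2, clear
adf_resid ln_consump, generate(ADFresiduals) lags(3)
summarize ADFresiduals
```

Unlike dfuller, this sketch leaves the regress results in e(), so postestimation commands remain available afterwards, at the cost of overwriting any earlier estimates.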
Sebastian Kripfganz

Join Date: May 2014

Posts: 2575
#5

27 Jul 2018, 10:33

Originally posted by John Whymer:

Would a lag selection with "varsoc" make sense?

Indeed, the varsoc command is often used in combination with the dfuller command.

https://www.kripfganz.de/stata/

Sebastian Kripfganz

Join Date: May 2014
Posts: 2575
#6

27 Jul 2018, 10:43

By the way, you could achieve the same goal by using the ardl command. Here is an example:

Code:

. webuse lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)

. ardl ln_consump, ec

ARDL(4) regression

Sample: 1961q1 - 1982q4                         Number of obs     =         88
                                                R-squared         =     0.1621
                                                Adj R-squared     =     0.1218
Log likelihood =  280.48555                     Root MSE          =     0.0103

------------------------------------------------------------------------------
D.ln_consump |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ADJ          |
  ln_consump |
         L1. |  -.0022516   .0022221    -1.01   0.314    -.0066713    .0021681
-------------+----------------------------------------------------------------
SR           |
  ln_consump |
         LD. |  -.1211548   .1034523    -1.17   0.245    -.3269172    .0846076
        L2D. |   .2429064   .1017705     2.39   0.019     .0404889    .4453239
        L3D. |    .307846   .1048838     2.94   0.004     .0992364    .5164555
             |
       _cons |   .0258569   .0164955     1.57   0.121    -.0069519    .0586658
------------------------------------------------------------------------------

. estat ectest

Pesaran, Shin, and Smith (2001) bounds test

H0: no level relationship                                        F =     1.027
Case 3                                                           t =    -1.013

Finite sample (0 variables, 88 observations, 3 short-run coefficients)

Kripfganz and Schneider (2018) critical values and approximate p-values

   | 10%              | 5%               | 1%               | p-value        
   |    I(0)     I(1) |    I(0)     I(1) |    I(0)     I(1) |    I(0)     I(1)
---+------------------+------------------+------------------+-----------------
 F |   6.581    6.570 |   8.255    8.236 |  12.119   12.071 |   0.742    0.742
 t |  -2.565   -2.569 |  -2.868   -2.874 |  -3.460   -3.470 |   0.735    0.738

do not reject H0 if
    both F and t are closer to zero than critical values for I(0) variables
      (if p-values > desired level for I(0) variables)
reject H0 if
    both F and t are more extreme than critical values for I(1) variables
      (if p-values < desired level for I(1) variables)

. predict resid, residuals
(4 missing values generated)

The regression is the same as the augmented Dickey-Fuller regression with an optimal lag selection automatically applied. The t-statistic reported by the postestimation command estat ectest is the Dickey-Fuller test statistic. (Finite-sample) critical values and approximate p-values are provided as well (choose from the columns labelled I(0)). Finally, predict works in the usual way after ardl.

Compare with the dfuller command:

Code:

. varsoc ln_consump

   Selection-order criteria
   Sample:  1961q1 - 1982q4                     Number of obs      =        88
  +---------------------------------------------------------------------------+
  |lag |    LL      LR      df    p      FPE       AIC      HQIC      SBIC    |
  |----+----------------------------------------------------------------------|
  |  0 | -64.5106                        .2595   1.48888   1.50022   1.51703  |
  |  1 |  273.713  676.45    1  0.000  .000122   -6.1753  -6.15262    -6.119  |
  |  2 |  273.958  .48997    1  0.484  .000124  -6.15815  -6.12412  -6.07369  |
  |  3 |   276.14   4.364    1  0.037  .000121  -6.18501  -6.13964   -6.0724  |
  |  4 |  280.486  8.6903*   1  0.003  .000112* -6.26104* -6.20433* -6.12028* |
  +---------------------------------------------------------------------------+
   Endogenous:  ln_consump
    Exogenous:  _cons

. dfuller ln_consump, lags(3) regress

Augmented Dickey-Fuller test for unit root         Number of obs   =        88

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -1.013            -3.527            -2.900            -2.585
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.7484

------------------------------------------------------------------------------
D.ln_consump |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  ln_consump |
         L1. |  -.0022516   .0022221    -1.01   0.314    -.0066713    .0021681
         LD. |  -.1211548   .1034523    -1.17   0.245    -.3269172    .0846076
        L2D. |   .2429064   .1017705     2.39   0.019     .0404889    .4453239
        L3D. |    .307846   .1048838     2.94   0.004     .0992364    .5164555
             |
       _cons |   .0258569   .0164955     1.57   0.121    -.0069519    .0586658
------------------------------------------------------------------------------
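Note that dfuller is called with lags(3) although varsoc selects lag 4: an autoregression of order 4 in levels corresponds to three lagged differences in the ADF regression. To follow up on the earlier concern about serially correlated residuals, one possible check (a sketch, not something run in this thread) re-estimates the same regression with regress and applies a Breusch-Godfrey test and a portmanteau test. The variable name adf_resid and the lag choices for the two tests are illustrative assumptions.

```stata
webuse lutkepohl2, clear

* Same regression as: dfuller ln_consump, lags(3)
regress D.ln_consump L.ln_consump L(1/3)D.ln_consump

* Breusch-Godfrey LM tests for residual autocorrelation of orders 1 to 4
estat bgodfrey, lags(1/4)

* Portmanteau (Q) test for white noise on the saved residuals
predict double adf_resid, residuals
wntestq adf_resid, lags(8)
```

Small p-values in either test would suggest remaining serial correlation, in which case adding lagged differences is the usual remedy.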

Last edited by Sebastian Kripfganz; 27 Jul 2018, 10:47. Reason: dfuller comparison added

https://www.kripfganz.de/stata/


John Whymer

Join Date: May 2018

Posts: 12
#7

27 Jul 2018, 10:49

OK! Cool! Thanks a lot Sebastian!
