  • How to find the optimal lag length in VAR (or PVAR) analysis

    Hi Everyone,

    I'm interested in running a panel VAR analysis. One of the steps is to find the optimal lag length, that is, the number of lags that affect the DV (its value in the current period).
    I searched online and found that Stata has modules for PVAR, and that the pvarsoc command finds the optimal lag length.
    But surprisingly, it took a significant amount of time to run (more than an hour, as I remember) and the result was not clear (most of the AIC and BIC values were missing).
    I decided to take another route: run varsoc (the same command, but for time series) for each group and take the median of the optimal lag lengths across all groups.

    Below is a subsample of my data, which includes labor, capital, and GDP at the country*year level, but for only one country.
    My question is: how should I identify the optimal lag length?
    I tried changing maxlag from 5 to 10, but every maxlag gave me a different optimal lag length.
    For example, if I set maxlag to 5 and got 4 as the optimal lag length, and still got 4 as I increased maxlag one by one, I would safely choose 4 as the optimal lag length.
    Another problem is that as I keep increasing maxlag, the outputs (AIC, BIC) become missing.

    Summary of my question:
    If I increase the maxlag too much, output values become missing.
    If I choose some arbitrary maxlag, the results give me a different optimal lag length each time.
    Based on these situations, how should I choose the optimal lag length?

    I spent a lot of time looking for a good answer to this but couldn't find any suitable materials or solutions.
    I would appreciate it if anyone could help me out and share their opinions!
    Thanks.

    Code:
    varsoc  ln_capital ln_labor ln_gdp if country_num==1, maxlag(5)
    varsoc  ln_capital ln_labor ln_gdp if country_num==1, maxlag(6)
    varsoc  ln_capital ln_labor ln_gdp if country_num==1, maxlag(7)
    varsoc  ln_capital ln_labor ln_gdp if country_num==1, maxlag(8)
    varsoc  ln_capital ln_labor ln_gdp if country_num==1, maxlag(9)
    varsoc  ln_capital ln_labor ln_gdp if country_num==1, maxlag(10)
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long country_num int year float(total_labor_hrs ln_labor gdp ln_gdp capital ln_capital)
    1 1990  51406.98  10.84753  416185.7 12.938887 1053472.3 13.867602
    1 1991  46151.68  10.73969  530883.3 13.182298 1390154.4 14.144925
    1 1992  49013.68 10.799855 551406.44 13.220227 1400269.3 14.152175
    1 1993  47375.19 10.765854  553125.9  13.22334 1423952.3 14.168947
    1 1994  44834.04 10.710723  522478.2  13.16634 1432616.6 14.175014
    1 1995  42386.19 10.654578  520203.2 13.161975 1430593.3   14.1736
    1 1996  39709.59 10.589348  490409.9 13.102997 1412010.8 14.160525
    1 1997  32572.23 10.391215  422775.6 12.954597 1330209.5 14.100847
    1 1998 37470.734 10.531316 454726.75 13.027452 1325770.3 14.097505
    1 1999  39478.44  10.58351    497031 13.116407   1365074  14.12672
    1 2000  40831.18   10.6172  547243.8  13.21265 1436258.4 14.177552
    1 2001  42834.59   10.6651  593230.8  13.29334 1551853.4  14.25496
    1 2002  44132.58 10.694954  646576.2 13.379446 1644787.4 14.313122
    1 2003  44099.77  10.69421  683466.8 13.434934 1716972.6 14.356073
    1 2004  40278.88 10.603582  678409.6 13.427506   1728725 14.362895
    1 2005  40348.53  10.60531  752309.6 13.530903 1770404.4  14.38672
    1 2006  42243.88 10.651215    833239 13.633076   1858362 14.435206
    1 2007  42320.25  10.65302    850283 13.653324   1936431 14.476357
    1 2008  42542.11  10.65825  864276.6 13.669648   2034244 14.525635
    1 2009  42918.76 10.667065  859908.5  13.66458 2139373.8 14.576024
    end
    label values country_num country_num
    label def country_num 1 "Argentina", modify

  • #2
    I would caution against using too long a lag length with annual data. With monthly data you would typically run varsoc with a maximum lag of 12, and with quarterly data a maximum lag of 4 (though in some cases the lag length can be higher). With annual data, it's harder to justify higher-order lags.

    If I increase the maxlag too much, output values become missing.
    You're consuming a lot of degrees of freedom, especially in a [P]VAR analysis.
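    A rough count for the posted sample shows the scale of the problem. The sketch below assumes 3 endogenous variables and 20 annual observations (1990-2009), as in the posted data, with one constant per equation; the local macro names are only illustrative. With maxlag(m), the first m years are lost to lagging, while the largest model has 3*m + 1 coefficients in each equation.

```stata
* Assumed from the posted sample: 3 endogenous variables, 20 annual observations
local K = 3
local T = 20

forvalues m = 1/10 {
    * Observations left after the first m years are used up as lags
    local nobs = `T' - `m'
    * Coefficients per equation at lag m: K per lag plus a constant
    local npar = `K' * `m' + 1
    * Flag cases where the largest model has no residual degrees of freedom
    local flag ""
    if `npar' >= `nobs' local flag "  <- no residual degrees of freedom"
    display "maxlag(`m'): obs = `nobs', coefficients per equation = `npar'`flag'"
}
```

    Under these assumptions, maxlag(4) leaves 16 observations for 13 coefficients per equation, while maxlag(5) leaves 15 observations for 16. From that point on, the highest-order model has no residual degrees of freedom, which is a plausible reason the information criteria come back missing.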

    If I choose some arbitrary maxlag, the results give me a different optimal lag length each time.
    Essentially what's happening is overfitting.

    Based on these situations, how should I choose the optimal lag length?
    Personally, I would look at a maxlag of 1 to 4.
    Last edited by Justin Niakamal; 03 Nov 2020, 14:20.
