  • Splitting time series data into training and testing set

    Hello,

    Hello,

    I am working on a recession probability model using a LASSO probit framework, and my goal is to predict recessions out of sample. I want to split my data into a training sample covering the months before 2000 and a testing sample running from January 2000 through October 2019, which is currently the end of my data set. I am having trouble figuring out how to make this split. Could anyone give me some guidance? At the moment my code runs when I split the sample randomly, but that is not what I want. All of my data are monthly. My code is below.

    Code:
    *format my date variable and set it as my time series
    format %tmMonth_CCYY date2
    tsset date2, monthly
    
    *splitsample
    vl set, categorical(4) uncertain(0)
    vl list vlcategorical
    splitsample, generate(sample) nsplit(2) rseed(1234)
    tabulate sample
    
    *Here, I use LASSO to select a model based on 8 variables
    lasso probit nber_rec spread6monthlag ff6monthlag cape snp6monthlag pmi awhman6monthlag smb6monthslag hml6monthlag if sample == 1
    
    cvplot
    estimates store cv
    
    *display a table of information about each of the models that were fit
    lassoknots, display(nonzero bic)
    
    *Select the model with the chosen lambda
    lassoselect id = 45
    cvplot
    estimates store firstmodel
    
    *View a table of the variables selected
    lassocoef cv firstmodel, sort(coef, standardized)
    
    *Assess the goodness of fit
    lassogof firstmodel, over(sample) postselection
    
    *create recession probabilities based on the LASSO model
    predict rec_prob
    summarize rec_prob
    
    twoway (tsline rec_prob nber_rec)
    I believe my problem is in the sample-splitting stage. Any help would be greatly appreciated.
    Last edited by Nicholas Garcia; 04 Apr 2020, 11:10.

  • #2
    The splitsample command splits the data into random samples, which as you've noticed isn't appropriate.
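    To see the problem concretely, a quick cross-tabulation can show how a random split mixes the two periods. This is a sketch that assumes date2 is the monthly %tm date from the original post; rsample and post2000 are hypothetical variable names introduced only for this check. Each random subsample will typically contain months from both before and after January 2000, which means the model would be trained partly on the very period it is supposed to forecast.

```stata
* Sketch: compare a random split against the calendar split.
* Assumes date2 is a monthly %tm date; rsample and post2000 are hypothetical names.
capture drop rsample post2000
splitsample, generate(rsample) nsplit(2) rseed(1234)

* 1 for months from January 2000 onward, 0 for earlier months
generate byte post2000 = (date2 >= tm(2000m1)) if !missing(date2)

* Rows are the random subsamples, columns the calendar periods;
* nonzero counts in every cell mean each subsample mixes both periods.
tabulate rsample post2000
```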

    "I want to split my data into a training sample pre 2000 and a testing sample from 2000 until October 2019, currently the ending of my data set."
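    You can create this indicator yourself with:
    Code:
    gen sample = (date2 >= tm(2000m1))
    and then use it in your if qualifiers. Note that this codes sample as 0 for the pre-2000 training months and 1 for the testing months from January 2000 onward, so the LASSO should be fit with if sample == 0 rather than if sample == 1.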
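    Putting the pieces together, here is a minimal sketch of the date-based workflow. It assumes date2 is the monthly %tm date and reuses the variable names from the original post. The value labels, the rseed(1234) choice, and coding training as 1 and testing as 2 (mirroring what splitsample, nsplit(2) produces, so that the original if sample == 1 still selects the training months) are illustrative choices, not requirements.

```stata
* Sketch of a calendar-based train/test split for the LASSO probit.
* Assumes date2 is a monthly %tm date and the variables from the original post.
capture drop sample rec_prob

* 1 = training (before January 2000), 2 = testing (January 2000 onward)
generate byte sample = cond(date2 < tm(2000m1), 1, 2) if !missing(date2)
label define samplelbl 1 "training (pre-2000)" 2 "testing (2000m1 on)", replace
label values sample samplelbl
tabulate sample

* Fit the LASSO probit on the training months only
lasso probit nber_rec spread6monthlag ff6monthlag cape snp6monthlag pmi ///
    awhman6monthlag smb6monthslag hml6monthlag if sample == 1, rseed(1234)
estimates store cv

* Compare fit in the training and testing months
lassogof cv, over(sample) postselection

* Predicted recession probabilities for every month, including 2000 onward
predict rec_prob, pr
```

    The /// continuation only works in a do-file. Also note that lasso's default cross-validation still assigns folds at random within the training window; that is a separate modelling choice from the train/test split itself.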
    Last edited by Justin Niakamal; 04 Apr 2020, 19:55.



    • #3
      Thank you so much! It's always easier than you think it is!

