Splitting time series data into training and testing set

Nicholas Garcia

Join Date: Mar 2020
Posts: 3


04 Apr 2020, 11:04

Hello,

I am working on a recession probability model using a LASSO probit framework. My goal is to predict recessions out of sample. I want to split my data into a training sample pre 2000 and a testing sample from 2000 until October 2019, currently the ending of my data set. I am having trouble figuring out how to split the data this way. Could anyone give me some guidance on how to do this? So far I have gotten my code to run with a random split of the sample, but that is not what I want. My data are all monthly. I have posted my code below for consideration.

Code:

*format my date variable and set it as my time series
format %tmMonth_CCYY date2
tsset date2, monthly

*splitsample
vl set, categorical(4) uncertain(0)
vl list vlcategorical
splitsample, generate(sample) nsplit(2) rseed(1234)
tabulate sample

*Here, I use LASSO to select a model based off 8 variables
lasso probit nber_rec spread6monthlag ff6monthlag cape snp6monthlag pmi awhman6monthlag smb6monthslag hml6monthlag if sample == 1

cvplot
estimates store cv

*display a table of information about each of the models that were fit
lassoknots, display(nonzero bic)

*Select the model with the chosen lambda
lassoselect id = 45
cvplot
estimates store firstmodel

*View a table of the variables selected
lassocoef cv firstmodel, sort(coef, standardized)

*Assess the goodness of fit
lassogof firstmodel, over(sample) postselection

*create recession probabilities based off LASSO model
predict rec_prob
summarize rec_prob

twoway (tsline rec_prob nber_rec)

I believe my problem is in the split-sample stage. Any help would be greatly appreciated.

Last edited by Nicholas Garcia; 04 Apr 2020, 11:10.


Justin Niakamal

Join Date: Aug 2017

Posts: 760
#2

04 Apr 2020, 19:51

The splitsample command splits the data into random samples, which as you've noticed isn't appropriate.
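
One way to see this, as a rough check assuming the variable sample from the splitsample call in post #1 still exists, is to list the earliest and latest month in each split. With a random split, both groups should cover nearly the whole date range rather than a pre-2000 and a post-2000 block.

```stata
* Assumes sample was created by splitsample as in post #1
* and that date2 is the monthly date variable
* Show the first and last month that fall in each random split
tabstat date2, by(sample) statistics(min max) format(%tm)
```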

I want to split my data into a training sample pre 2000 and a testing sample from 2000 until October 2019, currently the ending of my data set.

You can create this yourself with:

Code:

gen sample = (date2 >= tm(2000-1))

and then use it with your if qualifiers.
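
A minimal, untested sketch of how that might look with the variables from post #1. Two things to watch: with this coding, 0 marks the training period and 1 the testing period, so the estimation condition becomes sample == 0 rather than sample == 1; and if sample already exists from the earlier splitsample call, it has to be dropped first. The value label name samplelbl, the rseed() value, and the use of tm(2000m1) as the cutoff literal are my own choices for illustration.

```stata
* Assumption: remove the random split left over from splitsample, if any
capture drop sample

* 0 = training (before Jan 2000), 1 = testing (Jan 2000 onward)
generate byte sample = (date2 >= tm(2000m1)) if !missing(date2)
label define samplelbl 0 "Training" 1 "Testing"
label values sample samplelbl
tabulate sample

* Fit the lasso on the training period only
lasso probit nber_rec spread6monthlag ff6monthlag cape snp6monthlag pmi ///
    awhman6monthlag smb6monthslag hml6monthlag if sample == 0, rseed(1234)
estimates store cv

* Compare fit in the training and testing periods
lassogof cv, over(sample) postselection

* Predicted recession probabilities for all months, test period included
predict rec_prob, pr
tsline rec_prob nber_rec
```

Because predict is run without an if qualifier, rec_prob is filled in for the testing months as well, which is what makes the plot an out-of-sample comparison from 2000 onward.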

Last edited by Justin Niakamal; 04 Apr 2020, 19:55.
Nicholas Garcia

Join Date: Mar 2020

Posts: 3
#3

05 Apr 2020, 17:58

Thank you so much! It's always easier than you think it is!