Stata Out of Sample Forecasting

Adriaan Mocke

Join Date: Oct 2016

Posts: 5
#1

21 Oct 2016, 19:04

I am not getting out-of-sample forecasts from the predict command.
My steps are: 1) set the date format; 2) tsappend, add(12); 3) run the regression; 4) predict, say, yhat. The forecasts stay in sample and do not extend to the 12 appended future dates.

Am I doing something wrong? Any help will be greatly appreciated.

Regards

Adriaan
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#2

21 Oct 2016, 22:00

For most estimations, and most post-estimation statistics, -predict- will do out-of-sample calculations by default. Emphasis on most in both places. There are some estimation commands that do not support out-of-sample prediction and there are some statistics that cannot be calculated out-of-sample. You need to look at the specifics. Since you tell us neither what kind of regression you ran nor what specific statistic you are trying to -predict-, there isn't anything more that can be said.

There is one other thing to consider: -predict- will only calculate predictions for observations that have non-missing values for all of the variables in the regression model. So if the observations for your 12 future dates are missing some key data, then you would not get out-of-sample predictions for them, even if the particular regression and statistic are, in general, amenable to out-of-sample prediction.
Adriaan Mocke

Join Date: Oct 2016

Posts: 5
#3

21 Oct 2016, 22:38

Hi Clyde, thank you for your answer. I am trying to predict the Australian cash rate using monthly data series. My regression looks like this:
regress rbacashrate l6.housecredit privatesectorcredit unemploymentrate yeargovie m3 indprod loansandadvances uanabc l4.aubusc
(the last two variables are Australian business confidence and business conditions). This command is followed by predict rbacashrate-predict. I applied lags of 6 and 4 months to household credit and business conditions, respectively. My monthly data series runs from 30/04/1997 to 31/05/2016. When I run the predict command, it only generates values up to 29/02/2016. I don't know whether this has something to do with the lags. If I instead run regress rbacashrate l11.uanabc l11.aubusc (lags of 11 months for both business confidence and conditions), it actually produces out-of-sample forecasts for 11 months. So maybe it is the lag structure. Thank you once again!
Adriaan Mocke

Join Date: Oct 2016

Posts: 5
#4

10 Nov 2016, 22:09

I think I figured it out. I investigated the data series again, as Clyde suggested, and discovered that one of the series had three missing data points, which Stata didn't like. I reran the out-of-sample forecast with the predict command, and Stata generated forecast values up to the shortest lag. For example, for the regression regress rbacashrate l6.housecredit privatesectorcredit unemploymentrate yeargovie m3 indprod loansandadvances aunabc l4.aubusc it only generates forecast values for four months. If all the regressors are lagged by 12 periods, the forecast period extends to 12 months, provided the time series was extended by 12 periods with tsappend, add(12).
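The link between the lag length and the forecast horizon can be reproduced on simulated data. The following is only a minimal sketch, not the regression from this thread: the variable names (t, x, y, yhat), the seed, and the data are hypothetical, and the model has a single regressor lagged by 4 months.

```stata
* Hypothetical monthly data; names and values are illustrative only
clear
set seed 12345
set obs 60
gen t = tm(2011m1) + _n - 1
format t %tm
tsset t
gen x = rnormal()
gen y = 1 + 0.5*L4.x + rnormal()

* Extend the series by 12 future months; x and y are missing there
tsappend, add(12)

* The only regressor is L4.x, which is known for just 4 of the 12 future months
regress y L4.x
predict yhat, xb

* Count forecasts that lie outside the estimation sample
count if !e(sample) & !missing(yhat)
```

Under these assumptions, predict fills yhat only for the appended months in which L4.x is non-missing, so the forecast horizon equals the lag length rather than the number of appended periods.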
Frank Taumann

Join Date: Feb 2017

Posts: 40
#5

02 Oct 2019, 09:36

So how do I know what Stata did? I'm getting predictions for all observations, which doesn't look like an out-of-sample forecast. Predict is silent about this, and I'm using an .ado file that does not have any documentation yet.

Did your approach to forcing Stata to predict out of sample work in the end, Adriaan?
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#6

02 Oct 2019, 09:42

Originally posted by Clyde Schechter

For most estimations, and most post-estimation statistics, -predict- will do out-of-sample calculations by default. Emphasis on most in both places. There are some estimation commands that do not support out-of-sample prediction and there are some statistics that cannot be calculated out-of-sample. You need to look at the specifics. Since you tell us neither what kind of regression you ran nor what specific statistic you are trying to -predict-, there isn't anything more that can be said.

There is one other thing to consider: -predict- will only calculate predictions for observations that have non-missing values for all of the variables in the regression model. So if the observations for your 12 future dates are missing some key data, then you would not get out-of-sample predictions for them, even if the particular regression and statistic are, in general, amenable to out-of-sample prediction.

Originally posted by Peter Pan

So how do I know whether Stata's predict has performed an out-of-sample prediction or an in-sample prediction?

Allow me to elaborate on Clyde's answer a little more concretely. An estimation command should support something called -e(sample)-, which is a function that identifies the specific observations that contributed to the estimation procedure. For example, in a simple regression, any observation with a missing value will not contribute any information. As a consequence, the number of observations actually used may be smaller than the total number of observations in the dataset.

Consider the following toy example, which is intended to show the estimation of a model in a subset of the data, how to identify the observations that were used in that estimation, and then how to use -predict- for out-of-sample prediction.

Code:

clear *
cls

input byte (y x train)
6 9 1
1 3 1
8 2 1
4 . 1
5 5 0
end

* perform some regression in a sample
qui reg y x if train

* mark out which observations in the sample contributed to the estimation
gen byte used_in_train = e(sample)

* show the dataset and flags for inclusion in the training sample & those used in estimation.
list, abbrev(16)

     +------------------------------------------+
     | y   x   train   used_in_train     pred_y |
     |------------------------------------------|
  1. | 6   9       1               1   5.453488 |
  2. | 1   3       1               1   4.825582 |
  3. | 8   2       1               1    4.72093 |
  4. | 4   .       1               0          . |
  5. | 5   5       0               0   5.034883 |
     +------------------------------------------+

* Predict will use all in- and out-of-sample observations by default.
predict pred_y, xb
list, abbrev(16)

     +------------------------------------------+
     | y   x   train   used_in_train     pred_y |
     |------------------------------------------|
  1. | 6   9       1               1   5.453488 |
  2. | 1   3       1               1   4.825582 |
  3. | 8   2       1               1    4.72093 |
  4. | 4   .       1               0          . |
  5. | 5   5       0               0   5.034883 |
     +------------------------------------------+
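If only the out-of-sample predictions are wanted, one option is to restrict -predict- with an -if- condition on -e(sample)-. The sketch below reuses the toy data above; the variable name pred_oos is hypothetical and chosen only for illustration.

```stata
* Same toy data as above
clear
input y x train
6 9 1
1 3 1
8 2 1
4 . 1
5 5 0
end

quietly regress y x if train

* Predict only for observations that did not contribute to the estimation
predict pred_oos if !e(sample), xb
list, abbrev(16)

* Out-of-sample rows with a usable prediction (x must be non-missing)
count if !missing(pred_oos)
```

Here the fourth observation is outside e(sample) but still gets no prediction because x is missing, so only the fifth observation receives an out-of-sample value.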
Frank Taumann

Join Date: Feb 2017

Posts: 40
#7

03 Oct 2019, 04:00

Fantastic, thanks. This is also a great minimal example for out of sample prediction in Stata (googling that led me to this page). Just a little typo: I think pred_y shouldn't exist in your first table.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#8

03 Oct 2019, 06:09

Yes you're right, I didn't intend to paste the first table in the output. I'm happy you found it helpful.
matt Mosh

Join Date: Sep 2019

Posts: 7
#9

04 Oct 2019, 13:50

If anyone who helped here is still subscribed I would like some help with generating the CI about the prediction: https://www.statalist.org/forums/for...iction-with-ci
matt Mosh

Join Date: Sep 2019

Posts: 7
#10

04 Oct 2019, 13:51

Please help me over at a post about a similar topic.
If anyone who helped here is still subscribed I would like some help with generating the CI about the prediction: https://www.statalist.org/forums/for...iction-with-ci