  • Stata Out of Sample Forecasting

    I am not getting out-of-sample forecasts from the predict command.
    My steps: 1) set the date format, 2) tsappend, add(12), 3) run the regression, 4) predict, say, yhat. The forecasts then stay in sample and do not extend to the 12 future dates.

    Am I doing something wrong? Any help will be greatly appreciated.

    Regards

    Adriaan

  • #2
    For most estimations, and most post-estimation statistics, -predict- will do out-of-sample calculations by default. Emphasis on most in both places. There are some estimation commands that do not support out-of-sample prediction and there are some statistics that cannot be calculated out-of-sample. You need to look at the specifics. Since you tell us neither what kind of regression you ran nor what specific statistic you are trying to -predict-, there isn't anything more that can be said.

    There is one other thing to consider: -predict- will only calculate predictions for observations that have non-missing values for all of the variables in the regression model. So if the observations for your 12 future dates are missing some key data, then you would not get out-of-sample predictions for them, even if the particular regression and statistic are, in general, amenable to out-of-sample prediction.
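
    As a rough illustration of the missing-data point, here is a minimal sketch on made-up data. The variable names t, x, y and yhat are invented for this example and do not come from the original poster's dataset. It appends 12 future months and then counts how many of them lack the regressor and how many lack a prediction.

    ```stata
    * Toy monthly series, 2015m1 to 2016m12 (all names are hypothetical)
    clear
    set obs 24
    gen t = tm(2015m1) + _n - 1
    format t %tm
    tsset t
    set seed 12345
    gen x = rnormal()
    gen y = 1 + 2*x + rnormal()

    * Add 12 future months; y and x are missing in the new rows
    tsappend, add(12)

    regress y x
    predict yhat, xb

    * Future rows where the regressor is missing
    count if t > tm(2016m12) & missing(x)
    * Future rows where the prediction is missing
    count if t > tm(2016m12) & missing(yhat)
    ```

    Both counts should be 12 here, because the contemporaneous regressor x is not known for the appended months, so -predict- has nothing to compute from.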



    • #3
      Hi Clyde, thank you for your answer. I am trying to predict the Australian cash rate using monthly data series. My regression looks like this:
      regress rbacashrate l6.housecredit privatesectorcredit unemploymentrate yeargovie m3 indprod loansandadvances uanabc l4.aubusc
      (the last two variables are Australian business confidence and conditions). This command is followed by predict rbacashrate-predict. I applied lags of 6 and 4 months to household credit and business conditions respectively. My monthly data series runs from 30/04/1997 to 31/05/2016. If I run the predict command, it only generates values up to 29/02/2016. I don't know whether it has something to do with the lags. If I run regress rbacashrate l11.uanabc l11.aubusc (lags of 11 months for both business confidence and conditions), it actually does produce out-of-sample forecasts for 11 months. So maybe it is the lag structure. Thank you once again!



      • #4
        I think I figured it out. I investigated the data series again, as Clyde suggested, and discovered that one of the series had three data points missing, which Stata didn't like. I reran the out-of-sample forecast with the predict command, and Stata generated forecast values up to the shortest lag. For example, for the regression regress rbacashrate l6.housecredit privatesectorcredit unemploymentrate yeargovie m3 indprod loansandadvances aunabc l4.aubusc it will only generate forecast values for four months. If all the regressors are lagged by 12 periods, the forecast period will extend to 12 months, provided the time series was extended by 12 periods with tsappend, add(12).
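
        The link between the shortest lag and the forecast horizon can be sketched on simulated data. This is only an illustration under assumed names (t, x, z, y, yhat are hypothetical, not the series from this thread): every regressor is lagged, the shortest lag is 4, and the series is extended by 12 months.

        ```stata
        * Simulated monthly data, 2013m1 to 2015m12 (all names are hypothetical)
        clear
        set obs 36
        gen t = tm(2013m1) + _n - 1
        format t %tm
        tsset t
        set seed 2016
        gen x = rnormal()
        gen z = rnormal()
        gen y = 1 + 0.5*L4.x + 0.3*L6.z + rnormal()

        * Extend the time axis by 12 future months
        tsappend, add(12)

        * Every regressor is lagged; the shortest lag is 4 months
        regress y L4.x L6.z
        predict yhat, xb

        * Number of future months that received a forecast
        count if t > tm(2015m12) & !missing(yhat)
        list t yhat if t > tm(2015m12)
        ```

        In this setup the count should be 4: forecasts exist for 2016m1 to 2016m4, and from 2016m5 onward L4.x refers to a month with no data, so yhat is missing.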



        • #5
          So how do I know what Stata did? I'm getting predictions for all observations, which doesn't look like an out-of-sample forecast. Predict is silent about this, and I'm using an .ado file that does not have any documentation yet.

          Did your version to force Stata to predict out of sample work in the end, Adriaan?



          • #6
            Originally posted by Clyde Schechter
            For most estimations, and most post-estimation statistics, -predict- will do out-of-sample calculations by default. [...]
            Originally posted by Peter Pan
            So how do I know whether Stata's predict has performed an out-of-sample prediction and when it performs an in-sample prediction?
            Allow me to elaborate on Clyde's answer a little more concretely. An estimation command should support something called -e(sample)-, which is a function that identifies the specific observations that contributed to the estimation procedure. For example, in a simple regression, any observation with a missing value in the outcome or a regressor will not contribute any information. As a consequence, the number of observations actually used can be smaller than the total number of observations in the dataset.

            Consider the following toy example, which is intended to show the estimation of a model in a subset of data, how to identify those observations that were used in said estimation, and then how to use -predict- for out-of-sample prediction.

            Code:
            clear *
            cls
            
            input byte (y x train)
            6 9 1
            1 3 1
            8 2 1
            4 . 1
            5 5 0
            end
            
            * perform some regression in a sample
            qui reg y x if train
            * mark out which observations in the sample contributed to the estimation
            gen byte used_in_train = e(sample)
            * show the dataset and flags for inclusion in the training sample & those used in estimation.
            list, abbrev(16)
            
            
                 +------------------------------------------+
                 | y   x   train   used_in_train     pred_y |
                 |------------------------------------------|
              1. | 6   9       1               1   5.453488 |
              2. | 1   3       1               1   4.825582 |
              3. | 8   2       1               1    4.72093 |
              4. | 4   .       1               0          . |
              5. | 5   5       0               0   5.034883 |
                 +------------------------------------------+
            
            
            * Predict will use all in- and out-of-sample observations by default.
            predict pred_y, xb
            list, abbrev(16)
            
            
                 +------------------------------------------+
                 | y   x   train   used_in_train     pred_y |
                 |------------------------------------------|
              1. | 6   9       1               1   5.453488 |
              2. | 1   3       1               1   4.825582 |
              3. | 8   2       1               1    4.72093 |
              4. | 4   .       1               0          . |
              5. | 5   5       0               0   5.034883 |
                 +------------------------------------------+
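
            If the aim is to keep in-sample fitted values and out-of-sample predictions apart, one option is to restrict -predict- with -e(sample)-. The following is a minimal sketch on the same toy data; the variable names pred_in and pred_out are made up for this illustration.

            ```stata
            clear
            input byte(y x train)
            6 9 1
            1 3 1
            8 2 1
            4 . 1
            5 5 0
            end

            * Fit on the training rows only
            quietly regress y x if train
            gen byte used_in_train = e(sample)

            * Fitted values only for rows used in the estimation
            predict pred_in if e(sample), xb
            * Predictions only for rows NOT used in the estimation
            predict pred_out if !e(sample), xb

            list, abbrev(16)
            ```

            Here pred_in should be filled for rows 1 to 3 and pred_out only for row 5. Row 4 gets neither, because it was dropped from the estimation and its x is missing.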



            • #7
              Fantastic, thanks. This is also a great minimal example for out of sample prediction in Stata (googling that led me to this page). Just a little typo: I think pred_y shouldn't exist in your first table.



              • #8
                Yes you're right, I didn't intend to paste the first table in the output. I'm happy you found it helpful.



                • #9
                  If anyone who helped here is still subscribed I would like some help with generating the CI about the prediction: https://www.statalist.org/forums/for...iction-with-ci



                  • #10
                    Please help me over at a post about a similar topic.
                    If anyone who helped here is still subscribed I would like some help with generating the CI about the prediction: https://www.statalist.org/forums/for...iction-with-ci
