
  • Prediction intervals for out-of-sample predictions using a model with bootstrapped standard errors

    I have a linear model with bootstrapped standard errors for the parameters. I then used a separate data set to generate out-of-sample forecasts from this model. I would like to construct prediction intervals for these forecast values, but when I try to generate the standard errors of the forecasts with "predict PIse, stdf" I get an error that says, "option stdf not allowed after bootstrap estimation r(198);". However, Stata does allow me to generate the standard error of the prediction with "predict CIse, stdp", which can be used to construct confidence intervals.

    Why am I able to estimate the standard errors of the predictions after bootstrap estimation but not able to estimate the standard errors of the forecasts?

    Is there a mathematical reason why standard errors of the forecasts cannot be calculated after bootstrap estimation, or is it a software limitation?
    Last edited by Dominique Pride; 08 Dec 2014, 17:19.

  • #2
    Welcome to Statalist, Dominique!

    Bottom line: To compute a standard error for the forecast (a new observation \(y^*\)), you need an estimate of the variance of the error or deviation term. OLS assumes that this variance is either a constant \(\sigma^2\) or a known function of a constant. The bootstrap makes no assumption about the variance of the error terms. As a consequence it provides no information for estimating the out-of-sample variance.
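
    To make this concrete, here is a minimal sketch using Stata's bundled auto dataset (the variable names are illustrative) that reproduces the behavior you describe:

        sysuse auto, clear

        * Plain OLS: both standard errors are available
        regress mpg weight
        predict se_mean, stdp   // SE of the estimated mean, for confidence intervals
        predict se_fcst, stdf   // SE of the forecast, for prediction intervals

        * After bootstrap estimation, only stdp is allowed
        bootstrap, reps(200) seed(12345): regress mpg weight
        predict se_mean_bs, stdp
        * predict se_fcst_bs, stdf   // error: option stdf not allowed after bootstrap estimation r(198)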

    Longer version:

    If you run a regression, the prediction for a new individual with covariate values \(x\) is the estimated mean \(\widehat{\mu}(x)\).

    However, the actual value \(y^*\) for a new individual with covariate values \(x\) will be the true mean plus a deviation or error term:

    \[
    y^* = \mu(x) + e^*
    \]

    Here \(\mu(x)\) is the mean of \(y\) evaluated at the covariates \(x\), and \(e^*\) is the new random part of \(y^*\). The only assumption about \(e^*\) is that it has expectation zero.

    We don't know \(\mu(x)\), so we have to substitute the estimated mean from the regression:

    \[
    y^*(x) = \widehat{\mu}(x) + e^*
    \]

    Because \(e^*\) is independent of the data used to estimate \(\widehat{\mu}(x)\), the variance of the new observation is the sum of the variances of the two terms on the right-hand side:

    \[
    \text{var}(y^*(x)) = \text{var}(\widehat{\mu}(x)) + \text{var}(e^*)
    \]

    Both OLS and the bootstrap estimate the first term. But OLS estimates the second only under the assumption of constant error variance. Bootstrapping the regression coefficients alone provides no information about the error variance.
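
    If you are willing to reimpose the constant-variance assumption yourself, you can combine the two pieces by hand. Under homoskedasticity the forecast standard error is

    \[
    \text{se}(y^*(x)) = \sqrt{\widehat{\text{var}}(\widehat{\mu}(x)) + \widehat{\sigma}^2}
    \]

    which is, as I understand it, what stdf computes after plain regress (\(\text{stdf}^2 = \text{stdp}^2 + \text{rmse}^2\)). A sketch of the workaround, with the caveat that it reintroduces exactly the homoskedasticity assumption the bootstrap avoids:

        * Residual variance from an ordinary OLS fit (assumes constant error variance)
        quietly regress mpg weight
        scalar rmse2 = e(rmse)^2

        * Bootstrap-based SE of the estimated mean, then add the error variance
        bootstrap, reps(200) seed(12345): regress mpg weight
        predict bs_stdp, stdp
        generate se_forecast = sqrt(bs_stdp^2 + rmse2)

    Treat the result as an approximation: the bootstrap contributes \(\widehat{\text{var}}(\widehat{\mu}(x))\), but \(\widehat{\sigma}^2\) still rests on the classical OLS assumption.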


    The modern approach to estimating what is called "prediction error" is cross-validation or a bootstrap analysis; see, e.g., Efron and Tibshirani (1997).


    Reference
    Efron, Bradley, and Robert Tibshirani. 1997. Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association 92(438): 548-560.
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2
