Narrow confidence intervals for prediction: xtpoisson

Ali Niazi

Join Date: Feb 2018

Posts: 28
#1

23 Mar 2022, 20:59

Hi,

I'm trying to find a good model for predicting an outcome that takes non-negative integer values and is autocorrelated.

Code:

xtset id time
xtpoisson outcome L(0/4).predictor, pa corr(ar 2)
predictnl yhat = predict(), ci(lb ub)

My problem is that I get a very narrow confidence interval, almost indistinguishable from the prediction line. I expected to have more actual data within the confidence intervals of prediction. Does anyone know whether I'm getting a "right" confidence interval or not?
Ali Niazi

Join Date: Feb 2018

Posts: 28
#2

23 Mar 2022, 21:10

This is what my outcome, prediction, and confidence intervals look like:

[Attached image: graph of outcome, prediction, and confidence intervals]

Last edited by Ali Niazi; 23 Mar 2022, 21:13.
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#3

23 Mar 2022, 21:56

I expected to have more actual data within the confidence intervals of prediction. Does anyone know whether I'm getting a "right" confidence interval or not?

Your expectation is based on a misunderstanding.

-predictnl- (or -predict-) calculates an expected value for the outcome variable conditional on the predictor(s) in each observation. The 95% CI reflects the reproducibility of that prediction on repeated sampling: if you drew a new data sample many times and recomputed the interval each time, about 95% of those intervals would contain the true population mean conditional on the predictors. These confidence limits are strongly sensitive to sample size, because the standard error of the estimated mean shrinks as the sample grows. If you make the sample size large, the lower and upper bounds will converge towards each other (and towards the prediction itself), forming a very narrow range, as you show in your graph. But the dispersion of the actual outcomes is not affected by the sample size: it may fluctuate a bit between larger and smaller samples, but only randomly around the actual population dispersion.

In short: the closeness of the 95% LB and UB lines is an artifact of large sample size (or, less commonly, a regression model that is extremely accurate at predicting the mean value of the outcome, but not the individual outcomes themselves). The dispersion of the outcomes is more or less constant regardless of sample size and can be arbitrarily large or small compared to the width of the confidence intervals.

Here's a simple toy model you can play with:

Code:

clear
set obs 10
set seed 1234
gen x = runiform()
gen y = 10*x + 5 + rnormal(0, 1)
regress y x
predictnl yhat = predict(), ci(lb ub)
graph twoway scatter y x || line yhat lb ub x, sort

Run that. Then change the sample size to 100, 1000, and 10,000. You will see that the upper and lower bound lines quickly converge to each other, and as the sample size grows, less and less of the actual data falls within the confidence bounds. There is no reason it should.
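If you would rather see numbers than graphs, here is a minimal sketch that repeats the toy model for each sample size and reports what share of the observed y values falls inside the 95% CI for the mean. For contrast, it also builds an interval for individual outcomes from the standard error of the forecast (-predict, stdf-). The variable names sef, pl, pu, in_ci and in_pi are made up for illustration, and the individual-outcome interval assumes normally distributed errors, which holds by construction in this simulation. This applies to the linear toy model only; it is not a recipe for -xtpoisson, pa-.

```stata
* Sketch: compare coverage of observed y by (a) the CI for the conditional mean
* and (b) a prediction interval for individual outcomes, at several sample sizes
foreach n of numlist 10 100 1000 10000 {
    clear
    set obs `n'
    set seed 1234
    gen x = runiform()
    gen y = 10*x + 5 + rnormal(0, 1)
    quietly regress y x

    * (a) 95% CI for the conditional mean, as in the toy model
    quietly predictnl yhat = predict(), ci(lb ub)

    * (b) 95% prediction interval for an individual outcome
    *     (assumption: normal errors; uses the standard error of the forecast)
    quietly predict sef, stdf
    gen pl = yhat - invttail(e(df_r), 0.025)*sef
    gen pu = yhat + invttail(e(df_r), 0.025)*sef

    * Share of observations falling inside each interval
    gen byte in_ci = inrange(y, lb, ub)
    gen byte in_pi = inrange(y, pl, pu)
    quietly summarize in_ci
    local share_ci = r(mean)
    quietly summarize in_pi
    display "n = `n'   inside CI for mean: " %5.3f `share_ci' ///
        "   inside prediction interval: " %5.3f r(mean)
}
```

The share inside the CI for the mean should fall as n grows, while the share inside the prediction interval should stay near 95%, because the latter includes the residual variance, which does not shrink with sample size.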

Last edited by Clyde Schechter; 23 Mar 2022, 21:58.
Ali Niazi

Join Date: Feb 2018

Posts: 28
#4

23 Mar 2022, 23:15

Dear Clyde,
Thanks for your insightful response.