  • Narrow confidence intervals for prediction: xtpoisson

    Hi,

    I'm trying to find a good model for predicting an outcome that takes non-negative integer values and is autocorrelated.


    Code:
    xtset id time
    
    xtpoisson outcome  L(0/4).predictor , pa corr(ar 2)  
    
    predictnl yhat = predict(), ci(lb ub)
    My problem is that I get a very narrow confidence interval, almost indistinguishable from the prediction line. I expected more of the actual data to fall within the confidence intervals of the prediction. Does anyone know whether I'm getting a "right" confidence interval or not?

  • #2
    This is what my outcome, prediction, and confidence intervals look like:

    [Attached graph of the outcome, prediction, and confidence bounds; image not available]

    Last edited by Ali Niazi; 23 Mar 2022, 22:13.



    • #3
      I expected more of the actual data to fall within the confidence intervals of the prediction. Does anyone know whether I'm getting a "right" confidence interval or not?
      Your expectation is based on a misunderstanding.

      -predictnl- (or -predict-) calculates an expected value for the outcome variable conditional on the predictor(s) in each observation. The 95% CI reflects the reproducibility of that prediction on repeated sampling. If you drew a new data sample many times and recomputed the interval each time, about 95% of those intervals would contain the true population mean conditional on the predictors. These confidence limits are strongly sensitive to sample size. If you make the sample size large, the lower and upper bounds will converge towards each other (and towards the prediction itself), forming a very narrow range, as you show in your graph. But the dispersion of the actual outcomes is not affected by the sample size--it may fluctuate a bit between larger and smaller samples, but only randomly around the actual population dispersion.

      In short: the closeness of the 95% LB and UB lines is an artifact of large sample size (or, less commonly, of a regression model that is extremely accurate at predicting the mean value of the outcome, but not the individual outcomes themselves). The dispersion of the outcomes is more or less constant regardless of sample size and can be arbitrarily large or small compared to the width of the confidence intervals.
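
      For the simplest case, ordinary least squares with a single predictor (not the -xtpoisson- model in the original question), the standard textbook formulas make the contrast explicit. The notation is introduced here for illustration only: s is the residual standard deviation, n the sample size, x_0 the predictor value at which the prediction is made, and x-bar the sample mean of the predictor.

      ```latex
      \[
      \mathrm{SE}\bigl(\hat{\mu}(x_0)\bigr)
        = s\,\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}
      \qquad
      \mathrm{SE}_{\mathrm{forecast}}(x_0)
        = s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}
      \]
      ```

      As n grows, both terms under the first square root shrink towards zero, so the confidence interval for the mean collapses onto the prediction line. The extra 1 under the second square root does not shrink, so an interval for an individual outcome stays roughly s wide on each side per unit of the critical value, no matter how large the sample is.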

      Here's a simple toy model you can play with:

      Code:
      clear
      set obs 10
      set seed 1234
      gen x = runiform()
      gen y = 10*x + 5 + rnormal(0, 1)
      
      regress y x
      
      predictnl yhat = predict(), ci(lb ub)
      
      graph twoway scatter y x || line yhat lb ub x, sort
      Run that. Then change the sample size (the -set obs- line) to 100, 1000, and 10,000. You will see that the upper and lower bound lines quickly converge to each other, and as the sample size grows, less and less of the actual data falls within the confidence bounds. There is no reason it should be.
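
      To see the same thing in numbers rather than on a graph, the toy model can be wrapped in a loop, roughly as sketched here. This is only an illustration for the -regress- example, not something from the original posts: the variable names se_mean, se_fcst, in_ci, and in_pi are made up, and it relies on the -stdp- and -stdf- options of -predict- after -regress-. It reports the share of observations that fall inside the confidence interval for the mean and inside a prediction interval for individual outcomes.

      ```stata
      * Illustrative sketch: coverage of the CI for the mean versus a prediction
      * interval, at several sample sizes. Variable names are hypothetical.
      foreach n of numlist 10 100 1000 10000 {
          clear
          set seed 1234
          quietly set obs `n'
          gen x = runiform()
          gen y = 10*x + 5 + rnormal(0, 1)

          quietly regress y x
          predict yhat, xb
          predict se_mean, stdp      // standard error of the predicted mean
          predict se_fcst, stdf      // standard error of the forecast (individual outcome)

          * Two-sided 95% critical value from the t distribution
          local tcrit = invttail(e(df_r), 0.025)

          * Indicators for whether each observed y lies inside each interval
          gen byte in_ci = inrange(y, yhat - `tcrit'*se_mean, yhat + `tcrit'*se_mean)
          gen byte in_pi = inrange(y, yhat - `tcrit'*se_fcst, yhat + `tcrit'*se_fcst)

          quietly summarize in_ci
          local share_ci = r(mean)
          quietly summarize in_pi
          display "n = `n'   share inside CI: " %5.3f `share_ci' "   share inside PI: " %5.3f r(mean)
      }
      ```

      The share inside the confidence interval for the mean should fall as n grows, while the share inside the prediction interval should stay near 0.95. Whether a comparable forecast standard error can be obtained after -xtpoisson, pa- is a separate question that this sketch does not address.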
      Last edited by Clyde Schechter; 23 Mar 2022, 22:58.



      • #4
        Dear Clyde,
        Thanks for your insightful response.
