
  • Zero inflated negative binomial regression vs negative binomial regression

    Hello everyone,

    I am running an analysis to see whether serum cholesterol, together with sex, DBP, and age, is associated with the number of heart attacks. I started by checking the Poisson assumption that the mean equals the variance and, because the data are overdispersed, concluded that negative binomial regression is the better choice.

    I then compared several candidate models, and zero-inflated negative binomial regression turned out to be the best one I could work with.

    I compared the -2LL values and found that the full model fit better than the reduced model. However, I am stuck on the next step: how to drop the non-significant interaction terms, which p-values to consider, and how to fit my final model.

    Kindly help me out.

    Thanks

  • #2
    Actually, first consider what a ZINB model means compared with an ordinary count model. If you fit a count model, you are saying that each person (I assume the data are at the person level, since serum cholesterol is an individual characteristic) has some mean number of heart attacks, and that age, DBP, sex, and cholesterol influence that mean. Because even one heart attack is already very bad, I hope that all the means are very low. Any one person's count can be zero by chance; it's just that people with the same covariate values share the same mean number of heart attacks.

    If you fit a zero-inflated version of that model, you're saying that some people (those in the structural zero class) are not vulnerable to heart attacks, i.e. their count will always be 0. That doesn't sit well with me on substantive grounds, as it should not be possible to be completely immune to heart attacks. I am not 100% clear whether you tested the ZINB model against the negative binomial one. If you haven't, just compare their BICs.
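
    To make the BIC comparison concrete, here is a minimal sketch. The variable names are the ones used in the nbreg command posted elsewhere in this thread. The inflate(_cons) specification and the stored-estimate names nb and zinb1 are my own assumptions, so adapt them to your actual model.

    ```stata
    * Fit the negative binomial model and store the results
    nbreg Num_of_heartAttack Sex BMI DBP SerumCholestrol Age, exposure(PersonTime)
    estimates store nb

    * Fit a zero-inflated version; inflate(_cons) assumes a constant-only
    * inflation equation, which may not be what you want
    zinb Num_of_heartAttack Sex BMI DBP SerumCholestrol Age, inflate(_cons) exposure(PersonTime)
    estimates store zinb1

    * List the log likelihood, AIC, and BIC of both models side by side
    estimates stats nb zinb1
    ```

    The model with the lower BIC is preferred, but the two models must be fit to the same observations for the comparison to make sense.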

    Also, heart attacks are rare events in most populations, and I assume that multiple heart attacks are even rarer. I am assuming that you are observing people and that some of them did in fact have multiple heart attacks. If they didn't, then a count model isn't really the correct model.

    That side note aside, a lot of people don't like backwards selection these days. If you thought an independent variable was important enough to include in the model, you can leave it in the final results even if p > 0.05. I think that is perfectly well accepted these days, at least for main effects. For interaction terms, if you don't have a strong theoretical reason to test an interaction, then I think it's usually well accepted to leave non-significant interactions out of the model.
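
    If you do want to check whether an interaction can be dropped, something like the sketch below would do it. The Sex by SerumCholestrol interaction is purely hypothetical, chosen only for illustration, and i.Sex assumes that Sex is coded as non-negative integers.

    ```stata
    * Model with a hypothetical Sex x SerumCholestrol interaction
    nbreg Num_of_heartAttack i.Sex##c.SerumCholestrol BMI DBP Age, exposure(PersonTime)
    estimates store full

    * Wald test of the interaction term(s)
    testparm i.Sex#c.SerumCholestrol

    * Alternatively, a likelihood-ratio test against the nested main-effects model
    nbreg Num_of_heartAttack i.Sex c.SerumCholestrol BMI DBP Age, exposure(PersonTime)
    estimates store reduced
    lrtest full reduced
    ```

    With several interaction terms, testparm can take all of them at once, so they are tested jointly rather than dropped one p-value at a time.
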

    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command to post some. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
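
    As a rough illustration of the dataex suggestion, assuming the variable names that appear in the nbreg command in this thread and an arbitrary choice of the first 20 observations:

    ```stata
    * Produce a copy-and-paste data example for the model variables
    dataex Num_of_heartAttack Sex BMI DBP SerumCholestrol Age PersonTime in 1/20
    ```

    Paste what dataex prints into your post, between the code delimiters.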



    • #3
      Dear Weiwen Ng,
      I first fitted the models to see which of them is best to work with. My output is below. The p-value for the likelihood-ratio test of alpha=0 is statistically significant, so we reject the null hypothesis of no overdispersion and go with the NBRM.


      nbreg Num_of_heartAttack Sex BMI DBP SerumCholestrol Age, dispersion(mean) exposure(PersonTime)

      Fitting Poisson model:

      Iteration 0: log likelihood = -1518.3149
      Iteration 1: log likelihood = -1517.6278
      Iteration 2: log likelihood = -1517.6275
      Iteration 3: log likelihood = -1517.6275

      Fitting constant-only model:

      Iteration 0: log likelihood = -1801.9542
      Iteration 1: log likelihood = -1783.4495
      Iteration 2: log likelihood = -1782.9728
      Iteration 3: log likelihood = -1782.9725
      Iteration 4: log likelihood = -1782.9725

      Fitting full model:

      Iteration 0: log likelihood = -1582.9775
      Iteration 1: log likelihood = -1526.8682
      Iteration 2: log likelihood = -1516.2501
      Iteration 3: log likelihood = -1514.9559
      Iteration 4: log likelihood = -1514.8987
      Iteration 5: log likelihood = -1514.8986

      Negative binomial regression                    Number of obs     =      1,280
                                                      LR chi2(5)        =     536.15
      Dispersion     = mean                           Prob > chi2       =     0.0000
      Log likelihood = -1514.8986                     Pseudo R2         =     0.1504

      ------------------------------------------------------------------------------------
      Num_of_heartAttack |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------------+----------------------------------------------------------------
                     Sex |   -.672887    .0562555   -11.96  0.000    -.7831458   -.5626283
                     BMI |   .0560633     .012467     4.50  0.000     .0316284    .0804983
                     DBP |   .0202621    .0041456     4.89  0.000     .0121368    .0283874
         SerumCholestrol |   .0099298    .0012318     8.06  0.000     .0075156    .0123441
                     Age |   .0535346    .0027491    19.47  0.000     .0481464    .0589228
                   _cons |   -12.1102    .5039422   -24.03  0.000    -13.09791    -11.1225
          ln(PersonTime) |          1  (exposure)
      -------------------+----------------------------------------------------------------
                /lnalpha |  -2.793931    .4819695                    -3.738573   -1.849288
      -------------------+----------------------------------------------------------------
                   alpha |   .0611803     .029487                      .023788    .1573492
      ------------------------------------------------------------------------------------
      Likelihood-ratio test of alpha=0: chibar2(01) = 5.46      Prob>=chibar2 = 0.010
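
      As a check on my reading of this output, the summary statistics can be reproduced from the log likelihoods in the iteration log. This is only arithmetic on the numbers shown above, using Stata's display command and the chi2tail() function:

      ```stata
      * LR test of alpha=0: twice the gap between the NB and Poisson log likelihoods
      display 2*(-1514.8986 - (-1517.6275))

      * chibar2(01) p-value: half the upper-tail chi-squared(1) probability
      display 0.5*chi2tail(1, 2*(-1514.8986 - (-1517.6275)))

      * LR chi2(5): full model against the constant-only model
      display 2*(-1514.8986 - (-1782.9725))

      * McFadden pseudo R2: 1 - ll(full)/ll(constant-only)
      display 1 - (-1514.8986)/(-1782.9725)
      ```

      After rounding, these give about 5.46, 0.010, 536.15, and 0.1504, which match the values reported in the output.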


      After the above output, I went ahead and fitted the models. I have shared only part of this output because it is really long.




      -------------------------------+------------------------------------------------
      Statistics |
      alpha | 0.654
      N | 1280 1280 1280 1280
      ll | -1937.727 -1824.369 -1809.922 -1760.423
      bic | 3918.382 3698.821 3705.699 3613.857
      aic | 3887.454 3662.738 3643.843 3546.847
      --------------------------------------------------------------------------------
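
      The bic and aic rows follow from the ll row with N = 1280. The number of parameters k is not shown in the part of the output I posted, so the values below (7 and 13) are inferred from (aic + 2*ll)/2 and should be treated as assumptions:

      ```stata
      * Second column: ll = -1824.369, inferred k = 7
      display -2*(-1824.369) + 2*7
      display -2*(-1824.369) + 7*ln(1280)

      * Fourth column: ll = -1760.423, inferred k = 13
      display -2*(-1760.423) + 2*13
      display -2*(-1760.423) + 13*ln(1280)
      ```

      Each pair is AIC followed by BIC, and both agree with the table to within rounding of ll.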

      legend: b/t

      Tests and Fit Statistics

      PRM            BIC= -5239.526  AIC=     3.037  Prefer  Over  Evidence
      -------------------------------------------------------------------------
        vs NBRM      BIC= -5459.087  dif=   219.561  NBRM    PRM   Very strong
                     AIC=     2.862  dif=     0.176  NBRM    PRM
                     LRX2=  226.716  prob=    0.000  NBRM    PRM   p=0.000
      -------------------------------------------------------------------------
        vs ZIP       BIC= -5452.209  dif=   212.683  ZIP     PRM   Very strong
                     AIC=     2.847  dif=     0.190  ZIP     PRM
                     Vuong=   6.818  prob=    0.000  ZIP     PRM   p=0.000
      -------------------------------------------------------------------------
        vs ZINB      BIC= -5544.051  dif=   304.525  ZINB    PRM   Very strong
                     AIC=     2.771  dif=     0.266  ZINB    PRM
      -------------------------------------------------------------------------
      NBRM           BIC= -5459.087  AIC=     2.862  Prefer  Over  Evidence
      -------------------------------------------------------------------------
        vs ZIP       BIC= -5452.209  dif=    -6.878  NBRM    ZIP   Strong
                     AIC=     2.847  dif=     0.015  ZIP     NBRM
      -------------------------------------------------------------------------
        vs ZINB      BIC= -5544.051  dif=    84.964  ZINB    NBRM  Very strong
                     AIC=     2.771  dif=     0.091  ZINB    NBRM
                     Vuong=   5.493  prob=    0.000  ZINB    NBRM  p=0.000
      -------------------------------------------------------------------------
      ZIP            BIC= -5452.209  AIC=     2.847  Prefer  Over  Evidence
      -------------------------------------------------------------------------
        vs ZINB      BIC= -5544.051  dif=    91.842  ZINB    ZIP   Very strong
                     AIC=     2.771  dif=     0.076  ZINB    ZIP
                     LRX2=   98.996  prob=    0.000  ZINB    ZIP   p=0.000
      -------------------------------------------------------------------------
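
      The dif and LRX2 entries in this table can be traced back to the ll, bic, and aic rows of the statistics table, reading its four columns as PRM, NBRM, ZIP, and ZINB. That column order is my reading, based on the differences matching. A quick check with display:

      ```stata
      * BIC difference, NBRM vs ZINB (reported dif = 84.964)
      display 3698.821 - 3613.857

      * LR statistic, ZIP vs ZINB (reported LRX2 = 98.996)
      display 2*(-1760.423 - (-1809.922))

      * AIC per observation for the PRM (reported AIC = 3.037)
      display 3887.454/1280
      ```

      These agree with the table to within rounding. Note that the NBRM log likelihood here (-1824.369) is not the same as the one from the nbreg output above (-1514.8986), which suggests that the two were not specified identically, possibly in how exposure(PersonTime) was handled.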



