Zero inflated negative binomial regression vs negative binomial regression

Shakira Babirye Kasozi

Join Date: Nov 2021

Posts: 2
#1

18 Nov 2021, 14:07

Hello everyone,

I am running an analysis to see whether serum cholesterol, together with sex, DBP, and age, is associated with the number of heart attacks. I started by checking the Poisson assumption that the mean equals the variance and, because the data are overdispersed, concluded that negative binomial regression is the better choice.

I went on to fit the models, and zero-inflated negative binomial regression turned out to be the best one I could work with.

I compared the -2LL and found that the full model was better than the reduced model. However, I am stuck on how to proceed to the next step: dropping all non-significant interaction terms, deciding which p-values to consider, and fitting my final model.

Kindly help me out.

Thanks
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

18 Nov 2021, 17:23

Actually, first consider the meaning of a ZINB model versus an ordinary count model. If you fit a count model, you are saying that each person (I assume this is at the person level, since serum cholesterol is an individual characteristic) has some mean number of heart attacks, and that age, DBP, sex, and cholesterol influence that mean. Because one heart attack is already very, very bad, I hope that all the means are very, very low. For each person, the count of heart attacks can stochastically be zero; it's just that people with the same covariate values share the same mean number of heart attacks.

If you fit a zero-inflated version of that model, you're saying that some people (those in the structural zero class) are not vulnerable to heart attacks, i.e. their count will always be 0. That doesn't sit well with me on substantive grounds, as it should not be possible to be completely immune from heart attacks. I am not 100% clear whether you tested the ZINB model against the negative binomial one; if not, just compare the BIC.
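A minimal sketch of that comparison, with placeholder variable names (heart_attacks, age, dbp, sex, cholesterol) standing in for the real ones. The choice of covariates in inflate() is an assumption made only for illustration, not a recommendation:

```stata
* Placeholder variable names; substitute your own.
nbreg heart_attacks age dbp sex cholesterol
estimates store nb

* inflate() models membership in the structural-zero class.
* Reusing the same covariates there is an assumption for illustration.
zinb heart_attacks age dbp sex cholesterol, inflate(age dbp sex cholesterol)
estimates store zinb

* AIC and BIC side by side; the lower BIC indicates the preferred model.
estimates stats nb zinb
```

Both models need to be fitted to the same observations for the information criteria to be comparable.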

Also, heart attacks are rare events in most populations, and I am assuming that multiple heart attacks are even rarer. I am assuming that you are observing people over time, and that some of them did in fact have multiple heart attacks. If they didn't, then a count model isn't really the correct model.
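A quick way to check this, assuming the count is stored in a variable called heart_attacks (a placeholder name), is to tabulate it and count how many people have more than one event:

```stata
* Placeholder variable name; substitute your own.
* Frequency of each observed count (0, 1, 2, ...)
tabulate heart_attacks

* Number of people with more than one heart attack
count if heart_attacks > 1 & !missing(heart_attacks)
```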

That side note aside, a lot of people don't like backwards selection these days. If you thought some independent variable was important enough to include it in the model, you could just leave it in the final results even if p > 0.05. I think that would be perfectly well accepted these days, at least for main effects. For interaction terms, if you don't have a strong theoretical reason to test an interaction term, then I think it's usually well-accepted to leave non-significant interactions alone.
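If an interaction does have a theoretical motivation, one option is to test it as a block with a likelihood-ratio test rather than dropping terms one p-value at a time. A rough sketch, with placeholder variable names and an age-by-sex interaction chosen purely as an example:

```stata
* Placeholder variable names; the age-by-sex interaction is only an example.
* Model with the interaction
nbreg heart_attacks c.age##i.sex dbp cholesterol
estimates store full

* Same model without the interaction
nbreg heart_attacks c.age i.sex dbp cholesterol
estimates store reduced

* Likelihood-ratio test of the interaction term(s) as a block
lrtest full reduced
```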

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.
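For example, something along these lines would post the first 20 observations of the model variables (the variable names here are placeholders):

```stata
* Placeholder variable names; list the ones used in your model.
dataex heart_attacks age dbp sex cholesterol in 1/20
```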

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Shakira Babirye Kasozi

Join Date: Nov 2021

Posts: 2
#3

19 Nov 2021, 01:21

Dear Weiwen Ng,
I actually first fitted the models to see which one is best to work with. Below is my output. The p-value was statistically significant, indicating that we should reject the null hypothesis of no overdispersion and go with the NBRM.

nbreg Num_of_heartAttack Sex BMI DBP SerumCholestrol Age, dispersion(mean) exposure(PersonTime)

Fitting Poisson model:

Iteration 0: log likelihood = -1518.3149
Iteration 1: log likelihood = -1517.6278
Iteration 2: log likelihood = -1517.6275
Iteration 3: log likelihood = -1517.6275

Fitting constant-only model:

Iteration 0: log likelihood = -1801.9542
Iteration 1: log likelihood = -1783.4495
Iteration 2: log likelihood = -1782.9728
Iteration 3: log likelihood = -1782.9725
Iteration 4: log likelihood = -1782.9725

Fitting full model:

Iteration 0: log likelihood = -1582.9775
Iteration 1: log likelihood = -1526.8682
Iteration 2: log likelihood = -1516.2501
Iteration 3: log likelihood = -1514.9559
Iteration 4: log likelihood = -1514.8987
Iteration 5: log likelihood = -1514.8986

Negative binomial regression                    Number of obs     =      1,280
                                                LR chi2(5)        =     536.15
Dispersion     = mean                           Prob > chi2       =     0.0000
Log likelihood = -1514.8986                     Pseudo R2         =     0.1504

------------------------------------------------------------------------------------
Num_of_heartAttack |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
               Sex |   -.672887   .0562555   -11.96   0.000    -.7831458   -.5626283
               BMI |   .0560633    .012467     4.50   0.000     .0316284    .0804983
               DBP |   .0202621   .0041456     4.89   0.000     .0121368    .0283874
   SerumCholestrol |   .0099298   .0012318     8.06   0.000     .0075156    .0123441
               Age |   .0535346   .0027491    19.47   0.000     .0481464    .0589228
             _cons |   -12.1102   .5039422   -24.03   0.000    -13.09791    -11.1225
    ln(PersonTime) |          1  (exposure)
-------------------+----------------------------------------------------------------
          /lnalpha |  -2.793931   .4819695                     -3.738573   -1.849288
-------------------+----------------------------------------------------------------
             alpha |   .0611803    .029487                       .023788    .1573492
------------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 5.46   Prob>=chibar2 = 0.010
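As a sanity check, the likelihood-ratio test of alpha=0 can be recomputed by hand from the final Poisson and full-model log likelihoods in the iteration logs above. This is only a sketch of the arithmetic; the scalar names are arbitrary:

```stata
* Final log likelihoods copied from the iteration logs above
scalar ll_poisson = -1517.6275
scalar ll_nbreg   = -1514.8986

* LR statistic for H0: alpha = 0
scalar lr = 2*(ll_nbreg - ll_poisson)
display "LR = " %6.2f lr

* alpha = 0 lies on the boundary of the parameter space, so the p-value
* is half the usual chi-squared(1) tail probability
display "p  = " %6.3f (0.5*chi2tail(1, lr))
```

This should agree, up to rounding, with the chibar2(01) and Prob>=chibar2 values reported by nbreg.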

After the nbreg output above, I went ahead to fit the model. I have shared only part of this output because it is really long.

-------------------------------+------------------------------------------------
Statistics |
alpha | 0.654
N | 1280 1280 1280 1280
ll | -1937.727 -1824.369 -1809.922 -1760.423
bic | 3918.382 3698.821 3705.699 3613.857
aic | 3887.454 3662.738 3643.843 3546.847
--------------------------------------------------------------------------------

legend: b/t

Tests and Fit Statistics

PRM            BIC= -5239.526  AIC=     3.037  Prefer  Over  Evidence
-------------------------------------------------------------------------
  vs NBRM      BIC= -5459.087  dif=   219.561  NBRM    PRM   Very strong
               AIC=     2.862  dif=     0.176  NBRM    PRM
               LRX2=  226.716  prob=    0.000  NBRM    PRM   p=0.000
-------------------------------------------------------------------------
  vs ZIP       BIC= -5452.209  dif=   212.683  ZIP     PRM   Very strong
               AIC=     2.847  dif=     0.190  ZIP     PRM
               Vuong=   6.818  prob=    0.000  ZIP     PRM   p=0.000
-------------------------------------------------------------------------
  vs ZINB      BIC= -5544.051  dif=   304.525  ZINB    PRM   Very strong
               AIC=     2.771  dif=     0.266  ZINB    PRM
-------------------------------------------------------------------------
NBRM           BIC= -5459.087  AIC=     2.862  Prefer  Over  Evidence
-------------------------------------------------------------------------
  vs ZIP       BIC= -5452.209  dif=    -6.878  NBRM    ZIP   Strong
               AIC=     2.847  dif=     0.015  ZIP     NBRM
-------------------------------------------------------------------------
  vs ZINB      BIC= -5544.051  dif=    84.964  ZINB    NBRM  Very strong
               AIC=     2.771  dif=     0.091  ZINB    NBRM
               Vuong=   5.493  prob=    0.000  ZINB    NBRM  p=0.000
-------------------------------------------------------------------------
ZIP            BIC= -5452.209  AIC=     2.847  Prefer  Over  Evidence
-------------------------------------------------------------------------
  vs ZINB      BIC= -5544.051  dif=    91.842  ZINB    ZIP   Very strong
               AIC=     2.771  dif=     0.076  ZINB    ZIP
               LRX2=   98.996  prob=    0.000  ZINB    ZIP   p=0.000
-------------------------------------------------------------------------
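The bic row of the statistics table can be checked by hand using BIC = -2*ll + k*ln(N). The parameter counts k are not shown in the output; the values below (7 for NBRM, 13 for ZINB) are inferred from the aic row via AIC = -2*ll + 2*k and should be treated as an assumption:

```stata
* N and log likelihoods copied from the statistics table above.
* k (number of parameters) is inferred from AIC = -2*ll + 2*k: an assumption.
scalar N = 1280
scalar bic_nbrm = -2*(-1824.369) + 7*ln(N)
scalar bic_zinb = -2*(-1760.423) + 13*ln(N)

display "BIC NBRM   = " %9.3f bic_nbrm
display "BIC ZINB   = " %9.3f bic_zinb
display "difference = " %9.3f (bic_nbrm - bic_zinb)
```

The results should match the bic row and the NBRM vs ZINB "dif" value in the tests table, apart from rounding in the last decimal place.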