Zero-inflated Poisson: ratio (95% CI) for a 0,1 variable

Frank Shann

Join Date: Jan 2018
Posts: 15

Zero-inflated Poisson: ratio (95% CI) for a 0,1 variable

24 Oct 2020, 08:21

Zero-inflated Poisson: ratio (95% CI) for a 0,1 variable

I would greatly appreciate advice about the best way to calculate the overall ratio and its 95% CI (not the difference) for a categorical variable with values 0 or 1 in zero-inflated Poisson regression (zip) in Stata/IC 16.1. For example, in the fishing dataset, calculate the overall ratio and its 95% CI for the number of fish caught for camper=1 / camper=0 given that campers who go fishing catch more fish than non-campers who go fishing (count model) and campers are more likely to go fishing than non-campers (inflate model):

Code:

use https://www.stata-press.com/data/r16/fish
zip count persons child i.camper i.livebait, ///
inflate(persons child i.camper i.livebait) vce(robust) irr nolog
Zero-inflated Poisson regression                Number of obs     =        250
Nonzero obs       =        108                  Zero obs          =        142
Inflation model      = logit                    Wald chi2(4)      =      31.53
Log pseudolikelihood = -712.4109                Prob > chi2       =     0.0000
------------------------------------------------------------------------------
             |               Robust
       count |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
count        |
     persons |   2.307947   .4018602     4.80   0.000     1.640645    3.246663
       child |   .3070827   .1140441    -3.18   0.001     .1482986    .6358779
    1.camper |   1.816776   .6923589     1.57   0.117      .860826    3.834313
  1.livebait |   6.164537   2.812036     3.99   0.000     2.521236    15.07258
       _cons |   .0861211   .0717259    -2.94   0.003      .016834    .4405871
-------------+----------------------------------------------------------------
inflate      |
     persons |  -.9311615   .2235628    -4.17   0.000    -1.369337   -.4929864
       child |   1.958577   .3640051     5.38   0.000      1.24514    2.672014
    1.camper |  -.8716704   .4055473    -2.15   0.032    -1.666529   -.0768123
  1.livebait |   .7407971   1.512794     0.49   0.624    -2.224225    3.705819
       _cons |   .8337162   1.751823     0.48   0.634    -2.599794    4.267227
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
Note: _cons estimates baseline incidence rate.
 
. mchange camper, stat(all)
zip: Changes in mu | Number of obs = 250
Expression: Predicted number of count, predict()
       | Change p-value LL    UL    z-value Std Err From  To
-------+-------------------------------------------------------
camper |                                                                                       
1 vs 0 | 2.153  0.022   0.314 3.991 2.295   0.938   1.840 3.993
Average prediction: 3.230
 
. * Ratio camper/non-camper
. di 3.993/1.840 = 2.1701087
. * Lower 95% CI
. di (1.840 + 0.314)/1.840 = 1.1706522
. * Upper 95% CI
. di (1.840 + 3.991)/1.840 = 3.1690217
 
. margins camper, post
Predictive margins, Number of obs = 250, Model VCE : Robust
Expression: Predicted number of events, predict()
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      camper |
          0  |   1.839983    .518668     3.55   0.000     .8234123    2.856553
          1  |   3.992508   .8026569     4.97   0.000      2.41933    5.565687
------------------------------------------------------------------------------
 
. nlcom (Ratio: _b[1.camper] / _b[0.camper])
  risk_ratio:  _b[1.camper] / _b[0.camper]
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  Ratio      |   2.169862   .7368881     2.94   0.003     .7255876    3.614136
-------------------------------------------------------------------------------

Using mchange (from SPost) I calculate (above) that the camper/non-camper ratio (95% CI) is 2.170 (1.171-3.169), p = 0.022. However, nlcom (above) returns 2.170 (0.726-3.614), p = 0.003.

The Stata Base Reference Manual Release 16 entry for nlcom on page 1721 (Technical note) says in part, ‘The test of H0 : exp(β) = 1 is asymptotically equivalent to a test of H0 : β = 0, the Wald test in the original metric, but the latter has better small-sample properties. Thus if you specify eform, you get a test of H0 : β = 0. nlcom, however, is general. It does not attempt to infer the test of greatest interest for a given transformation, and so a test of H0 : transformed coefficient = 0 is always given, regardless of the transformation.’

1. The Reference Manual entry suggests that the mchange estimate may be better than the nlcom estimate. Is that correct?
2. What is the best way to calculate the 95% CI of the ratio of campers/non-campers?

Tags: None

Frank Shann

Join Date: Jan 2018

Posts: 15
#2

25 Oct 2020, 01:34

I think my calculation (above) using mchange camper is incorrect.

LL of difference = 0.314 = (3.993 - x) - (1.840 + x)
which gives x = 0.9195
So LL of ratio = (3.993 - 0.9195) / (1.840 + 0.9195) = 1.114

UL of difference = 3.991 = (3.993 + y) - (1.840 - y)
which gives y = 0.919
So UL of ratio = (3.993 + 0.919) / (1.840 - 0.919) = 5.333

So the ratio of camper=1 / camper=0 is 2.17 (95% CI, 1.11-5.33).
1. Is that correct?
2. Is there a better way to calculate the 95% CI of the ratio?
Comment

Frank Shann

Join Date: Jan 2018
Posts: 15

16 Nov 2020, 23:40

I have had extensive expert advice from Chris Chan at Stata Technical Services about how to calculate the ratio and its 95% CI for the number of fish caught by campers versus non-campers (1.camper/0.camper) after a zero-inflated regression (zip or zinb) using the fictional fishing dataset:

margins camper, post
nlcom ln(_b[1.camper]/_b[0.camper])
then di exp(ratio), di exp(lowCI), di exp(highCI)

In the fishing dataset, visitors to a park are asked whether or not they have a camper, how many people were in the group, were there children in the group, and how many fish were caught. Some visitors who did fish did not catch any fish. Other visitors did not fish at all, so there are excess zeros in the data; but there is no data on whether a person fished or not.

Zero-inflation models (zip, zinb) report two equations:
Equation 1: linear or count model (expected count if not a degenerate or structural zero)
Equation 2: binary or inflate model (odds of being a degenerate or structural zero).

1. To calculate the ratio and its 95% CI for the number of fish caught by campers versus non-campers (or any other 0,1 variable), we need to use both of the equations in the zero-inflated regression: campers who go fishing catch more fish than non-campers who go fishing (the first, count, or linear equation) and campers are more likely to go fishing than non-campers (the second, inflate, or binary equation). margins camper, post incorporates both equations, using count-probability*(1 - inflate-probability).

2. nlcom can then be used to calculate the ratio and its 95% CI.

3. However, the 95% CI should be calculated in the log scale, then antilogged, as discussed at https://www.stata.com/statalist/arch.../msg00428.html.
(a) So use
nlcom ln(_b[1.camper]/_b[0.camper])
then di exp(ratio), di exp(lowCI), di exp(highCI)
which gives camper/non-camper ratio = 2.17 (95% CI 1.12-4.22);
(b) And do not calculate the 95% CI on the antilogs:
nlcom (risk_ratio: _b[1.camper]/_b[0.camper])
which gives camper/non-camper ratio = 2.17 (95% CI 0.73-3.61).

Here is a do file to calculate the camper/non-camper ratio and its 95% CI in zip (or zinb):

Code:

capture program drop _all
program antilog
    local bound  invnorm(.975)*sqrt(el(r(V),1,1))
    local lparm el(r(b),1,1)
    local ll  exp(`lparm' - `bound')
    local ul  exp(`lparm' + `bound')
    di _col(1) "parm =" %10.4f exp(`lparm')  ///
       _col(20) "ll ="  %10.4f `ll'  ///
       _col(40) "ul ="  %10.4f `ul'
end

use https://www.stata-press.com/data/r16/fish, clear
zip count persons child i.camper i.livebait, ///
    inflate(persons child i.camper i.livebait) vce(robust) nolog
margins camper, post
nlcom ln(_b[1.camper]/_b[0.camper]), post
antilog

Chris Chen sent me the following demonstration of how margins camper works after zip; there are 250 observations with 103 non-campers and 147 campers. For the number of events for non-campers, margins camper replaces camper=0 for all 250 observations then calculates the mean of (prob_event_eq1))*(1 - prob_degenerative_zero_eq2) and, for the number of events for campers, it replaces camper=1 for all 250 observations then calculates the mean of (prob_event_eq1))*(1 - prob_degenerative_zero_eq2)).

Code:

use https://www.stata-press.com/data/r16/fish, clear
zip count persons child i.camper i.livebait, ///
inflate(persons child i.camper i.livebait) vce(robust) nolog
 
generate yhat_eq1 = _b[count:_cons] + _b[count:persons]*persons + ///
_b[count:child]*child + _b[count:1.camper]*camper + _b[count:1.livebait]*livebait
predict yhat_eq1_1, xb
assert yhat_eq1 == yhat_eq1_1
 
gen yhat_eq2 = _b[inflate:_cons] + _b[inflate:persons]*persons + _b[inflate: ///
child]*child + _b[inflate:1.camper]*camper + _b[inflate:1.livebait]*livebait
predict yhat_eq2_2, xb equation(#2)
assert yhat_eq2 == yhat_eq2_2
 
predict p, pr  // probability_degenerative_zero, equation(#2)
gen myp=exp(yhat_eq2)/(1+exp(yhat_eq2))
sum p myp
 
predict n, n  // number of events
gen myn=(1-p)*exp(yhat_eq1)  // p_not_degenerative_zero_eq2 * p_event_eq1
sum n myn
 
margins camper
 
//reproduce the margins estimates
preserve
replace camper = 0
predict yhat0, xb  // linear prediction, equation(#1)
predict p0, pr  // probability_degenerative_zero, equation(#2)
generate camper0_n = (1-p0)*exp(yhat0)  // p_not_degen_zero_eq2 * p_event_eq1
quietly summarize camper0_n
display r(mean)
 
replace camper = 1
predict yhat1, xb  // linear prediction, equation(#1)
predict p1, pr  // probability of a degenerative zero, equation(#2)
generate camper1_n = (1-p1)*exp(yhat1)  // p_not_degen_zero_eq2 * p-event_eq1
quietly summarize camper1_n
display r(mean)
restore

An alternative to margins camper is the less commonly used margins, over(camper), which uses both equations to calculate the number of events (without replacing the values for camper), then reports the mean for non-campers and the mean for campers. This gives camper/non-camper ratio = 2.91 (95% CI, 1.44-5.90).

Code:

use https://www.stata-press.com/data/r16/fish, clear
zip count persons child i.camper i.livebait, ///
inflate(persons child i.camper i.livebait) vce(robust) nolog
 
margins, over(camper)
 
//reproduce the margins estimates
generate yhat_eq1 = _b[count:_cons] + _b[count:persons]*persons ///
         + _b[count:child]*child + _b[count:1.camper]*camper ///
         + _b[count:1.livebait]*livebait
generate yhat_eq2 = _b[inflate:_cons] + _b[inflate:persons]*persons ///
         + _b[inflate:child]*child + _b[inflate:1.camper]*camper ///
         + _b[inflate:1.livebait]*livebait
gen pd=exp(yhat_eq2)/(1+exp(yhat_eq2)) //p_degenerate_zero_eq2
 
generate camper_n = (1-pd)*exp(yhat_eq1) //p_not_degen_zero_eq2 * p_event_eq1
quietly summarize camper_n if camper==0
display r(mean)
quietly summarize camper_n if camper==1
display r(mean)

Chris Chen also sent me demonstrations of how margins works with the argument expression(exp(xb())), which causes margins to use only the first equation (number of fish caught by those who go fishing) and omits the second equation (probability of not going fishing).

margins camper, expression(exp(xb())) first calculates the mean count predicted by eq1 when every case has camper==0, then calculates the mean count predicted by eq1 when every case has camper==1

Code:

use https://www.stata-press.com/data/r16/fish, clear
zip count persons child i.camper i.livebait, ///
inflate(persons child i.camper i.livebait) vce(robust) nolog
 
margins camper, expression(exp(xb()))
 
//reproduce the margins estimates
preserve
replace camper = 0
generate yhat_eq1_0 = _b[count:_cons] + _b[count:persons]*persons ///
         + _b[count:child]*child + _b[count:1.camper]*camper ///
         + _b[count:1.livebait]*livebait
generate camper0 = exp(yhat_eq1_0)
quietly summarize camper0
display r(mean)
 
replace camper = 1
generate yhat_eq1_1 = _b[count:_cons] + _b[count:persons]*persons ///
         + _b[count:child]*child + _b[count:1.camper]*camper ///
         + _b[count:1.livebait]*livebait
generate camper1 = exp(yhat_eq1_1)
quietly summarize camper1
display r(mean)
restore

margins, over(camper) expression(exp(xb())) uses the first equation to calculate the number of events (without replacing the values for camper), then reports the mean for non-campers and the mean for campers.

Code:

use https://www.stata-press.com/data/r16/fish, clear
zip count persons child i.camper i.livebait, ///
inflate(persons child i.camper i.livebait) vce(robust) nolog
 
margins, over(camper) expression(exp(xb()))
 
//reproduce the margins estimates
generate yhat_eq1 = _b[count:_cons] + _b[count:persons]*persons ///
         + _b[count:child]*child + _b[count:1.camper]*camper ///
         + _b[count:1.livebait]*livebait
gen expxb = exp(yhat_eq1)
quietly summarize expxb if camper == 0
display r(mean)
quietly summarize expxb if camper == 1
display r(mean)

I hope others find this information as helpful as I did.

Comment

Frank Shann

Join Date: Jan 2018

Posts: 15
#4

11 Dec 2020, 00:17

For a description of the fishing dataset and zero-inflated Poisson regression, see
https://stats.idre.ucla.edu/stata/da...on-regression/
and https://stats.idre.ucla.edu/stata/ou...on-regression/

Paul Allison argues against using zero-inflated regression at:
https://statisticalhorizons.com/zero-inflated-models
Comment

Announcement

Zero-inflated Poisson: ratio (95% CI) for a 0,1 variable

Comment

Comment

Comment