Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Zero-inflated Poisson: ratio (95% CI) for a 0,1 variable

    Zero-inflated Poisson: ratio (95% CI) for a 0,1 variable

    I would greatly appreciate advice about the best way to calculate the overall ratio and its 95% CI (not the difference) for a categorical variable with values 0 or 1 in zero-inflated Poisson regression (zip) in Stata/IC 16.1. For example, in the fishing dataset, calculate the overall ratio and its 95% CI for the number of fish caught for camper=1 / camper=0 given that campers who go fishing catch more fish than non-campers who go fishing (count model) and campers are more likely to go fishing than non-campers (inflate model):

    Code:
    use https://www.stata-press.com/data/r16/fish
    zip count persons child i.camper i.livebait, ///
    inflate(persons child i.camper i.livebait) vce(robust) irr nolog
    Zero-inflated Poisson regression                Number of obs     =        250
    Nonzero obs       =        108                  Zero obs          =        142
    Inflation model      = logit                    Wald chi2(4)      =      31.53
    Log pseudolikelihood = -712.4109                Prob > chi2       =     0.0000
    ------------------------------------------------------------------------------
                 |               Robust
           count |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    count        |
         persons |   2.307947   .4018602     4.80   0.000     1.640645    3.246663
           child |   .3070827   .1140441    -3.18   0.001     .1482986    .6358779
        1.camper |   1.816776   .6923589     1.57   0.117      .860826    3.834313
      1.livebait |   6.164537   2.812036     3.99   0.000     2.521236    15.07258
           _cons |   .0861211   .0717259    -2.94   0.003      .016834    .4405871
    -------------+----------------------------------------------------------------
    inflate      |
         persons |  -.9311615   .2235628    -4.17   0.000    -1.369337   -.4929864
           child |   1.958577   .3640051     5.38   0.000      1.24514    2.672014
        1.camper |  -.8716704   .4055473    -2.15   0.032    -1.666529   -.0768123
      1.livebait |   .7407971   1.512794     0.49   0.624    -2.224225    3.705819
           _cons |   .8337162   1.751823     0.48   0.634    -2.599794    4.267227
    ------------------------------------------------------------------------------
    Note: Estimates are transformed only in the first equation.
    Note: _cons estimates baseline incidence rate.
     
    . mchange camper, stat(all)
    zip: Changes in mu | Number of obs = 250
    Expression: Predicted number of count, predict()
           | Change p-value LL    UL    z-value Std Err From  To
    -------+-------------------------------------------------------
    camper |                                                                                       
    1 vs 0 | 2.153  0.022   0.314 3.991 2.295   0.938   1.840 3.993
    Average prediction: 3.230
     
    . * Ratio camper/non-camper
    . di 3.993/1.840 = 2.1701087
    . * Lower 95% CI
    . di (1.840 + 0.314)/1.840 = 1.1706522
    . * Upper 95% CI
    . di (1.840 + 3.991)/1.840 = 3.1690217
     
    . margins camper, post
    Predictive margins, Number of obs = 250, Model VCE : Robust
    Expression: Predicted number of events, predict()
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          camper |
              0  |   1.839983    .518668     3.55   0.000     .8234123    2.856553
              1  |   3.992508   .8026569     4.97   0.000      2.41933    5.565687
    ------------------------------------------------------------------------------
     
    . nlcom (Ratio: _b[1.camper] / _b[0.camper])
      risk_ratio:  _b[1.camper] / _b[0.camper]
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
      Ratio      |   2.169862   .7368881     2.94   0.003     .7255876    3.614136
    -------------------------------------------------------------------------------
    Using mchange (from SPost) I calculate (above) that the camper/non-camper ratio (95% CI) is 2.170 (1.171-3.169), p = 0.022. However, nlcom (above) returns 2.170 (0.726-3.614), p = 0.003.

    The Stata Base Reference Manual Release 16 entry for nlcom on page 1721 (Technical note) says in part, ‘The test of H0 : exp(β) = 1 is asymptotically equivalent to a test of H0 : β = 0, the Wald test in the original metric, but the latter has better small-sample properties. Thus if you specify eform, you get a test of H0 : β = 0. nlcom, however, is general. It does not attempt to infer the test of greatest interest for a given transformation, and so a test of H0 : transformed coefficient = 0 is always given, regardless of the transformation.’

    1. The Reference Manual entry suggests that the mchange estimate may be better than the nlcom estimate. Is that correct?
    2. What is the best way to calculate the 95% CI of the ratio of campers/non-campers?

  • #2
    I think my calculation (above) using mchange camper is incorrect.

    LL of difference = 0.314 = (3.993 - x) - (1.840 + x)
    which gives x = 0.9195
    So LL of ratio = (3.993 - 0.9195) / (1.840 + 0.9195) = 1.114

    UL of difference = 3.991 = (3.993 + y) - (1.840 - y)
    which gives y = 0.919
    So UL of ratio = (3.993 + 0.919) / (1.840 - 0.919) = 5.333

    So the ratio of camper=1 / camper=0 is 2.17 (95% CI, 1.11-5.33).
    1. Is that correct?
    2. Is there a better way to calculate the 95% CI of the ratio?

    Comment


    • #3
      I have had extensive expert advice from Chris Chan at Stata Technical Services about how to calculate the ratio and its 95% CI for the number of fish caught by campers versus non-campers (1.camper/0.camper) after a zero-inflated regression (zip or zinb) using the fictional fishing dataset:

      margins camper, post
      nlcom ln(_b[1.camper]/_b[0.camper])

      then di exp(ratio), di exp(lowCI), di exp(highCI)

      In the fishing dataset, visitors to a park are asked whether or not they have a camper, how many people were in the group, were there children in the group, and how many fish were caught. Some visitors who did fish did not catch any fish. Other visitors did not fish at all, so there are excess zeros in the data; but there is no data on whether a person fished or not.

      Zero-inflation models (zip, zinb) report two equations:
      Equation 1: linear or count model (expected count if not a degenerate or structural zero)
      Equation 2: binary or inflate model (odds of being a degenerate or structural zero).

      1. To calculate the ratio and its 95% CI for the number of fish caught by campers versus non-campers (or any other 0,1 variable), we need to use both of the equations in the zero-inflated regression: campers who go fishing catch more fish than non-campers who go fishing (the first, count, or linear equation) and campers are more likely to go fishing than non-campers (the second, inflate, or binary equation). margins camper, post incorporates both equations, using count-probability*(1 - inflate-probability).

      2. nlcom can then be used to calculate the ratio and its 95% CI.

      3. However, the 95% CI should be calculated in the log scale, then antilogged, as discussed at https://www.stata.com/statalist/arch.../msg00428.html.
      (a) So use
      nlcom ln(_b[1.camper]/_b[0.camper])
      then di exp(ratio), di exp(lowCI), di exp(highCI)
      which gives camper/non-camper ratio = 2.17 (95% CI 1.12-4.22);
      (b) And do not calculate the 95% CI on the antilogs:
      nlcom (risk_ratio: _b[1.camper]/_b[0.camper])
      which gives camper/non-camper ratio = 2.17 (95% CI 0.73-3.61).

      Here is a do file to calculate the camper/non-camper ratio and its 95% CI in zip (or zinb):

      Code:
      capture program drop _all
      program antilog
          local bound  invnorm(.975)*sqrt(el(r(V),1,1))
          local lparm el(r(b),1,1)
          local ll  exp(`lparm' - `bound')
          local ul  exp(`lparm' + `bound')
          di _col(1) "parm =" %10.4f exp(`lparm')  ///
             _col(20) "ll ="  %10.4f `ll'  ///
             _col(40) "ul ="  %10.4f `ul'
      end
      
      use https://www.stata-press.com/data/r16/fish, clear
      zip count persons child i.camper i.livebait, ///
          inflate(persons child i.camper i.livebait) vce(robust) nolog
      margins camper, post
      nlcom ln(_b[1.camper]/_b[0.camper]), post
      antilog
      Chris Chen sent me the following demonstration of how margins camper works after zip; there are 250 observations with 103 non-campers and 147 campers. For the number of events for non-campers, margins camper replaces camper=0 for all 250 observations then calculates the mean of (prob_event_eq1))*(1 - prob_degenerative_zero_eq2) and, for the number of events for campers, it replaces camper=1 for all 250 observations then calculates the mean of (prob_event_eq1))*(1 - prob_degenerative_zero_eq2)).

      Code:
      use https://www.stata-press.com/data/r16/fish, clear
      zip count persons child i.camper i.livebait, ///
      inflate(persons child i.camper i.livebait) vce(robust) nolog
       
      generate yhat_eq1 = _b[count:_cons] + _b[count:persons]*persons + ///
      _b[count:child]*child + _b[count:1.camper]*camper + _b[count:1.livebait]*livebait
      predict yhat_eq1_1, xb
      assert yhat_eq1 == yhat_eq1_1
       
      gen yhat_eq2 = _b[inflate:_cons] + _b[inflate:persons]*persons + _b[inflate: ///
      child]*child + _b[inflate:1.camper]*camper + _b[inflate:1.livebait]*livebait
      predict yhat_eq2_2, xb equation(#2)
      assert yhat_eq2 == yhat_eq2_2
       
      predict p, pr  // probability_degenerative_zero, equation(#2)
      gen myp=exp(yhat_eq2)/(1+exp(yhat_eq2))
      sum p myp
       
      predict n, n  // number of events
      gen myn=(1-p)*exp(yhat_eq1)  // p_not_degenerative_zero_eq2 * p_event_eq1
      sum n myn
       
      margins camper
       
      //reproduce the margins estimates
      preserve
      replace camper = 0
      predict yhat0, xb  // linear prediction, equation(#1)
      predict p0, pr  // probability_degenerative_zero, equation(#2)
      generate camper0_n = (1-p0)*exp(yhat0)  // p_not_degen_zero_eq2 * p_event_eq1
      quietly summarize camper0_n
      display r(mean)
       
      replace camper = 1
      predict yhat1, xb  // linear prediction, equation(#1)
      predict p1, pr  // probability of a degenerative zero, equation(#2)
      generate camper1_n = (1-p1)*exp(yhat1)  // p_not_degen_zero_eq2 * p-event_eq1
      quietly summarize camper1_n
      display r(mean)
      restore
      An alternative to margins camper is the less commonly used margins, over(camper), which uses both equations to calculate the number of events (without replacing the values for camper), then reports the mean for non-campers and the mean for campers. This gives camper/non-camper ratio = 2.91 (95% CI, 1.44-5.90).

      Code:
      use https://www.stata-press.com/data/r16/fish, clear
      zip count persons child i.camper i.livebait, ///
      inflate(persons child i.camper i.livebait) vce(robust) nolog
       
      margins, over(camper)
       
      //reproduce the margins estimates
      generate yhat_eq1 = _b[count:_cons] + _b[count:persons]*persons ///
               + _b[count:child]*child + _b[count:1.camper]*camper ///
               + _b[count:1.livebait]*livebait
      generate yhat_eq2 = _b[inflate:_cons] + _b[inflate:persons]*persons ///
               + _b[inflate:child]*child + _b[inflate:1.camper]*camper ///
               + _b[inflate:1.livebait]*livebait
      gen pd=exp(yhat_eq2)/(1+exp(yhat_eq2)) //p_degenerate_zero_eq2
       
      generate camper_n = (1-pd)*exp(yhat_eq1) //p_not_degen_zero_eq2 * p_event_eq1
      quietly summarize camper_n if camper==0
      display r(mean)
      quietly summarize camper_n if camper==1
      display r(mean)
      Chris Chen also sent me demonstrations of how margins works with the argument expression(exp(xb())), which causes margins to use only the first equation (number of fish caught by those who go fishing) and omits the second equation (probability of not going fishing).

      margins camper, expression(exp(xb())) first calculates the mean count predicted by eq1 when every case has camper==0, then calculates the mean count predicted by eq1 when every case has camper==1

      Code:
      use https://www.stata-press.com/data/r16/fish, clear
      zip count persons child i.camper i.livebait, ///
      inflate(persons child i.camper i.livebait) vce(robust) nolog
       
      margins camper, expression(exp(xb()))
       
      //reproduce the margins estimates
      preserve
      replace camper = 0
      generate yhat_eq1_0 = _b[count:_cons] + _b[count:persons]*persons ///
               + _b[count:child]*child + _b[count:1.camper]*camper ///
               + _b[count:1.livebait]*livebait
      generate camper0 = exp(yhat_eq1_0)
      quietly summarize camper0
      display r(mean)
       
      replace camper = 1
      generate yhat_eq1_1 = _b[count:_cons] + _b[count:persons]*persons ///
               + _b[count:child]*child + _b[count:1.camper]*camper ///
               + _b[count:1.livebait]*livebait
      generate camper1 = exp(yhat_eq1_1)
      quietly summarize camper1
      display r(mean)
      restore
      margins, over(camper) expression(exp(xb())) uses the first equation to calculate the number of events (without replacing the values for camper), then reports the mean for non-campers and the mean for campers.

      Code:
      use https://www.stata-press.com/data/r16/fish, clear
      zip count persons child i.camper i.livebait, ///
      inflate(persons child i.camper i.livebait) vce(robust) nolog
       
      margins, over(camper) expression(exp(xb()))
       
      //reproduce the margins estimates
      generate yhat_eq1 = _b[count:_cons] + _b[count:persons]*persons ///
               + _b[count:child]*child + _b[count:1.camper]*camper ///
               + _b[count:1.livebait]*livebait
      gen expxb = exp(yhat_eq1)
      quietly summarize expxb if camper == 0
      display r(mean)
      quietly summarize expxb if camper == 1
      display r(mean)
      I hope others find this information as helpful as I did.

      Comment


      • #4
        For a description of the fishing dataset and zero-inflated Poisson regression, see
        https://stats.idre.ucla.edu/stata/da...on-regression/
        and https://stats.idre.ucla.edu/stata/ou...on-regression/

        Paul Allison argues against using zero-inflated regression at:
        https://statisticalhorizons.com/zero-inflated-models

        Comment

        Working...
        X