
  • ZINB margins, dydx(*)

    Hello,

    I find the coefficients from the zinb regression hard to interpret directly, because I need to exponentiate them and then describe each variable's effect as an incidence-rate ratio.

    Code:
    . use https://www.stata-press.com/data/r17/fish,clear
    (Fictional fishing data)
    
    . zinb count persons livebait, inflate(child camper)
    
    Fitting constant-only model:
    
    Iteration 0:   log likelihood = -519.33992  
    Iteration 1:   log likelihood = -451.38662  
    Iteration 2:   log likelihood = -444.49118  
    Iteration 3:   log likelihood = -442.96272  
    Iteration 4:   log likelihood = -442.71065  
    Iteration 5:   log likelihood = -442.66718  
    Iteration 6:   log likelihood =  -442.6631  
    Iteration 7:   log likelihood = -442.66299  
    Iteration 8:   log likelihood = -442.66299  
    
    Fitting full model:
    
    Iteration 0:   log likelihood = -442.66299  (not concave)
    Iteration 1:   log likelihood = -432.83107  (not concave)
    Iteration 2:   log likelihood = -426.32934  
    Iteration 3:   log likelihood = -413.75019  
    Iteration 4:   log likelihood = -403.09586  
    Iteration 5:   log likelihood = -401.56013  
    Iteration 6:   log likelihood = -401.54781  
    Iteration 7:   log likelihood = -401.54776  
    Iteration 8:   log likelihood = -401.54776  
    
    Zero-inflated negative binomial regression      Number of obs     =        250
                                                    Nonzero obs       =        108
                                                    Zero obs          =        142
    
    Inflation model = logit                         LR chi2(2)        =      82.23
    Log likelihood  = -401.5478                     Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
           count |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    count        |
         persons |   .9742984   .1034938     9.41   0.000     .7714543    1.177142
        livebait |   1.557523   .4124424     3.78   0.000     .7491503    2.365895
           _cons |  -2.730064    .476953    -5.72   0.000    -3.664874   -1.795253
    -------------+----------------------------------------------------------------
    inflate      |
           child |   3.185999   .7468551     4.27   0.000      1.72219    4.649808
          camper |  -2.020951    .872054    -2.32   0.020    -3.730146   -.3117567
           _cons |  -2.695385   .8929071    -3.02   0.003     -4.44545   -.9453189
    -------------+----------------------------------------------------------------
        /lnalpha |   .5110429   .1816816     2.81   0.005     .1549535    .8671323
    -------------+----------------------------------------------------------------
           alpha |   1.667029   .3028685                      1.167604    2.380076
    ------------------------------------------------------------------------------
    Therefore, I decided to use margins, dydx(*), since it reports the effect of each variable on the expected count, which is much easier to interpret.
    Code:
    . margins, dydx(*)
    
    Average marginal effects                        Number of obs     =        250
    Model VCE    : OIM
    
    Expression   : Predicted number of events, predict()
    dy/dx w.r.t. : persons livebait child camper
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         persons |   3.051303   .6943597     4.39   0.000     1.690383    4.412222
        livebait |   4.877841   1.580321     3.09   0.002     1.780469    7.975213
           child |  -1.506576   .2890582    -5.21   0.000     -2.07312   -.9400329
          camper |   .9556555   .3077272     3.11   0.002     .3525213     1.55879
    ------------------------------------------------------------------------------
    However, I find it very odd that margins, dydx(*) reports results for child and camper at all. Both variables appear only in inflate(), so they are used only to predict the degenerate zeros and do not enter the second-stage negative binomial regression. How, then, is Stata able to estimate the effects of child and camper on the expected count?

    Code:
    
    . zinb count persons livebait child camper, inflate(child camper)
    
    Fitting constant-only model:
    
    Iteration 0:   log likelihood = -519.33992  
    Iteration 1:   log likelihood = -451.38662  
    Iteration 2:   log likelihood = -444.49118  
    Iteration 3:   log likelihood = -442.96272  
    Iteration 4:   log likelihood = -442.71065  
    Iteration 5:   log likelihood = -442.66718  
    Iteration 6:   log likelihood =  -442.6631  
    Iteration 7:   log likelihood = -442.66299  
    Iteration 8:   log likelihood = -442.66299  
    
    Fitting full model:
    
    Iteration 0:   log likelihood = -442.66299  (not concave)
    Iteration 1:   log likelihood =  -431.0508  (not concave)
    Iteration 2:   log likelihood = -421.09041  (not concave)
    Iteration 3:   log likelihood = -420.10731  (not concave)
    Iteration 4:   log likelihood = -414.28162  
    Iteration 5:   log likelihood =  -393.6678  
    Iteration 6:   log likelihood = -388.95768  
    Iteration 7:   log likelihood = -388.84164  
    Iteration 8:   log likelihood = -388.82783  
    Iteration 9:   log likelihood = -388.82573  
    Iteration 10:  log likelihood = -388.82545  
    Iteration 11:  log likelihood =  -388.8254  
    Iteration 12:  log likelihood = -388.82539  
    
    Zero-inflated negative binomial regression      Number of obs     =        250
                                                    Nonzero obs       =        108
                                                    Zero obs          =        142
    
    Inflation model = logit                         LR chi2(4)        =     107.68
    Log likelihood  = -388.8254                     Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
           count |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    count        |
         persons |   1.084071   .1078292    10.05   0.000     .8727292    1.295412
        livebait |   1.556289   .4026417     3.87   0.000     .7671257    2.345452
           child |  -1.287611   .2350382    -5.48   0.000    -1.748277   -.8269444
          camper |   .2635901    .242317     1.09   0.277    -.2113426    .7385228
           _cons |   -2.95122    .468753    -6.30   0.000    -3.869959   -2.032481
    -------------+----------------------------------------------------------------
    inflate      |
           child |   14.66326   564.7218     0.03   0.979    -1092.171    1121.498
          camper |  -14.47428   564.7236    -0.03   0.980    -1121.312    1092.364
           _cons |  -14.71499    564.723    -0.03   0.979    -1121.552    1092.122
    -------------+----------------------------------------------------------------
        /lnalpha |   .4738543   .1625641     2.91   0.004     .1552345    .7924741
    -------------+----------------------------------------------------------------
           alpha |   1.606173    .261106                      1.167932    2.208854
    ------------------------------------------------------------------------------
    
    . margins, dydx(*)
    
    Average marginal effects                        Number of obs     =        250
    Model VCE    : OIM
    
    Expression   : Predicted number of events, predict()
    dy/dx w.r.t. : persons livebait child camper
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         persons |   3.512221   .8344028     4.21   0.000     1.876821     5.14762
        livebait |   5.042135   1.646428     3.06   0.002     1.815196    8.269075
           child |  -5.274766   42.49499    -0.12   0.901    -88.56342    78.01389
          camper |    1.94288   42.48519     0.05   0.964    -81.32657    85.21233
    ------------------------------------------------------------------------------
    If child and camper appear in both the "inflate" part and the negative binomial part, what exactly do the values reported for them by margins, dydx(*) mean?

    Thank you.

  • #2
    Suppose we have two observations with different values of, say, child. The expected difference in their values of count comes from two separate parts of the model. First, child affects the probability that an observation belongs to the "always zero" component of the mixture of outcome distributions. In the output you show, the observation with the higher value of child has a greater chance of being in the "always zero" component, because the coefficient of child in the "inflate" output is positive. Second, if the observation is not in the "always zero" component, its expected value of count within the negative binomial component is lower, because the coefficient of child in the non-inflate output is negative. These two things reinforce each other: the observation is drawn towards zero by a higher probability of being "always zero", and when it is not "always zero", its expected value is lower as well. -margins- combines these two effects.
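
    The two channels can be written out as a short derivation. This is a sketch in notation that is not in the original posts: \pi_i is the probability of the "always zero" component (a logit in z_i\gamma, matching "Inflation model = logit" in the output), \mu_i = \exp(x_i\beta) is the negative binomial mean, and w is a covariate treated as continuous, with coefficient \beta_w in the count equation and \gamma_w in the inflate equation (either may be zero if w is absent from that equation).

    ```latex
    \begin{aligned}
    E[y_i] &= (1 - \pi_i)\,\mu_i,
      \qquad \pi_i = \frac{\exp(z_i\gamma)}{1 + \exp(z_i\gamma)},
      \qquad \mu_i = \exp(x_i\beta) \\
    \frac{\partial E[y_i]}{\partial w}
      &= (1 - \pi_i)\,\mu_i\,\beta_w \;-\; \mu_i\,\pi_i\,(1 - \pi_i)\,\gamma_w \\
      &= (1 - \pi_i)\,\mu_i\,\bigl(\beta_w - \pi_i\,\gamma_w\bigr)
    \end{aligned}
    ```

    Under these assumptions, a negative \beta_w and a positive \gamma_w both push the derivative below zero, which is the reinforcing pattern described above. Because child and camper were not entered with factor-variable notation, margins treats them as continuous, and dydx(*) reports the average of this derivative over the estimation sample.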

    It works similarly if child appears only in the inflate part of the model. In that case there is a single effect in operation: the positive inflate coefficient means a greater probability of being an "always zero" observation, and since 0 is the lowest possible outcome value in zinb, the overall expected value of count is lower in observations with higher values of child.
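
    One way to see this in the first model is to rebuild the default prediction by hand. The following is a minimal sketch, not a definitive check: it assumes the predicted number of events equals (1 - Pr(always zero)) * exp(xb) when no offset or exposure is specified, and the variable names n_hat, p_zero, xb_count, and n_check are made up for illustration.

    ```stata
    * Refit the first model from the original post
    use https://www.stata-press.com/data/r17/fish, clear
    zinb count persons livebait, inflate(child camper)

    predict double n_hat            // default statistic: predicted number of events
    predict double p_zero, pr       // probability of a degenerate ("always zero") outcome
    predict double xb_count, xb     // linear prediction from the count equation

    * Assumed identity: n = (1 - Pr(always zero)) * exp(xb), no offset or exposure
    generate double n_check = (1 - p_zero) * exp(xb_count)
    summarize n_hat n_check

    * Effect of the inflate-only variables on the always-zero probability alone
    margins, dydx(child camper) predict(pr)
    ```

    If the assumed identity holds, n_hat and n_check should agree up to rounding. Since child and camper enter n_hat only through p_zero, the sign of each effect in the last margins table should be the opposite of its sign in the margins, dydx(*) output on the expected count.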



    • #3
      Originally posted by Clyde Schechter (#2)
      This makes a lot of sense. Thanks.
