  • Reasonable results in ZINB, but bizarre results after margins, dydx(*) (data attached)

    Hello,

    I am trying to use margins, dydx(*) after zinb, but it yields implausible results.

    Code:
    . zinb sum_l2 mud_sqft_per_capita office_sqft_per_capita retail_sqft_per_capita,inflate(mud_sqft_per_capita owned) vce(robust)
    
    Fitting constant-only model:
    
    Iteration 0:   log pseudolikelihood = -1935.8032  (not concave)
    Iteration 1:   log pseudolikelihood = -1398.7715  
    Iteration 2:   log pseudolikelihood = -1339.4624  
    Iteration 3:   log pseudolikelihood = -1327.9422  
    Iteration 4:   log pseudolikelihood = -1326.7051  
    Iteration 5:   log pseudolikelihood = -1326.6635  
    Iteration 6:   log pseudolikelihood = -1326.6634  
    
    Fitting full model:
    
    Iteration 0:   log pseudolikelihood = -1326.6634  
    Iteration 1:   log pseudolikelihood = -1310.2141  
    Iteration 2:   log pseudolikelihood = -1301.1221  
    Iteration 3:   log pseudolikelihood = -1291.9257  
    Iteration 4:   log pseudolikelihood =  -1290.279  
    Iteration 5:   log pseudolikelihood = -1290.2517  
    Iteration 6:   log pseudolikelihood = -1290.2517  
    
    Zero-inflated negative binomial regression      Number of obs     =      1,594
                                                    Nonzero obs       =        258
                                                    Zero obs          =      1,336
    
    Inflation model      = logit                    Wald chi2(3)      =      19.52
    Log pseudolikelihood = -1290.252                Prob > chi2       =     0.0002
    
    ----------------------------------------------------------------------------------------
                           |               Robust
                    sum_l2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
    sum_l2                 |
       mud_sqft_per_capita |  -.0350013    .674072    -0.05   0.959    -1.356158    1.286155
    office_sqft_per_capita |   3.080568   1.015168     3.03   0.002     1.090875    5.070261
    retail_sqft_per_capita |   1.813879   .5685284     3.19   0.001     .6995834    2.928174
                     _cons |   .3769259   .2642011     1.43   0.154    -.1408988    .8947507
    -----------------------+----------------------------------------------------------------
    inflate                |
       mud_sqft_per_capita |  -11.19935   1.662089    -6.74   0.000    -14.45699    -7.94172
                     owned |  -.5364868   .5588103    -0.96   0.337    -1.631735    .5587612
                     _cons |   1.413014   .4707848     3.00   0.003     .4902923    2.335735
    -----------------------+----------------------------------------------------------------
                  /lnalpha |   1.650809   .1652354     9.99   0.000     1.326953    1.974664
    -----------------------+----------------------------------------------------------------
                     alpha |   5.211192   .8610735                       3.76954    7.204199
    ----------------------------------------------------------------------------------------
    
    . margins, dydx(*)
    
    Average marginal effects                        Number of obs     =      1,594
    Model VCE    : Robust
    
    Expression   : Predicted number of events, predict()
    dy/dx w.r.t. : mud_sqft_per_capita office_sqft_per_capita retail_sqft_per_capita owned
    
    ----------------------------------------------------------------------------------------
                           |            Delta-method
                           |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
       mud_sqft_per_capita |   2.16e+08   1.85e+09     0.12   0.907    -3.41e+09    3.84e+09
    office_sqft_per_capita |   7.35e+09   6.57e+10     0.11   0.911    -1.21e+11    1.36e+11
    retail_sqft_per_capita |   4.33e+09   3.74e+10     0.12   0.908    -6.91e+10    7.77e+10
                     owned |   1.44e+07   1.23e+08     0.12   0.907    -2.27e+08    2.56e+08
    ----------------------------------------------------------------------------------------

    As you can see, the marginal effects reported by margins, dydx(*) are enormous, on the order of 10^7 to 10^9. However, if I reduce the number of variables, the results look fine.

    Code:
    . zinb sum_l2 mud_sqft_per_capita,inflate(mud_sqft_per_capita owned) vce(robust)
    
    Fitting constant-only model:
    
    Iteration 0:   log pseudolikelihood = -1935.8032  (not concave)
    Iteration 1:   log pseudolikelihood = -1398.7715  
    Iteration 2:   log pseudolikelihood = -1339.4624  
    Iteration 3:   log pseudolikelihood = -1327.9422  
    Iteration 4:   log pseudolikelihood = -1326.7051  
    Iteration 5:   log pseudolikelihood = -1326.6635  
    Iteration 6:   log pseudolikelihood = -1326.6634  
    
    Fitting full model:
    
    Iteration 0:   log pseudolikelihood = -1326.6634  
    Iteration 1:   log pseudolikelihood = -1318.5842  
    Iteration 2:   log pseudolikelihood = -1317.6977  
    Iteration 3:   log pseudolikelihood = -1317.6921  
    Iteration 4:   log pseudolikelihood = -1317.6921  
    
    Zero-inflated negative binomial regression      Number of obs     =      1,594
                                                    Nonzero obs       =        258
                                                    Zero obs          =      1,336
    
    Inflation model      = logit                    Wald chi2(1)      =       6.17
    Log pseudolikelihood = -1317.692                Prob > chi2       =     0.0130
    
    -------------------------------------------------------------------------------------
                        |               Robust
                 sum_l2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
    sum_l2              |
    mud_sqft_per_capita |   2.266131   .9126316     2.48   0.013     .4774059    4.054856
                  _cons |   .5442042   .2637812     2.06   0.039     .0272026    1.061206
    --------------------+----------------------------------------------------------------
    inflate             |
    mud_sqft_per_capita |  -9.984647   1.612426    -6.19   0.000    -13.14494   -6.824349
                  owned |  -.6016932   .5475992    -1.10   0.272    -1.674968    .4715815
                  _cons |   1.391733   .4705785     2.96   0.003      .469416     2.31405
    --------------------+----------------------------------------------------------------
               /lnalpha |   1.802471    .169724    10.62   0.000     1.469818    2.135123
    --------------------+----------------------------------------------------------------
                  alpha |   6.064612    1.02931                      4.348442     8.45809
    -------------------------------------------------------------------------------------
    
    . margins, dydx(*)
    
    Average marginal effects                        Number of obs     =      1,594
    Model VCE    : Robust
    
    Expression   : Predicted number of events, predict()
    dy/dx w.r.t. : mud_sqft_per_capita owned
    
    -------------------------------------------------------------------------------------
                        |            Delta-method
                        |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
    mud_sqft_per_capita |   6.844191   1.356783     5.04   0.000     4.184946    9.503436
                  owned |   .2546788   .2433755     1.05   0.295    -.2223285     .731686
    -------------------------------------------------------------------------------------

    You can see that the marginal effects from margins, dydx(*) are now of reasonable magnitude.


    I wanted to share my data using dataex, but the dataset is large and exceeded the maximum number of characters. The complete dataset is here: https://drive.google.com/file/d/1cRK...ew?usp=sharing
    The first 100 observations are listed below:

    Code:
    . list in 1/100
    
         +-----------------------------------------------------+
         | sum_l2   mud_sq~a   office~a   retail~a       owned |
         |-----------------------------------------------------|
      1. |      0   .0898069          0          0   .57094592 |
      2. |      0   .0035725          0          0   .80785125 |
      3. |      1   .0953043   .0008397          0   .34204793 |
      4. |      1          0          0   .0742383   .76895308 |
      5. |      0    .093042          0          0   .66618496 |
         |-----------------------------------------------------|
      6. |      0    .068348          0          0   .40298507 |
      7. |      0          0          0          0   .81931466 |
      8. |      1   .0758335   .0026269          0   .56521738 |
      9. |      0   .1456166   .0043934          0   .30172414 |
     10. |      0          0          0          0    .4860681 |
         |-----------------------------------------------------|
     11. |      0          0          0          0   .80697054 |
     12. |      0   .0292876          0          0   .44064388 |
     13. |      0          0          0          0   .89447236 |
     14. |      0   .0066272          0          0   .73400676 |
     15. |      0          0          0          0   .85809314 |
         |-----------------------------------------------------|
     16. |      0   .0046185   .0025944   .0053238   .62865949 |
     17. |      0   .0038844   .0026464          0    .6130268 |
     18. |      0   .0124808   .0001208          0   .66139239 |
     19. |      1   .0056831   .0072896          0   .55831265 |
     20. |      0    .013271          0          0   .45341614 |
         |-----------------------------------------------------|
     21. |      0   .0128877   .0007828          0   .63519311 |
     22. |      0          0          0          0   .71408248 |
     23. |      2    .026289          0          0   .76016682 |
     24. |      0          0          0          0   .89873415 |
     25. |      0          0          0          0   .91246682 |
         |-----------------------------------------------------|
     26. |      0          0          0          0   .71515149 |
     27. |      0   .1051657    .000997          0   .24758843 |
     28. |      0   .0929398          0          0   .21495327 |
     29. |      0   .0373987          0   .0087623   .75645757 |
     30. |      1          0          0   .0029182   .65680474 |
         |-----------------------------------------------------|
     31. |      0          0          0          0   .87826085 |
     32. |      0          0          0          0   .90302265 |
     33. |      0          0          0          0   .90434784 |
     34. |      0          0          0          0   .93874425 |
     35. |      0   .2111307          0          0   .05439331 |
         |-----------------------------------------------------|
     36. |      4   .0051789          0          0   .72692305 |
     37. |      0          0          0          0   .92931032 |
     38. |      1          0   .0105478    .173447   .84762865 |
     39. |      0          0          0          0    .9598214 |
     40. |      0          0          0          0   .78552973 |
         |-----------------------------------------------------|
     41. |      0   .0086437          0          0   .62358278 |
     42. |      0          0          0          0    .9366197 |
     43. |      0   .0283563          0          0   .81930691 |
     44. |      0          0          0   .1779139   .95726496 |
     45. |      0          0   .0013528          0   .97846156 |
         |-----------------------------------------------------|
     46. |      0   .0162672   .0212537          0   .79591835 |
     47. |      0          0          0   .0839658   .84812623 |
     48. |      0    .070872   .0101204          0    .6792717 |
     49. |      0          0          0   .1069245   .69617707 |
     50. |      0          0          0          0   .85393256 |
         |-----------------------------------------------------|
     51. |      0   .0505848   .0426975    .022869   .45894736 |
     52. |      0          0   .0099472   .0426096   .68670309 |
     53. |      1   .2311722          0    .002651   .06952965 |
     54. |      1          0          0   .0411989   .93725491 |
     55. |      0   .0880984   .0184813   .0271888   .42549923 |
         |-----------------------------------------------------|
     56. |      0          0   1.496133          0   .85321099 |
     57. |      2    .282277          0   .0589427   .14414415 |
     58. |      0          0   .0006414          0    .6652602 |
     59. |     13   .2656941   .1226111   .5887356   .35841957 |
     60. |      2   .0916511   .0135128          0   .31596452 |
         |-----------------------------------------------------|
     61. |      0          0          0          0   .85137618 |
     62. |      2   .2368419   .0040762          0   .05069124 |
     63. |      0          0          0   .0500728   .82165605 |
     64. |     24          0          0          0           0 |
     65. |      1   .2184411   .0055871          0   .30309987 |
         |-----------------------------------------------------|
     66. |      0   .1965653   .0323297   .0081131   .19964664 |
     67. |      0   .1154042          0          0   .37626776 |
     68. |      0   .1270162    .027308   .0632858   .54897958 |
     69. |      1   .0138945          0          0   .63136458 |
     70. |      0          0          0          0   .88686132 |
         |-----------------------------------------------------|
     71. |      0          0          0          0   .84666669 |
     72. |      5          0          0   .0035174   .83356071 |
     73. |      0   .0378089   .0081194   .0117317   .63396782 |
     74. |      1   .1965736    .007549          0   .14772727 |
     75. |      0          0          0          0   .82617587 |
         |-----------------------------------------------------|
     76. |      0          0          0          0   .78370786 |
     77. |      0          0          0          0   .77880186 |
     78. |      0          0          0          0           1 |
     79. |      0   .0094124          0          0   .67676765 |
     80. |      0   .1000855   .0007475          0   .47757256 |
         |-----------------------------------------------------|
     81. |      0          0          0   .0061988   .77049178 |
     82. |      0          0          0          0   .88834953 |
     83. |      0          0          0          0   .81171548 |
     84. |      0          0          0          0   .77253217 |
     85. |      2   .3587315   .1451557   .0146335   .10459184 |
         |-----------------------------------------------------|
     86. |      0   .0802478   .0154736   .0465494    .2591241 |
     87. |      0   .1259107          0          0    .2429022 |
     88. |      0          0   .0146921   .4493715   .82840234 |
     89. |      0          0          0   .0142334    .9254902 |
     90. |      0   .0522075          0          0   .37804878 |
         |-----------------------------------------------------|
     91. |      0   .0767034          0          0   .64864862 |
     92. |      0          0   .0177341          0   .84394902 |
     93. |      0          0          0          0   .73134327 |
     94. |      0          0   .0004407          0   .70403588 |
     95. |      0    .050782          0          0   .48205128 |
         |-----------------------------------------------------|
     96. |      0   .0881407          0          0   .41935483 |
     97. |      0   .0318145   .0050043   .0561125   .57430339 |
     98. |      3   .1423303   .0013297          0   .29166666 |
     99. |      0   .0023224   .0111253   .1082391   .60655737 |
    100. |      1   .2379044          0   .0040009   .12743823 |
         +-----------------------------------------------------+
    
    .

    I also notice that if I reduce the number of observations, margins, dydx(*) again produces marginal effects of reasonable magnitude.

    Code:
    . drop if !mod(_n,2)
    (797 observations deleted)
    
    .
    . zinb sum_l2 mud_sqft_per_capita office_sqft_per_capita retail_sqft_per_capita,inflate(mud_sqft_per_capita owned) vce(robust)
    
    Fitting constant-only model:
    
    Iteration 0:   log pseudolikelihood = -961.15273  (not concave)
    Iteration 1:   log pseudolikelihood = -687.27718  
    Iteration 2:   log pseudolikelihood = -656.77257  
    Iteration 3:   log pseudolikelihood = -650.22461  
    Iteration 4:   log pseudolikelihood = -649.32756  
    Iteration 5:   log pseudolikelihood = -649.28166  
    Iteration 6:   log pseudolikelihood = -649.28131  
    Iteration 7:   log pseudolikelihood = -649.28131  
    
    Fitting full model:
    
    Iteration 0:   log pseudolikelihood = -649.28131  
    Iteration 1:   log pseudolikelihood = -640.25357  
    Iteration 2:   log pseudolikelihood = -638.13366  
    Iteration 3:   log pseudolikelihood = -638.08243  
    Iteration 4:   log pseudolikelihood = -638.08236  
    Iteration 5:   log pseudolikelihood = -638.08236  
    
    Zero-inflated negative binomial regression      Number of obs     =        797
                                                    Nonzero obs       =        125
                                                    Zero obs          =        672
    
    Inflation model      = logit                    Wald chi2(3)      =      24.62
    Log pseudolikelihood = -638.0824                Prob > chi2       =     0.0000
    
    ----------------------------------------------------------------------------------------
                           |               Robust
                    sum_l2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
    sum_l2                 |
       mud_sqft_per_capita |  -.8490054     .70847    -1.20   0.231    -2.237581    .5395703
    office_sqft_per_capita |   3.356185   1.067579     3.14   0.002     1.263769    5.448601
    retail_sqft_per_capita |   2.473402   .8377714     2.95   0.003     .8313999    4.115403
                     _cons |   .8226928   .3300019     2.49   0.013     .1759011    1.469485
    -----------------------+----------------------------------------------------------------
    inflate                |
       mud_sqft_per_capita |   -10.9113   2.320612    -4.70   0.000    -15.45962   -6.362985
                     owned |   -.312362    .694893    -0.45   0.653    -1.674327    1.049603
                     _cons |    1.66365   .5588054     2.98   0.003     .5684117    2.758889
    -----------------------+----------------------------------------------------------------
                  /lnalpha |   1.539509   .2407484     6.39   0.000     1.067651    2.011367
    -----------------------+----------------------------------------------------------------
                     alpha |     4.6623   1.122442                      2.908538    7.473528
    ----------------------------------------------------------------------------------------
    
    . margins, dydx(*)
    
    Average marginal effects                        Number of obs     =        797
    Model VCE    : Robust
    
    Expression   : Predicted number of events, predict()
    dy/dx w.r.t. : mud_sqft_per_capita office_sqft_per_capita retail_sqft_per_capita owned
    
    ----------------------------------------------------------------------------------------
                           |            Delta-method
                           |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
       mud_sqft_per_capita |    4.90122   1.353754     3.62   0.000      2.24791     7.55453
    office_sqft_per_capita |   4.170577   2.461296     1.69   0.090    -.6534743    8.994629
    retail_sqft_per_capita |   3.073583   1.065935     2.88   0.004     .9843888    5.162777
                     owned |   .1705116   .3809638     0.45   0.654    -.5761638     .917187
    ----------------------------------------------------------------------------------------

    Here I have used only a subsample of my data and a simpler regression to illustrate the issue. My real regression has more than 6,000 observations and about 30 variables in the ZINB model, with two variables in the inflate() part. I need margins, dydx(*) to see the effects of the variables on the predicted number of events. Could you let me know why the marginal effects from margins, dydx(*) are of such an unreasonably large magnitude, and how I could deal with this? Thank you.

  • #2
    First, you may already know this, but just so everyone is clear: the numbers that margins reports are marginal effects in raw units. That is, you'd interpret them as changes in the predicted count of whatever sum_l2 is (i.e. like a risk difference). In contrast, the raw coefficients from the ZINB model are interpreted as log incidence rate ratios (in the main model) or log odds (in the logit model, aka the zero-inflated part of the model).
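
    For reference, the quantity behind "Predicted number of events" can be written out. This is a sketch of the usual ZINB mean with a logit inflation equation and no offset or exposure; the symbols \mu_i, \pi_i, \beta, \gamma and w_{ij} are notation introduced for this note and do not appear in the Stata output above. Here \beta_j is the coefficient on variable w_j in the count equation and \gamma_j its coefficient in the inflate equation, with either set to zero if the variable is absent from that equation.

```latex
\begin{aligned}
\mu_i &= \exp(x_i'\beta), \qquad
\pi_i = \frac{\exp(z_i'\gamma)}{1+\exp(z_i'\gamma)},\\
E[y_i \mid x_i, z_i] &= (1-\pi_i)\,\mu_i,\\
\frac{\partial E[y_i \mid x_i, z_i]}{\partial w_{ij}}
  &= (1-\pi_i)\,\mu_i\,\bigl(\beta_j - \pi_i\,\gamma_j\bigr).
\end{aligned}
```

    The average marginal effect is the mean of the last line over the estimation sample. Every term is proportional to \mu_i = \exp(x_i'\beta), so an observation with a large linear predictor can contribute disproportionately to the average.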

    To answer your main question, my instinct is that it's a scaling issue. When you request margins, dydx(*), you're asking Stata to compute the marginal effect of a one-unit change in each explanatory variable. From your data sample (thanks for providing one!), it looks like the explanatory variables mostly take on values from 0 to 0.01-something, although some are higher. So, a one-unit change in one of those variables is a big change. The names imply that they are the number of square feet of something per capita. Maybe think about multiplying them by 100 or 1,000, so that they're the number of square feet per 100 or per 1,000 persons.
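
    The rescaling suggested above could be sketched as follows. This assumes the poster's dataset is already in memory and that the code is run from a do-file (the /// line continuation does not work at the interactive prompt). The _x1000 variable names are hypothetical and chosen only for illustration.

```stata
* Sketch only: assumes the poster's dataset is in memory.
* The *_x1000 names are hypothetical, not from the original post.
foreach v in mud office retail {
    generate double `v'_x1000 = `v'_sqft_per_capita * 1000
}

* Refit the same model on the rescaled regressors
zinb sum_l2 mud_x1000 office_x1000 retail_x1000, ///
    inflate(mud_x1000 owned) vce(robust)

* Average marginal effects, now per one unit of the rescaled variables
margins, dydx(*)
```

    Multiplying a regressor by a constant changes the units in which dy/dx is reported. In exact arithmetic it leaves the fitted model unchanged, so comparing the log pseudolikelihood before and after rescaling is a quick sanity check.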
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


    • #3
      Originally posted by Weiwen Ng
      First, you may already know this, but just so everyone is clear: the numbers that margins reports are marginal effects in raw units. That is, you'd interpret them as changes in the predicted count of whatever sum_l2 is (i.e. like a risk difference). In contrast, the raw coefficients from the ZINB model are interpreted as log incidence rate ratios (in the main model) or log odds (in the logit model, aka the zero-inflated part of the model).

      To answer your main question, my instinct is that it's a scaling issue. When you request margins, dydx(*), you're asking Stata to compute the marginal effect of a one-unit change in each explanatory variable. From your data sample (thanks for providing one!), it looks like the explanatory variables mostly take on values from 0 to 0.01-something, although some are higher. So, a one-unit change in one of those variables is a big change. The names imply that they are the number of square feet of something per capita. Maybe think about multiplying them by 100 or 1,000, so that they're the number of square feet per 100 or per 1,000 persons.
      Hello Weiwen,

      Thank you for your reply. The regressors are in units of 1,000 sqft per capita, so the margins result means that an increase of 1,000 sqft per capita in a regressor is associated with an increase in y of hundreds of millions or more. This is still ridiculous. I would expect an increase of 1,000 sqft per capita in a regressor to be associated with a single-digit increase in y, maybe about 2-8 units. Also, scaling cannot explain why the margins results are reasonable when I reduce the number of regressors or the sample size. Do you have other explanations for why the margins results are so extreme? Thank you.
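
      One way to probe this, offered as a sketch rather than a diagnosis, is to look at the observation-level predictions that margins averages over. The code assumes the full-sample zinb model from the first post has just been fit; mu_hat and p_zero are hypothetical variable names.

```stata
* Sketch only: run immediately after the full-sample zinb fit.
* mu_hat and p_zero are hypothetical names introduced for illustration.
predict double mu_hat, n     // predicted number of events (the margins default)
predict double p_zero, pr    // predicted probability of a degenerate zero

* Inspect the distribution of the predictions, including the extremes
summarize mu_hat p_zero, detail

* List the ten observations with the largest predicted counts
gsort -mu_hat
list sum_l2 mud_sqft_per_capita office_sqft_per_capita ///
    retail_sqft_per_capita owned mu_hat in 1/10
```

      If a handful of observations have predicted counts many orders of magnitude above the rest, they would be natural candidates for what drives the average marginal effects, and would also be consistent with the results changing when the sample or the regressor list changes.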
