  • Margins and exposure term in negative binomial models

    Dear Statalist,

    I am using a negative binomial model (nbreg) to model the number of events in geographical units, with an exposure term for the number of individuals at risk in each unit. I am trying to better understand how the exposure term is constructed and interpreted:

    1. I want to show the substantive effect of some independent variables of interest. When I use postestimation commands such as margins and marginsplot, will they automatically account for the exposure term?

    In an earlier post about the use of the offset option, Sam said that:

    By constraining the coefficient of the exposure variable to equal one you transform the model into a model of rates (e.g., injuries per unit of exposure instead of the probability that the person will be injured).
    So does a predicted value from nbreg with an exposure term give me a rate per individual at risk, so that to obtain the predicted number of events in each unit I would need to multiply it by exposure_term?

    2. If I wanted to construct predicted values by hand, is it correct to calculate them as y = exp(b_0 + sum(b_i*x_i) + ln(exposure_term))? When exposure_term is entered through the exposure option, its coefficient is set equal to 1, but when I enter ln(exposure_term) myself its coefficient is not exactly equal to 1, so I am not sure whether I need to adjust something else, and how.

    Any suggestions/hints would be greatly appreciated.

    Thanks,
    Juta

  • #2
    Dear Juta:

    Here is my answer, as I understand Stata, to your questions.

    1) Yes, margins does account for the exposure term if you use the "over" option, but not the "at" option. (Search for a question I posted about margins with exposure terms for a great answer to this question.) Your code would be something like:

    margins, over("variable_with_geographical_units")

    2) Yes, the predicted values are the exponentiated sum of your coefficients times the covariates (plus the log of the exposure term). I'm not sure why your exposure coefficient isn't equal to 1. If you use the "exposure" option with "glm" or "nbreg", Stata automatically logs the values of the exposure term for you. If you use the "offset" option, you have to log the values of the exposure term manually, then pass the logged variable (not the original one) inside the parentheses. The choice of one option or the other shouldn't matter: the coefficient should always be exactly 1. If you posted your code and results, that would be helpful.

    Here's an example.

    The data are a frequency table that gives the number of respondents who report having lived abroad (LivAb, column four), out of the total number of people (F, column five, the exposure term) in each of eight categories given by a full cross-classification of three binary variables: having family members abroad (FamAbroad), sex (male=1), and rural residence (rural=1).

        +--------------------------------------+
        | FamAbr~d   Sex   Rural   LivAb     F |
        |--------------------------------------|
     1. |        0     0       0      11   440 |
     2. |        0     0       1       8   205 |
     3. |        0     1       0      28   400 |
     4. |        0     1       1      24   184 |
     5. |        1     0       0      50   397 |
        |--------------------------------------|
     6. |        1     0       1      24   166 |
     7. |        1     1       0     105   426 |
     8. |        1     1       1      50   173 |
        +--------------------------------------+

    Here are the regression and the estimated coefficients:

    nbreg LivAb FamAbroad Sex Rural, exposure(F)

                            y1
    LivAb:FamAbroad  1.2039895
    LivAb:Sex        .77861417
    LivAb:Rural      .25909923
    LivAb:_cons     -3.3852964

    I predicted the results with:

    margins, over(FamAbroad Sex Rural)

    -------------------------------------------------------------------------------------
                        |            Delta-method
                        |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
    FamAbroad#Sex#Rural |
                  0 0 0 |   14.90175    2.261863     6.59   0.000    10.46858    19.33492
                  0 0 1 |   8.996295    1.467059     6.13   0.000    6.120912    11.87168
                  0 1 0 |   29.51157    3.945925     7.48   0.000     21.7777    37.24544
                  0 1 1 |   17.59039    2.585205     6.80   0.000    12.52348     22.6573
                  1 0 0 |   44.81888    5.259027     8.52   0.000    34.51137    55.12638
                  1 0 1 |   24.28309    3.269221     7.43   0.000    17.87553    30.69064
                  1 1 0 |   104.7678    9.088464    11.53   0.000    86.95475    122.5809
                  1 1 1 |   55.13022    6.051548     9.11   0.000    43.26941    66.99104
    -------------------------------------------------------------------------------------

    And by hand for the last observation (a rural male with family members abroad):

    . display exp(1.2039895+.77861417+.25909923-3.3852964+ln(173))
    55.130224
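
The by-hand check above can be extended to all eight cells. What follows is only a minimal sketch, assuming the frequency table is entered exactly as listed; the new variable names (n_exp, rate, xb_only, n_hand, lnF, n_off) are made up for illustration. It relies on two documented predict statistics for nbreg: "n" (the default, the expected count including the exposure) and "ir" (the incidence rate exp(xb), that is, the expected count when exposure equals 1). It also refits the model with offset() to show that the two options should give the same predictions.

```stata
* Enter the eight-row frequency table from the example above
clear
input FamAbroad Sex Rural LivAb F
0 0 0  11 440
0 0 1   8 205
0 1 0  28 400
0 1 1  24 184
1 0 0  50 397
1 0 1  24 166
1 1 0 105 426
1 1 1  50 173
end

* exposure() takes the raw exposure variable and logs it internally
nbreg LivAb FamAbroad Sex Rural, exposure(F) nolog

* Expected count (includes the exposure) and incidence rate (exposure = 1)
predict n_exp, n
predict rate, ir

* Rates rather than counts from margins
margins, over(FamAbroad Sex Rural) predict(ir)

* By hand: linear prediction without the offset, plus ln(F), exponentiated
predict xb_only, xb nooffset
generate n_hand = exp(xb_only + ln(F))

* offset() expects the already-logged exposure variable
generate lnF = ln(F)
nbreg LivAb FamAbroad Sex Rural, offset(lnF) nolog
predict n_off, n

* n_exp, n_hand and n_off should agree; n_exp should equal rate * F
list FamAbroad Sex Rural F rate n_exp n_hand n_off, noobs
```

If the columns agree, the default prediction is already a count, so there is no need to multiply it by the exposure; multiplying is only needed when starting from the rate.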


    Hope this helps!

    Best,
    David
    Web site:
    http://investigadores.cide.edu/crow/


    Las Américas y el Mundo:
    http://lasamericasyelmundo.cide.edu/

    ==========================================
    David Crow
    Associate Professor, División de Estudios Internacionales
    Centro de Investigación y Docencia Económicas (CIDE)
    ==========================================



    • #3
      Dear David,
      Many thanks for the detailed answer and your example. I was (and still am) a bit confused, and here is why. I am modelling riot participation in neighborhoods. When I run a model with an exposure term:

      nbreg `depvar' `neighvars' `ethnicvars' `contextvars' `districtvars' , vce(cluster districtname) nolog exposure(residents)


      Negative binomial regression                    Number of obs   =      25022
      Dispersion           = mean                     Wald chi2(22)   =    2109.33
      Log pseudolikelihood = -5873.8109               Prob > chi2     =     0.0000

      (Std. Err. adjusted for 32 clusters in districtname)
      ---------------------------------------------------------------------------------
      charged_days               Coef.  Std. Err.       z   P>|z|  [95% Conf. Interval]
      ---------------------------------------------------------------------------------
      OA_youth                2.076011   0.674001    3.08   0.002   0.754993    3.39703
      OA_owned                 -0.3112   0.222187    -1.4   0.161   -0.74668   0.124275
      OA_class1               -1.01423   0.783415   -1.29   0.195   -2.54969   0.521239
      OA_class2               -0.57539   1.084162   -0.53   0.596   -2.70031   1.549529
      OA_class3               0.158465   1.260051    0.13     0.9   -2.31119    2.62812
      density                 -0.88564   0.326333   -2.71   0.007   -1.52525   -0.24604
      OA_recent_arrivals      -0.97694   1.380629   -0.71   0.479   -3.68292   1.729046
      OA_condis                0.44638   0.089804    4.97       0   0.270367   0.622392
      OA_whiteirish           -2.01125   2.869781    -0.7   0.483   -7.63592   3.613418
      OA_whiteother           -1.52231   0.735314   -2.07   0.038    -2.9635   -0.08112
      OA_black_african        0.267121   0.630103    0.42   0.672   -0.96786     1.5021
      OA_black_carrib         2.485802   1.127984     2.2   0.028   0.274994    4.69661
      OA_asian_pakistani      1.743263   0.879157    1.98   0.047   0.020148   3.466379
      OA_asian_indian         -2.94992   0.999164   -2.95   0.003   -4.90824   -0.99159
      OA_asian_bangladeshi    0.366331   0.626589    0.58   0.559   -0.86176   1.594422
      OA_otherasian           -0.44456   1.196941   -0.37    0.71   -2.79052   1.901405
      envy1500                -1.92559   0.477805   -4.03       0   -2.86207   -0.98911
      diversity               1.763117   0.624773    2.82   0.005   0.538586   2.987649
      d_footlocker2           0.058315   0.008486    6.87       0   0.041683   0.074948
      respectvalue            -1.03069    0.18491   -5.57       0   -1.39311   -0.66827
      growth_all1             -0.26934   1.262879   -0.21   0.831   -2.74454   2.205861
      turnout                  -0.2676     0.9362   -0.29   0.775   -2.10252   1.567315
      _cons                   -3.70375   1.344265   -2.76   0.006   -6.33846   -1.06903
      ln(residents)                  1  (exposure)
      ---------------------------------------------------------------------------------
      /lnalpha                0.588199   0.109279                   0.374016   0.802382
      alpha                   1.800742   0.196784                    1.45356   2.230848
      ---------------------------------------------------------------------------------

      But then, when I enter ln(residents) as an independent variable (lnres) instead of using the exposure option, I get:

      nbreg `depvar' `neighvars' `ethnicvars' `contextvars' `districtvars' lnres, vce(cluster districtname) nolog

      Negative binomial regression                    Number of obs   =      25022
      Dispersion           = mean                     Wald chi2(23)   =    2166.48
      Log pseudolikelihood = -5873.5848               Prob > chi2     =     0.0000

      (Std. Err. adjusted for 32 clusters in districtname)
      ---------------------------------------------------------------------------------
      charged_days               Coef.  Std. Err.       z   P>|z|  [95% Conf. Interval]
      ---------------------------------------------------------------------------------
      OA_youth                2.139978   0.671971    3.18   0.001   0.822939   3.457017
      OA_owned                -0.30138   0.229663   -1.31   0.189   -0.75151   0.148748
      OA_class1               -1.07792   0.782675   -1.38   0.168   -2.61194   0.456091
      OA_class2                -0.6452   1.113922   -0.58   0.562   -2.82845    1.53805
      OA_class3               0.140808   1.276893    0.11   0.912   -2.36186   2.643473
      density                 -0.87897   0.326099    -2.7   0.007   -1.51811   -0.23983
      OA_recent_arrivals      -1.00519   1.373926   -0.73   0.464   -3.69804   1.687651
      OA_condis               0.439533   0.089304    4.92       0     0.2645   0.614566
      OA_whiteirish           -2.07785   2.860285   -0.73   0.468    -7.6839   3.528206
      OA_whiteother           -1.52063   0.738365   -2.06   0.039    -2.9678   -0.07346
      OA_black_african        0.274848   0.632185    0.43   0.664   -0.96421   1.513909
      OA_black_carrib         2.475558    1.12428     2.2   0.028   0.272009   4.679106
      OA_asian_pakistani      1.761014   0.880915       2   0.046   0.034453   3.487575
      OA_asian_indian         -2.92643   0.991531   -2.95   0.003   -4.86979   -0.98306
      OA_asian_bangladeshi     0.38848   0.621084    0.63   0.532   -0.82882   1.605782
      OA_otherasian           -0.45449   1.190698   -0.38   0.703   -2.78822   1.879231
      envy1500                -1.94574   0.478951   -4.06       0   -2.88446   -1.00701
      diversity               1.782298   0.623968    2.86   0.004   0.559343   3.005253
      d_footlocker2            0.05819   0.008568    6.79       0   0.041397   0.074982
      respectvalue_boro~6m    -1.03072   0.184279   -5.59       0    -1.3919   -0.66954
      growth_all1             -0.27068   1.260594   -0.21    0.83    -2.7414   2.200036
      turnout                 -0.25017   0.940662   -0.27    0.79   -2.09383   1.593497
      lnres                   0.919316   0.148985    6.17       0   0.627311    1.21132
      _cons                   -3.23876   1.406065    -2.3   0.021   -5.99459   -0.48292
      ---------------------------------------------------------------------------------
      /lnalpha                0.586368   0.109185                   0.372369   0.800367
      alpha                   1.797448   0.196254                   1.451168   2.226357
      ---------------------------------------------------------------------------------

      Here the coefficient on lnres is not exactly equal to 1. In the meantime, someone suggested that this should not be a problem and that I should just run

      test lnres=1

      ( 1) [charged_days]lnres = 1

      chi2( 1) = 0.29
      Prob > chi2 = 0.5881

      and assume that it's not statistically different from 1 which I think is a reasonable suggestion. Anyway, many thanks for replying.
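
The reported test statistic can be reproduced from the table with a few lines of arithmetic. This is only a sketch, assuming the test is the usual single-restriction Wald test built from the reported coefficient and its cluster-robust standard error; the scalar names are made up for illustration.

```stata
* Wald test of H0: coefficient on lnres = 1, from the reported estimate and SE
scalar b_lnres  = 0.919316
scalar se_lnres = 0.148985
scalar wald     = ((b_lnres - 1)/se_lnres)^2

* chi2tail(df, x) returns the upper-tail probability of a chi-squared variate
display "chi2(1) = " %5.2f wald "   Prob > chi2 = " %6.4f chi2tail(1, wald)
```

The statistic should round to the chi2(1) of 0.29 shown in the output above.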

      Juta



      • #4
        Dear Juta-

        The problem with the second way you've specified the model--i.e., entering the logged exposure term directly in the model--is that the coefficient is estimated as a free parameter rather than being constrained to 1. What the "exposure" option does is exactly that: linearly restrict the coefficient of the exposure/offset term to 1.

        The problem with the solution suggested to you is that 1) it won't always work (the fact that it works now is accidental) and 2) even though the parameter is not statistically different from 1, the predicted values will be off, possibly by a lot for large values of the exposure variable. The exposure enters the prediction as exposure^b, so the gap between the values predicted by your unconstrained model and those predicted by the constrained model widens as the exposure gets large.
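
To get a feel for the size of the discrepancy, one can plug the two sets of estimates from #3 into the prediction formula. The sketch below is illustrative only: it copies the constants and the lnres coefficient from the two outputs, sets every other covariate to zero (their coefficients also differ slightly between the models, which this ignores), and uses made-up values for the number of residents.

```stata
* Estimates copied from the two outputs in #3; all other covariates set to 0
* _cons from the exposure(residents) model
local cons_exp  = -3.70375
* _cons and the freely estimated lnres coefficient from the second model
local cons_free = -3.23876
local b_lnres   = 0.919316

* Hypothetical exposure values, chosen only for illustration
foreach r of numlist 100 1000 10000 {
    local n_exp  = exp(`cons_exp' + ln(`r'))
    local n_free = exp(`cons_free' + `b_lnres'*ln(`r'))
    display "residents = `r'   constrained = " %9.3f `n_exp' "   free = " %9.3f `n_free' "   ratio = " %6.3f `n_free'/`n_exp'
}
```

Under these assumptions the ratio of the two predictions is exp(0.46499) * residents^(0.919316 - 1), so it drifts further from 1 as the exposure grows.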

        To estimate your model manually, you would need to restrict the exposure coefficient to 1 "by hand," either with a design matrix or with the "constraints" option. I'm not sure why you would want to do that. Is there some reason you can't use the "exposure" option, as in your first model?
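
For completeness, here is a minimal sketch of the "constraints" route, using the small frequency table from #2 rather than the riot data. The variable lnF and the stored-estimate names are made up for illustration. The constrained fit should reproduce the exposure(F) fit.

```stata
* Frequency table from the example in #2
clear
input FamAbroad Sex Rural LivAb F
0 0 0  11 440
0 0 1   8 205
0 1 0  28 400
0 1 1  24 184
1 0 0  50 397
1 0 1  24 166
1 1 0 105 426
1 1 1  50 173
end
generate lnF = ln(F)

* Reference fit: exposure() fixes the coefficient on ln(F) at 1
nbreg LivAb FamAbroad Sex Rural, exposure(F) nolog
estimates store m_exposure

* Same model with ln(F) as a regressor whose coefficient is constrained to 1
constraint 1 [LivAb]lnF = 1
nbreg LivAb FamAbroad Sex Rural lnF, constraints(1) nolog
estimates store m_constr

* Side-by-side comparison of the two sets of coefficients
estimates table m_exposure m_constr, b(%9.6f)
```

Since the result should match the exposure(F) fit, the "exposure" option remains the simpler choice.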

        Hope this helps.

        All the best,
        David

