Margins and exposure term in negative binomial models

Juta Kawalerowicz

Join Date: Apr 2014

Posts: 15
#1


21 Apr 2014, 12:17

Dear Statalist,

I am using a negative binomial model (nbreg) for the number of events in geographical units, with an exposure term for the number of individuals at risk in each unit. I am trying to better understand how exposure is constructed and interpreted:

1. I want to show the substantive effect of some independent variables of interest. When I use postestimation commands such as margins and marginsplot, will they automatically account for the exposure term?

In an earlier post about the use of the offset option, Sam said that:

By constraining the coefficient of the exposure variable to equal one you transform the model into a model of rates (e.g., injuries per unit of exposure instead of the probability that the person will be injured).

So does a predicted value from nbreg with an exposure term give me rates per individual at risk, and if I wanted the predicted number of events in each unit, would I need to multiply it by exposure_term?

2. If I wanted to construct predicted values by hand, is it correct to think that I can calculate them as y=exp(b_0+sum(b_i*x_i)+ln(exposure_term))? When exposure_term is entered through the exposure option, its coefficient is set equal to 1, but when I enter ln(exposure_term) as a regressor its coefficient is not exactly equal to 1, so I am not sure whether I need to adjust something else, and how.

Any suggestions/hints would be greatly appreciated.

Thanks,
Juta
David Crow

Join Date: Apr 2014

Posts: 37
#2

27 Apr 2014, 13:00

Dear Juta:

Here is my answer, as I understand Stata, to your questions.

1) Yes, margins does account for the exposure term if you use the "over" option, but not the "at" option. (Search for a question I posted about margins with exposure terms for a great answer to this question.) Your code would be something like:

margins, over("variable_with_geographical_units")

2) Yes, the predicted values are the exponentiated sum of your coefficients times their covariates, plus the log of the exposure term. I'm not sure why your exposure coefficient isn't equal to 1. If you use the "exposure" option with "glm" or "nbreg", Stata automatically logs the values of the exposure term for you. If you use the "offset" option, you have to log the values of the exposure term manually and then put the logged variable (not the original one) in parentheses. The choice of one option or the other shouldn't matter: the coefficient should always be exactly 1. If you posted your code and results, that would be helpful.
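As a minimal sketch of the two equivalent specifications, the lines below fit the same model once with "exposure" and once with "offset". The dataset (rod93, one of Stata's shipped example datasets, with variables deaths, cohort, and exposure) and the new variable name lnexposure are assumptions chosen for illustration; they are not the poster's data.

```stata
* Illustrative data only: Stata's example dataset, not the poster's data
webuse rod93, clear

* exposure(): pass the raw exposure variable; Stata takes the log itself
nbreg deaths i.cohort, exposure(exposure)

* offset(): pass a variable that has already been logged by hand
generate lnexposure = ln(exposure)
nbreg deaths i.cohort, offset(lnexposure)
```

Both fits should report the same coefficients, with the logged exposure entering at a fixed coefficient of 1.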

Here's an example.

The data are a frequency table that gives the number of respondents who report having lived abroad (LivAb, column four), out of the total number of people (F, column five, the exposure term) in each of eight categories given by a full cross-classification of three binary variables: having family members abroad (FamAbroad), sex (male=1), and rural residence (rural=1).

     +--------------------------------------+
     | FamAbr~d   Sex   Rural   LivAb     F |
     |--------------------------------------|
  1. |        0     0       0      11   440 |
  2. |        0     0       1       8   205 |
  3. |        0     1       0      28   400 |
  4. |        0     1       1      24   184 |
  5. |        1     0       0      50   397 |
     |--------------------------------------|
  6. |        1     0       1      24   166 |
  7. |        1     1       0     105   426 |
  8. |        1     1       1      50   173 |
     +--------------------------------------+

Here's the regression and estimated coefficients:

nbreg LivAb FamAbroad Sex Rural, exposure(F)

                         y1
LivAb:FamAbroad   1.2039895
LivAb:Sex         .77861417
LivAb:Rural       .25909923
LivAb:_cons      -3.3852964

I predicted the results with:

margins, over(FamAbroad Sex Rural)

-------------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
--------------------+----------------------------------------------------------------
FamAbroad#Sex#Rural |
0 0 0 | 14.90175 2.261863 6.59 0.000 10.46858 19.33492
0 0 1 | 8.996295 1.467059 6.13 0.000 6.120912 11.87168
0 1 0 | 29.51157 3.945925 7.48 0.000 21.7777 37.24544
0 1 1 | 17.59039 2.585205 6.80 0.000 12.52348 22.6573
1 0 0 | 44.81888 5.259027 8.52 0.000 34.51137 55.12638
1 0 1 | 24.28309 3.269221 7.43 0.000 17.87553 30.69064
1 1 0 | 104.7678 9.088464 11.53 0.000 86.95475 122.5809
1 1 1 | 55.13022 6.051548 9.11 0.000 43.26941 66.99104
-------------------------------------------------------------------------------------

And by hand for the last observation (a rural male with family members abroad):

. display exp(1.2039895+.77861417+.25909923-3.3852964+ln(173))
55.130224
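The same check can be run for every row at once. The sketch below re-enters the frequency table above, refits the model, and compares the default prediction (the predicted count, which already includes the exposure) with the incidence rate and with the hand calculation. The new variable names (nhat, rate, xb0, byhand, rate_x_F) are made up for illustration.

```stata
clear
input FamAbroad Sex Rural LivAb F
0 0 0  11 440
0 0 1   8 205
0 1 0  28 400
0 1 1  24 184
1 0 0  50 397
1 0 1  24 166
1 1 0 105 426
1 1 1  50 173
end

nbreg LivAb FamAbroad Sex Rural, exposure(F)

* Default statistic: predicted number of events, exposure included
predict nhat, n

* Incidence rate: predicted events per unit of exposure
predict rate, ir

* Linear predictor without the ln(F) offset, then add the offset by hand
predict xb0, xb nooffset
generate byhand = exp(xb0 + ln(F))

* Rate times exposure should reproduce the predicted count
generate rate_x_F = rate * F

list FamAbroad Sex Rural F nhat byhand rate_x_F rate

* Rates instead of counts from margins
margins, over(FamAbroad Sex Rural) predict(ir)
```

If the fit matches the one above, nhat, byhand, and rate_x_F should agree row by row, which is the sense in which the default prediction is a count and the "ir" prediction is a rate.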

Hope this helps!

Best,
David

Web site:
http://investigadores.cide.edu/crow/

Las Américas y el Mundo:
http://lasamericasyelmundo.cide.edu/

==========================================
David Crow
Associate Professor, División de Estudios Internacionales
Centro de Investigación y Docencia Económicas (CIDE)
==========================================

Juta Kawalerowicz

Join Date: Apr 2014
Posts: 15

30 Apr 2014, 05:16

Dear David,
Many thanks for the detailed answer and your example. I was (and am) a bit confused; here is why. I am modelling riot participation in neighborhoods. When I run a model with an exposure term:

nbreg `depvar' `neighvars' `ethnicvars' `contextvars' `districtvars' , vce(cluster districtname) nolog exposure(residents)

Negative binomial regression Number of obs = 25022
Dispersion = mean Wald chi2(22) = 2109.33
Log pseudolikelihood = -5873.8109 Prob > chi2 = 0.0000

(Std. Err. adjusted for 32 clusters in districtname)

charged_days	Coef.	Std. Err.	z	P>|z|	[95% Conf. Interval]

OA_youth	2.076011	0.674001	3.08	0.002	0.754993	3.39703
OA_owned	-0.3112	0.222187	-1.4	0.161	-0.74668	0.124275
OA_class1	-1.01423	0.783415	-1.29	0.195	-2.54969	0.521239
OA_class2	-0.57539	1.084162	-0.53	0.596	-2.70031	1.549529
OA_class3	0.158465	1.260051	0.13	0.9	-2.31119	2.62812
density	-0.88564	0.326333	-2.71	0.007	-1.52525	-0.24604
OA_recent_arrivals	-0.97694	1.380629	-0.71	0.479	-3.68292	1.729046
OA_condis	0.44638	0.089804	4.97	0	0.270367	0.622392
OA_whiteirish	-2.01125	2.869781	-0.7	0.483	-7.63592	3.613418
OA_whiteother	-1.52231	0.735314	-2.07	0.038	-2.9635	-0.08112
OA_black_african	0.267121	0.630103	0.42	0.672	-0.96786	1.5021
OA_black_carrib	2.485802	1.127984	2.2	0.028	0.274994	4.69661
OA_asian_pakistani	1.743263	0.879157	1.98	0.047	0.020148	3.466379
OA_asian_indian	-2.94992	0.999164	-2.95	0.003	-4.90824	-0.99159
OA_asian_bangladeshi	0.366331	0.626589	0.58	0.559	-0.86176	1.594422
OA_otherasian	-0.44456	1.196941	-0.37	0.71	-2.79052	1.901405
envy1500	-1.92559	0.477805	-4.03	0	-2.86207	-0.98911
diversity	1.763117	0.624773	2.82	0.005	0.538586	2.987649
d_footlocker2	0.058315	0.008486	6.87	0	0.041683	0.074948
respectvalue	-1.03069	0.18491	-5.57	0	-1.39311	-0.66827
growth_all1	-0.26934	1.262879	-0.21	0.831	-2.74454	2.205861
turnout	-0.2676	0.9362	-0.29	0.775	-2.10252	1.567315
_cons	-3.70375	1.344265	-2.76	0.006	-6.33846	-1.06903
ln(residents)	1	(exposure)

/lnalpha	0.588199	0.109279			0.374016	0.802382

alpha	1.800742	0.196784			1.45356	2.230848

But then when I enter ln(residents) as an independent variable instead of using the exposure option, I get:

nbreg `depvar' `neighvars' `ethnicvars' `contextvars' `districtvars' lnres, vce(cluster districtname) nolog

Negative binomial regression Number of obs = 25022
Dispersion = mean Wald chi2(23) = 2166.48
Log pseudolikelihood = -5873.5848 Prob > chi2 = 0.0000

(Std. Err. adjusted for 32 clusters in districtname)

charged_days	Coef.	Std. Err.	z	P>|z|	[95% Conf. Interval]

OA_youth	2.139978	0.671971	3.18	0.001	0.822939	3.457017
OA_owned	-0.30138	0.229663	-1.31	0.189	-0.75151	0.148748
OA_class1	-1.07792	0.782675	-1.38	0.168	-2.61194	0.456091
OA_class2	-0.6452	1.113922	-0.58	0.562	-2.82845	1.53805
OA_class3	0.140808	1.276893	0.11	0.912	-2.36186	2.643473
density	-0.87897	0.326099	-2.7	0.007	-1.51811	-0.23983
OA_recent_arrivals	-1.00519	1.373926	-0.73	0.464	-3.69804	1.687651
OA_condis	0.439533	0.089304	4.92	0	0.2645	0.614566
OA_whiteirish	-2.07785	2.860285	-0.73	0.468	-7.6839	3.528206
OA_whiteother	-1.52063	0.738365	-2.06	0.039	-2.9678	-0.07346
OA_black_african	0.274848	0.632185	0.43	0.664	-0.96421	1.513909
OA_black_carrib	2.475558	1.12428	2.2	0.028	0.272009	4.679106
OA_asian_pakistani	1.761014	0.880915	2	0.046	0.034453	3.487575
OA_asian_indian	-2.92643	0.991531	-2.95	0.003	-4.86979	-0.98306
OA_asian_bangladeshi	0.38848	0.621084	0.63	0.532	-0.82882	1.605782
OA_otherasian	-0.45449	1.190698	-0.38	0.703	-2.78822	1.879231
envy1500	-1.94574	0.478951	-4.06	0	-2.88446	-1.00701
diversity	1.782298	0.623968	2.86	0.004	0.559343	3.005253
d_footlocker2	0.05819	0.008568	6.79	0	0.041397	0.074982
respectvalue_boro~6m	-1.03072	0.184279	-5.59	0	-1.3919	-0.66954
growth_all1	-0.27068	1.260594	-0.21	0.83	-2.7414	2.200036
turnout	-0.25017	0.940662	-0.27	0.79	-2.09383	1.593497
lnres	0.919316	0.148985	6.17	0	0.627311	1.21132
_cons	-3.23876	1.406065	-2.3	0.021	-5.99459	-0.48292

/lnalpha	0.586368	0.109185			0.372369	0.800367

alpha	1.797448	0.196254			1.451168	2.226357

Here the coefficient for lnres is not exactly equal to 1. In the meantime someone suggested that this should not be a problem and that I should just run

test lnres=1

( 1) [charged_days]lnres = 1

chi2( 1) = 0.29
Prob > chi2 = 0.5881

and conclude that it is not statistically different from 1, which I think is a reasonable suggestion. Anyway, many thanks for replying.

Juta


David Crow

Join Date: Apr 2014

Posts: 37
#4

30 Apr 2014, 12:03

Dear Juta-

The problem with the second way you've specified the model--i.e., entering the logged exposure term directly in the model--is that the coefficient is estimated as a free parameter rather than being constrained to 1. What the "exposure" option does is exactly that: linearly restrict the coefficient of the exposure/offset term to 1.

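The problem with the solution suggested to you is that 1) it won't always work (the fact that it works now is accidental) and 2) even though the parameter is not statistically different from 1, the predicted values will be off, possibly by a lot for large values of the exposure. Because the exposure enters the linear predictor in logs, a coefficient b on lnres multiplies the predicted count by residents^b rather than by residents, so the gap between the predictions of your unconstrained model and those of the exposure model widens as a power of the number of residents (that is, exponentially in lnres).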
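To get a feel for the size of that discrepancy, here is a rough sketch that isolates the exposure term only. It takes the lnres coefficient reported in the unconstrained model (0.919316) and prints residents^(b - 1), the factor by which the exposure contribution differs from the constrained case. The neighbourhood sizes are hypothetical, and the calculation deliberately ignores the accompanying shifts in the constant and the other coefficients.

```stata
* Coefficient on lnres from the unconstrained model posted above
scalar b_free = 0.919316

* Hypothetical neighbourhood sizes (not taken from the poster's data)
foreach r of numlist 100 300 1000 3000 {
    * residents^b_free relative to residents^1
    display "residents = " `r' "   factor = " %6.3f `r'^(b_free - 1)
}
```

Since b_free is below 1, the factor falls further below 1 as the number of residents rises, which is why the two sets of predictions drift apart for large units.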

To estimate your model manually, you would need to restrict the exposure coefficient to 1 "by hand," either with a design matrix or the "constraints" option. I'm not sure why you would want to do that. Is there some reason you can't use the "exposure" option, as in your first model?

Hope this helps.
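For completeness, a minimal sketch of the "constraints" route. It again uses Stata's shipped rod93 example data rather than the riot data, and lnexposure is a made-up variable name; both are assumptions for illustration.

```stata
* Illustrative data only: Stata's example dataset, not the poster's data
webuse rod93, clear
generate lnexposure = ln(exposure)

* Fix the coefficient on the logged exposure at 1
constraint 1 [deaths]lnexposure = 1
nbreg deaths i.cohort lnexposure, constraints(1)

* The exposure() form of the same model, for comparison
nbreg deaths i.cohort, exposure(exposure)
```

The two fits should agree on every free coefficient, so the constrained version is mainly useful for seeing what the "exposure" option does behind the scenes.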

All the best,
David
