Dear Statalisters-
I'm estimating some models for aggregate data in which group size enters as an offset or as a denominator (log-rate, binomial, etc.). It appears that "margins" doesn't give the correct predicted values after some of these models, though "predict" does. My question: am I doing something wrong, or does "margins" not work well after models with offsets?
Here's the data:
. input FamAb Sex Rural LivAb F
FamAb Sex Rural LivAb F
1. 0 0 0 11 440
2. 0 0 1 8 205
3. 0 1 0 28 400
4. 0 1 1 24 184
5. 1 0 0 50 397
6. 1 0 1 24 166
7. 1 1 0 105 426
8. 1 1 1 50 173
9. end
The table represents the frequency of respondents who report having lived abroad (LivAb, fourth column), out of the number of people "at risk" of having lived abroad (F, fifth column) in each of eight categories of a full cross-classification of three independent binary variables: having family members abroad (FamAb), sex (Male=1), and rural vs. urban residence (Rural=1).
Here are two possible models for the data: (1) a log-rate (Poisson) model with ln(F) as the offset, specified with the "exposure(F)" option, and (2) a binomial regression in which F is the binomial denominator, i.e., the number of trials out of which the LivAb "successes" are counted.
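In symbols (my notation: i indexes the eight cells, and the betas are estimated separately in each model), the two specifications are:

```latex
\begin{aligned}
\text{Model 1:}\quad & \mathrm{LivAb}_i \sim \mathrm{Poisson}(\mu_i), \qquad
  \log \mu_i = \ln F_i + \beta_0 + \beta_1\,\mathrm{FamAb}_i + \beta_2\,\mathrm{Sex}_i + \beta_3\,\mathrm{Rural}_i \\
\text{Model 2:}\quad & \mathrm{LivAb}_i \sim \mathrm{Binomial}(F_i, \pi_i), \qquad
  \operatorname{logit}(\pi_i) = \beta_0 + \beta_1\,\mathrm{FamAb}_i + \beta_2\,\mathrm{Sex}_i + \beta_3\,\mathrm{Rural}_i
\end{aligned}
```

In Model 1 the expected count is exp(xb + ln F), so F enters through the linear predictor; in Model 2 the expected count is F times the inverse logit of xb, so F multiplies the probability instead.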
Model 1:
glm LivAb FamAb Sex Rural, family(poisson) trace exposure(F)
And the results:
------------------------------------------------------------------------------
             |                 OIM
       LivAb |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       FamAb |   1.203989   .1359458     8.86   0.000     .9375405    1.470438
         Sex |   .7786142   .1249003     6.23   0.000     .5338142    1.023414
       Rural |   .2590993   .1208217     2.14   0.032      .022293    .4959055
       _cons |  -3.385296   .1517851   -22.30   0.000     -3.68279   -3.087803
       ln(F) |          1  (exposure)
------------------------------------------------------------------------------
The problem is this. When I use "margins" after the model to predict the frequencies of "LivAb", the results are patently wrong:
margins, at(FamAb=(0 1) Sex=(0 1) Rural=(0 1))
Here are the predictions:
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   10.12218   1.536396     6.59   0.000     7.110899    13.13346
          2  |   13.11594   2.138864     6.13   0.000     8.923843    17.30804
          3  |   22.05068   2.948346     7.48   0.000     16.27202    27.82933
          4  |   28.57244   4.199202     6.80   0.000     20.34215    36.80272
          5  |   33.74116   3.959172     8.52   0.000     25.98133      41.501
          6  |   43.72053   5.886074     7.43   0.000     32.18404    55.25702
          7  |   73.50348   6.376324    11.53   0.000     61.00611    86.00084
          8  |   95.24304   10.45466     9.11   0.000     74.75229    115.7338
------------------------------------------------------------------------------
And here are the predictions recovered using "predict":
predict n1
list
+----------------------------------------------+
| FamAb Sex Rural LivAb F n1 |
|----------------------------------------------|
1. | 0 0 0 11 440 14.90174 |
2. | 0 0 1 8 205 8.996295 |
3. | 0 1 0 28 400 29.51157 |
4. | 0 1 1 24 184 17.59039 |
5. | 1 0 0 50 397 44.81887 |
|----------------------------------------------|
6. | 1 0 1 24 166 24.28309 |
7. | 1 1 0 105 426 104.7678 |
8. | 1 1 1 50 173 55.13023 |
+----------------------------------------------+
It's easy to verify that the predictions with "margins" are wrong, and those with "predict" are right. For example, the predicted frequency for a rural male with family members abroad is 55.13, not the 95.24 predicted by "margins":
display exp(1.203989 + .7786142 + .2590993 - 3.385296 + ln(173))
55.130224
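The hand calculation above can be repeated for every cell. The Python sketch below is only arithmetic on the posted numbers (coefficients and F values copied from above; the function names are my own). It computes exp(xb + ln F) with each cell's own F, which reproduces the n1 column from "predict", and, for comparison, the same expression averaged over all eight observed values of F with FamAb, Sex and Rural held at the cell's values. To the printed precision, that second quantity matches the "margins" column: each margins value equals exp(xb) times the mean of F (2391/8 = 298.875) rather than times the cell's own F, which may be where the discrepancy comes from.

```python
import math

# Model 1 coefficients, copied from the glm output above
B0, B_FAMAB, B_SEX, B_RURAL = -3.385296, 1.203989, 0.7786142, 0.2590993

# (FamAb, Sex, Rural, F) for the eight cells, in the order of the data
CELLS = [
    (0, 0, 0, 440), (0, 0, 1, 205), (0, 1, 0, 400), (0, 1, 1, 184),
    (1, 0, 0, 397), (1, 0, 1, 166), (1, 1, 0, 426), (1, 1, 1, 173),
]


def xb(fam_ab, sex, rural):
    """Linear predictor excluding the ln(F) exposure term."""
    return B0 + B_FAMAB * fam_ab + B_SEX * sex + B_RURAL * rural


def own_exposure_count(fam_ab, sex, rural, f):
    """exp(xb + ln F) using the cell's own F (the hand calculation above)."""
    return math.exp(xb(fam_ab, sex, rural) + math.log(f))


def averaged_over_exposures(fam_ab, sex, rural):
    """Mean of exp(xb + ln F_j) over all eight observed F_j, covariates held fixed."""
    counts = [math.exp(xb(fam_ab, sex, rural) + math.log(f_j))
              for _, _, _, f_j in CELLS]
    return sum(counts) / len(counts)


mean_f = sum(f for _, _, _, f in CELLS) / len(CELLS)  # 2391 / 8

# One row per cell: FamAb Sex Rural, own-F prediction, F-averaged prediction
for fam_ab, sex, rural, f in CELLS:
    print(fam_ab, sex, rural,
          round(own_exposure_count(fam_ab, sex, rural, f), 4),
          round(averaged_over_exposures(fam_ab, sex, rural), 4))
```

Because exp(xb + ln F) = exp(xb)·F is linear in F, averaging over the observed F values is the same as plugging in their mean, which is why the two columns differ by the factor mean(F)/F for each cell.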
Model 2:
The same thing happens with this binomial regression. Its canonical link is the logit, so it is a logit model for grouped data:
glm LivAb FamAb Sex Rural, family(binomial F)
Again, the predictions obtained with "predict" are correct, and those with "margins" are wrong, at least as I specified the command (again, "margins, at(FamAb=(0 1) Sex=(0 1) Rural=(0 1))"). (I'll spare readers a tedious rehearsal of the details, unless you want one!) ;-)
So, am I specifying "margins" wrong, or does "margins" not work with offsets?
Thanks,
David
-- Personal Web site:
http://investigadores.cide.edu/crow/