Dear Statlist,
I am analysing a dataset in which researchers recorded whether or not a complication occurred after a surgical intervention. I am now trying to estimate the proportion of interventions in which one (or more) complications occur, or in other words, the risk of having a complication (at least one complication) when undergoing the intervention.
This seemingly easy problem gets complicated by the fact that there are several patients who had more than 1 (up to 5) interventions.
The outcome is binary: at least one complication occurred yes/no.
There were 50 interventions, in which 20 were complicated by at least one complication. So, the risk of at least one complication when undergoing this intervention at first glance seems to be 0.4, with a binomial exact 95% CI of (.2640784, .548206).
. ci proportion complication
-- Binomial Exact --
Variable | Obs Proportion Std. Err. [95% Conf. Interval]
-------------+---------------------------------------------------------------
complication | 50 .4 .069282 .2640784 .548206
However, the 50 interventions were in 28 subjects, who do not contribute independent information. Some of the subjects who had the intervention had a complication almost each time, so the question is whether this is a problem of the intervention, or a problem of the patient…
I am not aware of a simple method to deal with this problem (if there is, please let me know!). So I thought a good way of estimating the probability while accounting for repeated observations in patients would be to use a logistic mixed effects model without independent variables, and the intercept should be the log-odds of having a complication, which I could then convert back to a probability.
I used the following code (“complication” is the binary dependent outcome , and “id” the patient identifier):
. melogit complication || id:, or nolog
Mixed-effects logistic regression Number of obs = 50
Group variable: id Number of groups = 28
Obs per group:
min = 1
avg = 1.8
max = 5
Integration method: mvaghermite Integration pts. = 7
Wald chi2(0) = .
Log likelihood = -28.909638 Prob > chi2 = .
------------------------------------------------------------------------------
complication | Odds Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .2597622 .2667601 -1.31 0.189 .0347091 1.944054
-------------+----------------------------------------------------------------
id |
var(_cons)| 8.304876 8.75897 1.050968 65.62611
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
LR test vs. logistic model: chibar2(01) = 9.48 Prob >= chibar2 = 0.0010
The intercept estimate is already on the odds scale, and converting this back to a probability gives .20619939, 95% CI (.03354479, .66033232):
. disp "probability=".2597622/(1+.2597622)
probability=.20619939
. disp "lower 95%CI limit=".0347091/(1+.0347091)
lower 95%CI limit=.03354479
. disp "upper 95%CI limit=" 1.944054/(1+1.944054)
upper 95%CI limit=.66033232
However, when using the margins command to estimate the probability and CI, I get a totally different result:
. margins
Warning: prediction constant over observations.
Predictive margins Number of obs = 50
Model VCE : OIM
Expression : Marginal predicted mean, predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .338628 .0823884 4.11 0.000 .1771498 .5001062
------------------------------------------------------------------------------
I was wondering why there is a major discrepancy between what I calculate from the intercept, and what Stata calculates in the margins command. Strange enough, when I run a "normal" logistic regression model without mixed effects, my own calculation and margins give the same result for the point estimate, and only slightly different confidence limits:
. logistic complication
Logistic regression Number of obs = 50
LR chi2(0) = -0.00
Prob > chi2 = .
Log likelihood = -33.650583 Pseudo R2 = -0.0000
------------------------------------------------------------------------------
complication | Odds Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .6666667 .1924501 -1.40 0.160 .3786065 1.173896
------------------------------------------------------------------------------
. margins
Warning: prediction constant over observations.
Predictive margins Number of obs = 50
Model VCE : OIM
Expression : Pr(complication), predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .4 .069282 5.77 0.000 .2642097 .5357903
------------------------------------------------------------------------------
. disp "probability=".6666667/(1+.6666667)
probability=.40000001
. disp "lower 95%CI limit=".3786065/(1+.3786065)
lower 95%CI limit=.27462985
. disp "upper 95%CI limit=" 1.173896/(1+1.173896)
upper 95%CI limit=.53999639
Any ideas on where this discrepancy comes from, and whether I should use the intercept estimate and its confidence interval or the margins prediction to estimate the risk of having at least one complication?
Moreover, is my approach of estimating the proportion and CI generally valid, or is there some other technique that would work better?
Thank you,
Patrick
Attachment: original data
I am analysing a dataset in which researchers recorded whether or not a complication occurred after a surgical intervention. I am now trying to estimate the proportion of interventions in which one (or more) complications occur, or in other words, the risk of having a complication (at least one complication) when undergoing the intervention.
This seemingly easy problem gets complicated by the fact that there are several patients who had more than 1 (up to 5) interventions.
The outcome is binary: at least one complication occurred yes/no.
There were 50 interventions, in which 20 were complicated by at least one complication. So, the risk of at least one complication when undergoing this intervention at first glance seems to be 0.4, with a binomial exact 95% CI of (.2640784, .548206).
. ci proportion complication
-- Binomial Exact --
Variable | Obs Proportion Std. Err. [95% Conf. Interval]
-------------+---------------------------------------------------------------
complication | 50 .4 .069282 .2640784 .548206
However, the 50 interventions were in 28 subjects, who do not contribute independent information. Some of the subjects who had the intervention had a complication almost each time, so the question is whether this is a problem of the intervention, or a problem of the patient…
I am not aware of a simple method to deal with this problem (if there is, please let me know!). So I thought a good way of estimating the probability while accounting for repeated observations in patients would be to use a logistic mixed effects model without independent variables, and the intercept should be the log-odds of having a complication, which I could then convert back to a probability.
I used the following code (“complication” is the binary dependent outcome , and “id” the patient identifier):
. melogit complication || id:, or nolog
Mixed-effects logistic regression Number of obs = 50
Group variable: id Number of groups = 28
Obs per group:
min = 1
avg = 1.8
max = 5
Integration method: mvaghermite Integration pts. = 7
Wald chi2(0) = .
Log likelihood = -28.909638 Prob > chi2 = .
------------------------------------------------------------------------------
complication | Odds Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .2597622 .2667601 -1.31 0.189 .0347091 1.944054
-------------+----------------------------------------------------------------
id |
var(_cons)| 8.304876 8.75897 1.050968 65.62611
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
LR test vs. logistic model: chibar2(01) = 9.48 Prob >= chibar2 = 0.0010
The intercept estimate is already on the odds scale, and converting this back to a probability gives .20619939, 95% CI (.03354479, .66033232):
. disp "probability=".2597622/(1+.2597622)
probability=.20619939
. disp "lower 95%CI limit=".0347091/(1+.0347091)
lower 95%CI limit=.03354479
. disp "upper 95%CI limit=" 1.944054/(1+1.944054)
upper 95%CI limit=.66033232
However, when using the margins command to estimate the probability and CI, I get a totally different result:
. margins
Warning: prediction constant over observations.
Predictive margins Number of obs = 50
Model VCE : OIM
Expression : Marginal predicted mean, predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .338628 .0823884 4.11 0.000 .1771498 .5001062
------------------------------------------------------------------------------
I was wondering why there is a major discrepancy between what I calculate from the intercept, and what Stata calculates in the margins command. Strange enough, when I run a "normal" logistic regression model without mixed effects, my own calculation and margins give the same result for the point estimate, and only slightly different confidence limits:
. logistic complication
Logistic regression Number of obs = 50
LR chi2(0) = -0.00
Prob > chi2 = .
Log likelihood = -33.650583 Pseudo R2 = -0.0000
------------------------------------------------------------------------------
complication | Odds Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .6666667 .1924501 -1.40 0.160 .3786065 1.173896
------------------------------------------------------------------------------
. margins
Warning: prediction constant over observations.
Predictive margins Number of obs = 50
Model VCE : OIM
Expression : Pr(complication), predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .4 .069282 5.77 0.000 .2642097 .5357903
------------------------------------------------------------------------------
. disp "probability=".6666667/(1+.6666667)
probability=.40000001
. disp "lower 95%CI limit=".3786065/(1+.3786065)
lower 95%CI limit=.27462985
. disp "upper 95%CI limit=" 1.173896/(1+1.173896)
upper 95%CI limit=.53999639
Any ideas on where this discrepancy comes from, and whether I should use the intercept estimate and its confidence interval or the margins prediction to estimate the risk of having at least one complication?
Moreover, is my approach of estimating the proportion and CI generally valid, or is there some other technique that would work better?
Thank you,
Patrick
Attachment: original data
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input byte(id complication) 1 1 1 1 1 1 2 0 3 0 4 0 4 0 4 0 4 0 4 1 5 1 5 0 6 1 6 1 6 1 6 1 7 0 7 1 8 0 9 0 10 1 11 1 12 1 13 0 14 0 15 0 15 0 16 0 16 1 16 1 16 1 17 0 18 0 19 0 20 0 20 0 20 0 20 0 20 0 21 0 22 0 23 0 24 1 24 1 24 1 25 0 26 0 26 0 27 0 28 1 end
Comment