Estiamtion of a Confidence interval for a proportion in non-independent data

Patrick Schober

Join Date: Mar 2016

Posts: 17
#1

Estiamtion of a Confidence interval for a proportion in non-independent data

07 Nov 2019, 02:27

Dear Statlist,

I am analysing a dataset in which researchers recorded whether or not a complication occurred after a surgical intervention. I am now trying to estimate the proportion of interventions in which one (or more) complications occur, or in other words, the risk of having a complication (at least one complication) when undergoing the intervention.

This seemingly easy problem gets complicated by the fact that there are several patients who had more than 1 (up to 5) interventions.

The outcome is binary: at least one complication occurred yes/no.

There were 50 interventions, in which 20 were complicated by at least one complication. So, the risk of at least one complication when undergoing this intervention at first glance seems to be 0.4, with a binomial exact 95% CI of (.2640784, .548206).

. ci proportion complication

-- Binomial Exact --
Variable | Obs Proportion Std. Err. [95% Conf. Interval]
-------------+---------------------------------------------------------------
complication | 50 .4 .069282 .2640784 .548206

However, the 50 interventions were in 28 subjects, who do not contribute independent information. Some of the subjects who had the intervention had a complication almost each time, so the question is whether this is a problem of the intervention, or a problem of the patient…

I am not aware of a simple method to deal with this problem (if there is, please let me know!). So I thought a good way of estimating the probability while accounting for repeated observations in patients would be to use a logistic mixed effects model without independent variables, and the intercept should be the log-odds of having a complication, which I could then convert back to a probability.

I used the following code (“complication” is the binary dependent outcome , and “id” the patient identifier):

. melogit complication || id:, or nolog

Mixed-effects logistic regression Number of obs = 50
Group variable: id Number of groups = 28

Obs per group:
min = 1
avg = 1.8
max = 5

Integration method: mvaghermite Integration pts. = 7

Wald chi2(0) = .
Log likelihood = -28.909638 Prob > chi2 = .
------------------------------------------------------------------------------
complication | Odds Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .2597622 .2667601 -1.31 0.189 .0347091 1.944054
-------------+----------------------------------------------------------------
id |
var(_cons)| 8.304876 8.75897 1.050968 65.62611
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
LR test vs. logistic model: chibar2(01) = 9.48 Prob >= chibar2 = 0.0010

The intercept estimate is already on the odds scale, and converting this back to a probability gives .20619939, 95% CI (.03354479, .66033232):

. disp "probability=".2597622/(1+.2597622)
probability=.20619939

. disp "lower 95%CI limit=".0347091/(1+.0347091)
lower 95%CI limit=.03354479

. disp "upper 95%CI limit=" 1.944054/(1+1.944054)
upper 95%CI limit=.66033232

However, when using the margins command to estimate the probability and CI, I get a totally different result:

. margins
Warning: prediction constant over observations.

Predictive margins Number of obs = 50
Model VCE : OIM

Expression : Marginal predicted mean, predict()

------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .338628 .0823884 4.11 0.000 .1771498 .5001062
------------------------------------------------------------------------------

I was wondering why there is a major discrepancy between what I calculate from the intercept, and what Stata calculates in the margins command. Strange enough, when I run a "normal" logistic regression model without mixed effects, my own calculation and margins give the same result for the point estimate, and only slightly different confidence limits:

. logistic complication

Logistic regression Number of obs = 50
LR chi2(0) = -0.00
Prob > chi2 = .
Log likelihood = -33.650583 Pseudo R2 = -0.0000

------------------------------------------------------------------------------
complication | Odds Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .6666667 .1924501 -1.40 0.160 .3786065 1.173896
------------------------------------------------------------------------------

. margins
Warning: prediction constant over observations.

Predictive margins Number of obs = 50
Model VCE : OIM

Expression : Pr(complication), predict()

------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .4 .069282 5.77 0.000 .2642097 .5357903
------------------------------------------------------------------------------

. disp "probability=".6666667/(1+.6666667)
probability=.40000001

. disp "lower 95%CI limit=".3786065/(1+.3786065)
lower 95%CI limit=.27462985

. disp "upper 95%CI limit=" 1.173896/(1+1.173896)
upper 95%CI limit=.53999639

Any ideas on where this discrepancy comes from, and whether I should use the intercept estimate and its confidence interval or the margins prediction to estimate the risk of having at least one complication?

Moreover, is my approach of estimating the proportion and CI generally valid, or is there some other technique that would work better?

Thank you,

Patrick

Attachment: original data

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte(id complication) 1 1 1 1 1 1 2 0 3 0 4 0 4 0 4 0 4 0 4 1 5 1 5 0 6 1 6 1 6 1 6 1 7 0 7 1 8 0 9 0 10 1 11 1 12 1 13 0 14 0 15 0 15 0 16 0 16 1 16 1 16 1 17 0 18 0 19 0 20 0 20 0 20 0 20 0 20 0 21 0 22 0 23 0 24 1 24 1 24 1 25 0 26 0 26 0 27 0 28 1 end
Tags: None
skolenik

Join Date: Mar 2014

Posts: 102
#2

07 Nov 2019, 10:41

Trust margins, not your back of the envelope calculations. First and foremost, you entirely miss the between-id variance... and it is HUGE, a variance of 8 / standard deviation of 2.8 on the log odds scale means essentially that a given patient either has complications all the time (a positive random effect of about 2.8 means probability of complication of 81%), or never has them (a negative random effect of -2.8, probability of 1.5%). -margins- does this properly.

Also, you could entertain clustering in the descriptive statistic:

Code:

. proportion complication, cluster(id) Proportion estimation Number of obs = 50 (Std. Err. adjusted for 28 clusters in id) -------------------------------------------------------------- | Robust Logit | Proportion Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ complication | 0 | .6 .1002752 .3889289 .7794986 1 | .4 .1002752 .2205014 .6110711 --------------------------------------------------------------

-- Stas Kolenikov || http://stas.kolenikov.name
-- Principal Survey Scientist, Abt SRBI
-- Opinions stated in this post are mine only
1 like
Comment
Patrick Schober

Join Date: Mar 2016

Posts: 17
#3

07 Nov 2019, 13:19

Thank you very much! This is very helpful!
Best regards,
Patrick
Comment

Announcement

Estiamtion of a Confidence interval for a proportion in non-independent data

Comment

Comment