Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Estiamtion of a Confidence interval for a proportion in non-independent data

    Dear Statlist,

    I am analysing a dataset in which researchers recorded whether or not a complication occurred after a surgical intervention. I am now trying to estimate the proportion of interventions in which one (or more) complications occur, or in other words, the risk of having a complication (at least one complication) when undergoing the intervention.

    This seemingly easy problem gets complicated by the fact that there are several patients who had more than 1 (up to 5) interventions.

    The outcome is binary: at least one complication occurred yes/no.

    There were 50 interventions, in which 20 were complicated by at least one complication. So, the risk of at least one complication when undergoing this intervention at first glance seems to be 0.4, with a binomial exact 95% CI of (.2640784, .548206).

    . ci proportion complication

    -- Binomial Exact --
    Variable | Obs Proportion Std. Err. [95% Conf. Interval]
    -------------+---------------------------------------------------------------
    complication | 50 .4 .069282 .2640784 .548206



    However, the 50 interventions were in 28 subjects, who do not contribute independent information. Some of the subjects who had the intervention had a complication almost each time, so the question is whether this is a problem of the intervention, or a problem of the patient…

    I am not aware of a simple method to deal with this problem (if there is, please let me know!). So I thought a good way of estimating the probability while accounting for repeated observations in patients would be to use a logistic mixed effects model without independent variables, and the intercept should be the log-odds of having a complication, which I could then convert back to a probability.

    I used the following code (“complication” is the binary dependent outcome , and “id” the patient identifier):


    . melogit complication || id:, or nolog

    Mixed-effects logistic regression Number of obs = 50
    Group variable: id Number of groups = 28

    Obs per group:
    min = 1
    avg = 1.8
    max = 5

    Integration method: mvaghermite Integration pts. = 7

    Wald chi2(0) = .
    Log likelihood = -28.909638 Prob > chi2 = .
    ------------------------------------------------------------------------------
    complication | Odds Std. Err. z P>|z| [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    _cons | .2597622 .2667601 -1.31 0.189 .0347091 1.944054
    -------------+----------------------------------------------------------------
    id |
    var(_cons)| 8.304876 8.75897 1.050968 65.62611
    ------------------------------------------------------------------------------
    Note: Estimates are transformed only in the first equation.
    LR test vs. logistic model: chibar2(01) = 9.48 Prob >= chibar2 = 0.0010



    The intercept estimate is already on the odds scale, and converting this back to a probability gives .20619939, 95% CI (.03354479, .66033232):


    . disp "probability=".2597622/(1+.2597622)
    probability=.20619939

    . disp "lower 95%CI limit=".0347091/(1+.0347091)
    lower 95%CI limit=.03354479

    . disp "upper 95%CI limit=" 1.944054/(1+1.944054)
    upper 95%CI limit=.66033232



    However, when using the margins command to estimate the probability and CI, I get a totally different result:


    . margins
    Warning: prediction constant over observations.

    Predictive margins Number of obs = 50
    Model VCE : OIM

    Expression : Marginal predicted mean, predict()

    ------------------------------------------------------------------------------
    | Delta-method
    | Margin Std. Err. z P>|z| [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    _cons | .338628 .0823884 4.11 0.000 .1771498 .5001062
    ------------------------------------------------------------------------------



    I was wondering why there is a major discrepancy between what I calculate from the intercept, and what Stata calculates in the margins command. Strange enough, when I run a "normal" logistic regression model without mixed effects, my own calculation and margins give the same result for the point estimate, and only slightly different confidence limits:


    . logistic complication

    Logistic regression Number of obs = 50
    LR chi2(0) = -0.00
    Prob > chi2 = .
    Log likelihood = -33.650583 Pseudo R2 = -0.0000

    ------------------------------------------------------------------------------
    complication | Odds Std. Err. z P>|z| [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    _cons | .6666667 .1924501 -1.40 0.160 .3786065 1.173896
    ------------------------------------------------------------------------------

    . margins
    Warning: prediction constant over observations.

    Predictive margins Number of obs = 50
    Model VCE : OIM

    Expression : Pr(complication), predict()

    ------------------------------------------------------------------------------
    | Delta-method
    | Margin Std. Err. z P>|z| [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    _cons | .4 .069282 5.77 0.000 .2642097 .5357903
    ------------------------------------------------------------------------------

    . disp "probability=".6666667/(1+.6666667)
    probability=.40000001

    . disp "lower 95%CI limit=".3786065/(1+.3786065)
    lower 95%CI limit=.27462985

    . disp "upper 95%CI limit=" 1.173896/(1+1.173896)
    upper 95%CI limit=.53999639



    Any ideas on where this discrepancy comes from, and whether I should use the intercept estimate and its confidence interval or the margins prediction to estimate the risk of having at least one complication?

    Moreover, is my approach of estimating the proportion and CI generally valid, or is there some other technique that would work better?

    Thank you,

    Patrick

    Attachment: original data

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(id complication)
     1 1
     1 1
     1 1
     2 0
     3 0
     4 0
     4 0
     4 0
     4 0
     4 1
     5 1
     5 0
     6 1
     6 1
     6 1
     6 1
     7 0
     7 1
     8 0
     9 0
    10 1
    11 1
    12 1
    13 0
    14 0
    15 0
    15 0
    16 0
    16 1
    16 1
    16 1
    17 0
    18 0
    19 0
    20 0
    20 0
    20 0
    20 0
    20 0
    21 0
    22 0
    23 0
    24 1
    24 1
    24 1
    25 0
    26 0
    26 0
    27 0
    28 1
    end

  • #2
    Trust margins, not your back of the envelope calculations. First and foremost, you entirely miss the between-id variance... and it is HUGE, a variance of 8 / standard deviation of 2.8 on the log odds scale means essentially that a given patient either has complications all the time (a positive random effect of about 2.8 means probability of complication of 81%), or never has them (a negative random effect of -2.8, probability of 1.5%). -margins- does this properly.

    Also, you could entertain clustering in the descriptive statistic:

    Code:
    . proportion complication, cluster(id)
    
    Proportion estimation             Number of obs   =         50
    
                        (Std. Err. adjusted for 28 clusters in id)
    --------------------------------------------------------------
                 |               Robust              Logit
                 | Proportion   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    complication |
              0  |         .6   .1002752      .3889289    .7794986
              1  |         .4   .1002752      .2205014    .6110711
    --------------------------------------------------------------
    -- Stas Kolenikov || http://stas.kolenikov.name
    -- Principal Survey Scientist, Abt SRBI
    -- Opinions stated in this post are mine only

    Comment


    • #3
      Thank you very much! This is very helpful!
      Best regards,
      Patrick

      Comment

      Working...
      X