  • Missing upper bound of 95% confidence interval on stci median (Kaplan-Meier estimator) despite sufficient data

    When extracting an estimate of the median time to failure using a Kaplan-Meier estimator (in failure format), I get a 95% confidence interval that is missing its upper bound.

    To obtain this median, I am using the basic survival code:

    Code:
    stset time, id(participantid) failure (fail == 1) origin(min) scale(30.437)
    stci
    [Attached image: Missing Upper Bound.png]

    For the following reasons, I cannot determine why the upper limit of the 95% confidence interval would be missing:
    1. The analysis is limited to participants who experience the failure; there is no censoring, so the median failure time must be reached.
    2. The time at risk in the analysis extends far beyond the median survival estimate, which should leave room for an upper confidence bound.
    In the stci manual (https://www.stata.com/manuals/ststci.pdf), I am unable to find specifics of what methods are used to calculate the 95% confidence interval for the median, though it appears to be based on a non-parametric formula.

    Would anyone be able to explain what might be causing the missing upper bound?

  • #2
    You are correct that this is a non-parametric method: it uses the Kaplan-Meier (product-limit) estimates. It is explained in the PDF manual entry linked from -help stci-, in the section called "Methods and formulas".

    Let's create fake data using the same tiny sample size, n=6. The distribution of failure times here is immaterial for our purposes.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(id t fail)
    1  2 1
    2  4 1
    3  5 1
    4  8 1
    5  8 1
    6 10 1
    end
    Now let's stset the data; note that every observation is a failure. I will also list the Kaplan-Meier survival estimates, along with the default -stci- output, which estimates the median survival time.

    Code:
    . stset t, fail(fail) id(id)
    
    . sts list
    
            Failure _d: fail
      Analysis time _t: t
           ID variable: id
    
    Kaplan–Meier survivor function
    
                 At           Net    Survivor      Std.
      Time     risk   Fail   lost    function     error     [95% conf. int.]
    ------------------------------------------------------------------------
         2        6      1      0      0.8333    0.1521     0.2731    0.9747
         4        5      1      0      0.6667    0.1925     0.1946    0.9044
         5        4      1      0      0.5000    0.2041     0.1109    0.8037
         8        3      2      0      0.1667    0.1521     0.0077    0.5168
        10        1      1      0      0.0000         .          .         .
    ------------------------------------------------------------------------
    Note: Net lost equals the number lost minus the number who entered.
    
    . stci
    
            Failure _d: fail
      Analysis time _t: t
           ID variable: id
    
                 | Number of
                 |  subjects         50%      Std. err.    [95% conf. interval]
    -------------+-------------------------------------------------------------
           Total |         6           5      1.632993            2          .
    The percentiles of survival time are a function of the Kaplan-Meier survival estimates. The standard errors listed with the KM estimates are based on Greenwood's method, but these are not used directly to compute the confidence interval for the KM survival estimates because of numerical issues (namely, they can produce invalid values outside the range [0, 1], and they are inefficient compared to other methods). Instead, those confidence intervals are computed on the ln(-ln(S(t))) scale, using its asymptotic variance, and then transformed back. The logic is similar for the standard errors and confidence intervals for estimates of the pth percentile of survival time.
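
    As a rough hand check of the ln(-ln(S(t))) interval, the t=2 row of the -sts list- output can be recomputed with scalars. This is only a sketch: the variance expression is my reading of the formula in the -sts- "Methods and formulas" section, and the scalar names (km, sigma, zcrit) are made up for the illustration.

    Code:
    * Hand check of the t=2 row: 6 at risk, 1 failure, so S(2) = 5/6
    scalar km    = 5/6
    * Assumed variance of ln(-ln(S(t))):
    *   [sum of d/(n*(n-d))] / [sum of ln((n-d)/n)]^2, here with a single term
    scalar sigma = sqrt((1/(6*5)) / (ln(5/6))^2)
    scalar zcrit = invnormal(0.975)
    * Transform back to the survival scale: S^exp(+/- z*sigma)
    display "lower = " %6.4f km^exp( zcrit*sigma)
    display "upper = " %6.4f km^exp(-zcrit*sigma)

    These two values should agree with the 0.2731 and 0.9747 reported for t=2 in the table above.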

    From the manual

    For a given confidence level, the upper confidence limit for the pth percentile is defined as the first time at which the upper confidence limit for S(t) (based on a ln(-lnS(t)) transformation) is less than or equal to 1-p/100, and, similarly, the lower confidence limit is defined as the first time at which the lower confidence limit of S(t) is less than or equal to 1 - p/100.
    Following this logic for the median (50th percentile), the first failure time at which the lower confidence limit of the KM survival estimate is less than or equal to 0.5 (= 1 - 50/100) is t=2, where the limit is 0.2731. So t=2 is the lower bound of the confidence interval for the median survival time. However, the upper confidence limit of the KM estimate never falls to 0.5 or below at any observed failure time: it comes closest at t=8 (0.5168) and is undefined at t=10, where the survivor function reaches 0. The result is that the upper bound is missing.
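
    The same rule can be applied by hand to the saved KM confidence limits. A minimal sketch, assuming the example data above are loaded and stset; the variable names km_s, km_lb and km_ub are arbitrary.

    Code:
    * Save the KM estimate and its log-log confidence limits as variables
    sts generate km_s = s km_lb = lb(s) km_ub = ub(s)
    sort _t
    list _t km_s km_lb km_ub if _d == 1, noobs sep(0)

    * First failure time at which each limit is <= 0.5 (= 1 - 50/100)
    summarize _t if km_lb <= 0.5, meanonly
    display "lower limit for the median: " r(min)
    summarize _t if km_ub <= 0.5, meanonly
    display "upper limit for the median: " r(min)

    The second -summarize- should find no qualifying observations (a missing km_ub at t=10 does not satisfy km_ub <= 0.5), so its minimum displays as missing, which mirrors the "." in the -stci- output.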

    If we slightly change our request and ask for the 48th percentile of survival time and its confidence interval, we now see that both the lower and upper bounds are present. The threshold becomes 1 - 48/100 = 0.52, and the upper confidence limit at t=8 (0.5168) falls below it. (This is of course only to demonstrate what is happening; I am not advocating that you do this in your example to "fudge" a confidence interval.)

    Code:
    . stci, p(48)
    
            Failure _d: fail
      Analysis time _t: t
           ID variable: id
    
                 | Number of
                 |  subjects         48%      Std. err.    [95% conf. interval]
    -------------+-------------------------------------------------------------
           Total |         6           5      1.632993            2          8
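
    To see how the limits depend on the requested percentile, one could loop over a few values and print what -stci- returns. This is only a sketch: it assumes the example data above are loaded and stset, and that -stci- leaves the limits in r(lb) and r(ub), which is worth confirming with -return list- on your own installation.

    Code:
    * Print the confidence limits for several percentiles of survival time
    foreach p of numlist 40 45 48 50 {
        quietly stci, p(`p')
        display "p = `p':  lower = " r(lb) "  upper = " r(ub)
    }
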
    Last edited by Leonardo Guizzetti; 15 Feb 2022, 22:34.

    • #3
      Thank you for your detailed explanation of this concept, Dr. Guizzetti; this was immensely helpful and clarifying.

      Sincerely,
      Andrew
