
  • Different results with survci vs stcurve

    Hello,

    I have been running some Cox regressions with stcox and have realised that two commands for plotting the covariate-adjusted survival curve, namely stcurve and survci, give very different results. Specifically, stcurve shows slower times to event.

    I couldn't see anything in the documentation of either command that explains why they might estimate the survival curve differently. In particular, they both say that they estimate the covariate-adjusted curve using the means of covariates, which is supported by the output text when running the commands.

    Can anyone provide some insight? I would like to estimate median survival times using one of these commands, but am no longer sure which one is best.
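One way to get median survival times that does not depend on either plotting command's defaults is to build the adjusted survival function directly from the baseline survivor function that stcox can predict, at covariate values you choose explicitly. A minimal sketch, assuming the drugtr example used in this thread; the variable names s0, s_placebo and s_treated are hypothetical, and this is a hand-rolled calculation rather than an option of stcurve or survci. The median is taken as the smallest analysis time at which the estimated survival drops to 0.5 or below.

```stata
* Hand-rolled median survival from a fitted Cox model (illustrative sketch)
use https://www.stata-press.com/data/r18/drugtr, clear
stset studytime, failure(died)
stcox drug age

* Baseline survivor function S0(t): evaluated at all covariates = 0
predict double s0, basesurv

* Adjusted survival S(t|x) = S0(t)^exp(xb), with age fixed at its mean
summarize age, meanonly
local mage = r(mean)
generate double s_placebo = s0^exp(_b[drug]*0 + _b[age]*`mage')
generate double s_treated = s0^exp(_b[drug]*1 + _b[age]*`mage')

* Median = smallest _t with S(t|x) <= 0.5
* (if no time qualifies, r(N) is 0 and the median is not reached)
summarize _t if s_placebo <= 0.5, meanonly
display "median, drug = 0, mean age: " r(min)
summarize _t if s_treated <= 0.5, meanonly
display "median, drug = 1, mean age: " r(min)
```

Because the covariate values are typed in by hand, there is no ambiguity about whether a variable was set to its mean or to a base level.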

    Below is an example using a Stata dataset, for which I have attached the two curves. You can clearly see that the median time is completely different (as is confirmed by looking at the outfiles).

    Code:
    . use https://www.stata-press.com/data/r18/drugtr
    (Patient survival in drug trial)
    
    .  stset studytime, failure(died)
    
    Survival-time data settings
    
             Failure event: died!=0 & died<.
    Observed time interval: (0, studytime]
         Exit on or before: failure
    
    --------------------------------------------------------------------------
             48  total observations
              0  exclusions
    --------------------------------------------------------------------------
             48  observations remaining, representing
             31  failures in single-record/single-failure data
            744  total analysis time at risk and under observation
                                                    At risk from t =         0
                                         Earliest observed entry t =         0
                                              Last observed exit t =        39
    
    . stcox drug age
    
            Failure _d: died
      Analysis time _t: studytime
    
    Iteration 0:  Log likelihood = -99.911448
    Iteration 1:  Log likelihood = -83.551879
    Iteration 2:  Log likelihood = -83.324009
    Iteration 3:  Log likelihood = -83.323546
    Refining estimates:
    Iteration 0:  Log likelihood = -83.323546
    
    Cox regression with Breslow method for ties
    
    No. of subjects =  48                                   Number of obs =     48
    No. of failures =  31
    Time at risk    = 744
                                                            LR chi2(2)    =  33.18
    Log likelihood = -83.323546                             Prob > chi2   = 0.0000
    
    ------------------------------------------------------------------------------
              _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            drug |   .1048772   .0477017    -4.96   0.000     .0430057    .2557622
             age |   1.120325   .0417711     3.05   0.002     1.041375     1.20526
    ------------------------------------------------------------------------------
    
    . stcurve, survival outfile(test_drug)
    note: function evaluated at overall means of covariates.
    
    . survci, outfile(test_drug_survci)
    (drug=0.00; age=55.88)
    [Figure: Graph from stcurve (stcurve graph.png)]

    [Figure: Graph from survci (survci graph.png)]
    Last edited by Giulia Vivaldi; 22 Apr 2024, 10:56.

  • #2
    It seems that -survci- does not default to mean values of covariates:
    . survci, outfile(test_drug_survci)
    (drug=0.00; age=55.88)
    The mean value of the variable drug is not 0, though it does get the mean value of age correct. And if you force -survci- to use the actual mean value it matches the results from -stcurve-:
    Code:
    clear*
    use https://www.stata-press.com/data/r18/drugtr
    stset studytime, failure(died)
    
    stcox drug age
    
    stcurve, survival name(stcurve, replace)
    
    summ drug, meanonly
    survci, at(drug = `r(mean)') name(survci, replace)
    
    graph combine stcurve survci, altshrink nocopies
    gives us
    [Figure: stcurve and survci graphs combined (stcurve_vs_curveci.png)]

    You can see that now the two curves agree.
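The size of the gap follows from the proportional-hazards form of the Cox model. Writing d for drug, a for age, beta_d and beta_a for their coefficients (log hazard ratios), and S_0(t) for the baseline survivor function (notation introduced here, not from the thread), changing only the value of drug while holding age fixed raises the drug = 0 curve to a power:

```latex
\begin{aligned}
S(t \mid d, a) &= S_0(t)^{\exp(\beta_d d + \beta_a a)} \\
S(t \mid \bar{d}, a) &= \bigl[S(t \mid 0, a)\bigr]^{\exp(\beta_d \bar{d})}
  = \bigl[S(t \mid 0, a)\bigr]^{\mathrm{HR}_d^{\,\bar{d}}},
  \qquad \mathrm{HR}_d = e^{\beta_d} = 0.1048772
\end{aligned}
```

With a mean of drug strictly between 0 and 1, the exponent lies between 0.1048772 and 1. Raising a survival probability to a power below 1 increases it, so the curve at the mean of drug (the stcurve default) lies above the curve at drug = 0 (what survci used), which is the slower time to event seen in the first post.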

    Moreover, -survci- does not promise it will set all variables to their means. From the help file:
    at(varname=# [varname=# ...]) specifies the values of the covariates used in stcox for which the estimates of the plotted function are to be computed. If left unspecified, continuous covariates will be set to their mean values, and factor variables will be set to their base levels. [emphasis added]
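The discrepancy can also be removed from the other side, by pinning drug at 0 in -stcurve- so that both commands use the values survci reported (drug = 0, age at its mean). A minimal sketch, assuming -stcurve- sets any covariate not named in at() to its mean (its default behaviour); the graph names are arbitrary, and only -survci- options already used in this thread appear:

```stata
* Make stcurve use the same covariate values as survci's default
use https://www.stata-press.com/data/r18/drugtr, clear
stset studytime, failure(died)
stcox drug age

* drug fixed at 0; age, not named in at(), stays at its mean
stcurve, survival at(drug=0) name(stcurve_d0, replace)

* survci with its defaults: reported as (drug=0.00; age=55.88)
survci, name(survci_default, replace)

graph combine stcurve_d0 survci_default, altshrink nocopies
```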



    • #3
      Thank you for untangling that for me, Clyde. I did a lot of assuming and not enough checking!

      Presumably, when dealing with factor covariates, setting them to their base levels for such curves is preferable, since that better reflects what the variable is meant to represent. Is that correct?



      • #4
        I think it depends on the context and the specific goals of the research. Because factor variables are used primarily to represent nominal (categorical) variables, it is odd to calculate and use their means. After all, the mean of a 0/1 indicator will be somewhere between 0 and 1 and, in principle, not even a possible value of the variable. But if the model is intended to reflect overall population trends, rather than to model individual-level outcomes, then using the mean is actually appropriate, as it represents the prevalence of that category in the population.

        Even in the absence of population-level research questions, factor variables raise the question of which level to use in -survci-. By default, that program chooses the base category, which respects whatever choice you have made previously. But how does one choose the base category in the first place? Sometimes there is a clear rationale. But in many situations there is nothing particularly privileged about any one level of a categorical variable, and the choice is arbitrary. In that case, the means of the indicators, which reflect the prevalence of the categories, might be a more meaningful "compromise" than singling out any one level.

        tl;dr It depends!
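The point that the mean of an indicator is the prevalence of its category, and the three candidate curves this leaves you choosing between, can be sketched as follows. This assumes drug is coded 0/1 with no missing values (as in the drugtr data used above); the graph and macro names are arbitrary, and which curve is "right" remains the judgment call described above.

```stata
* Mean of a 0/1 indicator equals the share of observations with value 1
use https://www.stata-press.com/data/r18/drugtr, clear
stset studytime, failure(died)
stcox drug age

summarize drug, meanonly
local p_drug = r(mean)
count if drug == 1
display "mean of drug = " `p_drug' "; share with drug == 1 = " r(N)/_N

* Curves at the base level, at the other level, and at the prevalence
* (age is not named in at(), so stcurve holds it at its mean)
stcurve, survival at(drug=0) name(at_base, replace)
stcurve, survival at(drug=1) name(at_level1, replace)
stcurve, survival at(drug=`p_drug') name(at_prevalence, replace)
graph combine at_base at_level1 at_prevalence, altshrink nocopies
```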

