Different results with survci vs stcurve

Giulia Vivaldi

Join Date: Jan 2023
Posts: 8


22 Apr 2024, 09:52

Hello,

I have been running some Cox regressions with stcox, and have realised that two commands for plotting the covariate-adjusted survival curve (namely, stcurve and survci) are giving very different results. Specifically, the curve from stcurve declines more slowly, implying longer times to event than the curve from survci.

I couldn't see anything in the documentation of either command that explains why they might estimate the survival curve differently. In particular, they both say that they estimate the covariate-adjusted curve using the means of covariates, which is supported by the output text when running the commands.

Can anyone provide some insight? I would like to estimate median survival times using one of these commands, but am no longer sure which one is best.

Below is an example using a Stata dataset, for which I have attached the two curves. You can clearly see that the median time is completely different (as is confirmed by looking at the outfiles).

Code:

. use https://www.stata-press.com/data/r18/drugtr
(Patient survival in drug trial)

.  stset studytime, failure(died)

Survival-time data settings

         Failure event: died!=0 & died<.
Observed time interval: (0, studytime]
     Exit on or before: failure

--------------------------------------------------------------------------
         48  total observations
          0  exclusions
--------------------------------------------------------------------------
         48  observations remaining, representing
         31  failures in single-record/single-failure data
        744  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =        39

. stcox drug age

        Failure _d: died
  Analysis time _t: studytime

Iteration 0:  Log likelihood = -99.911448
Iteration 1:  Log likelihood = -83.551879
Iteration 2:  Log likelihood = -83.324009
Iteration 3:  Log likelihood = -83.323546
Refining estimates:
Iteration 0:  Log likelihood = -83.323546

Cox regression with Breslow method for ties

No. of subjects =  48                                   Number of obs =     48
No. of failures =  31
Time at risk    = 744
                                                        LR chi2(2)    =  33.18
Log likelihood = -83.323546                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        drug |   .1048772   .0477017    -4.96   0.000     .0430057    .2557622
         age |   1.120325   .0417711     3.05   0.002     1.041375     1.20526
------------------------------------------------------------------------------

. stcurve, survival outfile(test_drug)
note: function evaluated at overall means of covariates.

. survci, outfile(test_drug_survci)
(drug=0.00; age=55.88)

Graph from stcurve

[Image: stcurve graph.png]

Graph from survci

[Image: survci graph.png]

Last edited by Giulia Vivaldi; 22 Apr 2024, 09:56.


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

22 Apr 2024, 10:13

It seems that -survci- does not default to mean values of covariates:

. survci, outfile(test_drug_survci)
(drug=0.00; age=55.88)

The mean value of the variable drug is not 0, though it does get the mean value of age correct. And if you force -survci- to use the actual mean value it matches the results from -stcurve-:

Code:

clear*
use https://www.stata-press.com/data/r18/drugtr
stset studytime, failure(died)
stcox drug age
stcurve, survival name(stcurve, replace)
summ drug, meanonly
survci, at(drug = `r(mean)') name(survci, replace)
graph combine stcurve survci, altshrink nocopies

gives us

You can see that now the two curves agree.

Moreover, -survci- does not promise it will set all variables to their means. From the help file:

at(varname=# [varname=# ...]) specifies the values of the covariates used in stcox for which the estimates of the plotted function are to be computed. If left unspecified, continuous covariates will be set to their mean values, and factor variables will be set to their base levels. [emphasis added]
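
For anyone who wants to see what the two defaults amount to numerically, here is a minimal sketch that rebuilds the adjusted curve by hand from the stcox fit, using S(t | x) = S0(t)^exp(xb). The variable names s0, s_means and s_drug0 and the locals mdrug and mage are made up for illustration. It assumes both commands evaluate the point estimate with this standard formula, so treat the medians it prints as a cross-check against the outfiles rather than a replacement for them.

```stata
* Sketch: rebuild the covariate-adjusted survival curve from the Cox fit
clear all
use https://www.stata-press.com/data/r18/drugtr
stset studytime, failure(died)
stcox drug age

* Baseline survivor function S0(t), i.e. all covariates equal to 0
predict double s0, basesurv

* Sample means of the two covariates
summarize drug, meanonly
local mdrug = r(mean)
summarize age, meanonly
local mage = r(mean)

* S(t | x) = S0(t)^exp(xb); _b[] holds the coefficients (log hazard ratios)
generate double s_means = s0^exp(_b[drug]*`mdrug' + _b[age]*`mage')   // drug at its mean
generate double s_drug0 = s0^exp(_b[age]*`mage')                      // drug = 0, age at its mean

* Median survival: earliest analysis time at which the curve is at or below 0.5
summarize _t if s_means <= 0.5, meanonly
display "Median, drug at its mean: " r(min)
summarize _t if s_drug0 <= 0.5, meanonly
display "Median, drug = 0:         " r(min)
```

If a curve never reaches 0.5 within follow-up, r(min) comes back missing, which simply means the median is not estimable from these data at those covariate values.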
Giulia Vivaldi

Join Date: Jan 2023

Posts: 8
#3

23 Apr 2024, 02:02

Thank you for untangling that for me, Clyde. I did a lot of assuming and not enough checking!

Presumably, when dealing with factor covariates, setting them to base levels for such curves is preferable since it better reflects what the variable is meant to represent—is that correct?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

23 Apr 2024, 08:55

I think it depends on the context and the specific goals of the research. Because factor variables are used primarily to represent nominal (categorical) variables, it is odd to calculate their means and use them. After all, the mean of a 0/1 indicator will be something between 0 and 1 and, in principle, not even a possible value for the variable. But if the model is intended to reflect overall population trends, rather than modeling individual-level outcomes, then using the mean is actually appropriate, as it represents the prevalence of that category in the population.

Even in the absence of population-level research questions, with factor-variables there is the question of which level to use in -survci-. By default, that program chooses the base category--which respects whatever choice you have made previously. But how does one choose the base category in the first place? Sometimes there is a clear rationale. But in many situations there is nothing particularly privileged about any one level of a categorical variable and the choice is arbitrary. So in that case, the means of the indicators, reflecting prevalence of the categories, might be a more meaningful "compromise" than singling out any one level.

tl;dr It depends!
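
To make the trade-off concrete, here is a rough sketch that draws both versions side by side with stcurve: one curve per level of drug, and a single curve with drug at its sample mean. The graph names bylevel and atmean are arbitrary labels chosen for this example. In both graphs age is left at its sample mean, which is what stcurve does for covariates not listed in at().

```stata
* Sketch: level-specific curves versus a single curve at the covariate means
clear all
use https://www.stata-press.com/data/r18/drugtr
stset studytime, failure(died)
stcox drug age

* One curve for each level of drug; age is held at its sample mean
stcurve, survival at1(drug=0) at2(drug=1) name(bylevel, replace)

* One curve with drug at its sample mean (the prevalence-weighted "compromise")
stcurve, survival name(atmean, replace)

* Put the two graphs on a common y scale for comparison
graph combine bylevel atmean, ycommon
```

The curve at the mean falls between the two level-specific curves, which is why it reads as a population-level summary rather than the prediction for any one kind of patient.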