Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Survival analysis. Age as time variable. Need to include age in posterior analysis.

    Dear statalist community,

    Following the advice from the community given in my last post, I set up a survival analysis of my data. To recap briefly, I am doing a study about risk factors for stroke in a subset of patients. For doing so, I gathered retrospective data from a cohort of patients and organised it into" assessments", which every assessment meaning each time they came to the clinic and we have different clinical information from them. As in every assessment not all patients have the same clinical information, I have a problem with missing data in some variables which I will come to later.

    My idea is to:

    1. Encounter covariates (glomerular filtration rate, presence of white matter lesions at the brain MRI, heart status, treatment...) associated with an increasing number of events (stroke) or events that happen sooner
    2. With such variables, fit a Cox model for risk prediction.

    For doing so, I have stset my data, I have done the Kaplan-Meyer analysis of the different variables (graphically and logrank analysis), run an univariate Cox model with all the important covariates and then a multivariate one. As I several doubts have arisen, I wanted to share them with you to seek help. I will try to maintain one post per doubt, so I will start with the first one, in case I had to redo everything from the beginning.

    To give you a glimpse of how my stset data looks, let me share here the stset output and a dataex example of some covariates and the stset variables.

    Code:
     stset
    -> stset meanageass_, id(id) failure(stroketotallong_==1) enter(time==.)
                          exit(stroketotallong_==2)
    
                    id:  id
         failure event:  stroketotallong_ == 1
    obs. time interval:  (meanageass_[_n-1], meanageass_]
     enter on or after:  time==.
     exit on or before:  stroketotallong_==2
    
    ------------------------------------------------------------------------------
          2,258  total observations
            409  observations end on or before enter()
    ------------------------------------------------------------------------------
          1,849  observations remaining, representing
            380  subjects
             73  failures in multiple-failure-per-subject data
          2,093  total analysis time at risk and under observation
                                                    at risk from t =         0
                                         earliest observed entry t =        15
                                              last observed exit t =        87

    1-My subject identification variable is id.
    2-My failure variables is stroketotallong_ which records 0/1 the status of stroke yes/no. As I wanted to include the subjects with more than one stroke, I set up the exit condition with stroketotallong_==2 which is not met by any subject, and let me include as many as 1 as possible.
    3-My time variable was initially the number of assessment, as each assessment was 1 year apart from the other. However, from reading the recommended book "An introduction to survival analysis using stata" and after reading this article , I decided to change it to the age of the participants. It makes sense, as in the same assessment patients with different ages, and thus, with different risk for stroke, might be considered to have equal risk if the time variable is the assessment time.

    The data with some covariates looks like:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int id float meanageass_ int gfr_ float(wml_ treatment_ LAecho_ stroketotallong_) byte(_st _d _t _t0)
     1 34   . . . . 0 0 .  .  .
     1 35  97 1 0 . 0 1 0 35 34
     1 36   . . 0 . 0 1 0 36 35
     1 37   . . 0 . 0 1 0 37 36
     1 38   . . 0 . 0 1 0 38 37
     1 39   . . 0 . 0 1 0 39 38
     2 27   . . . . 0 0 .  .  .
     2 28  98 0 1 1 0 1 0 28 27
     2 29  96 0 2 . 0 1 0 29 28
     2 30  98 . 1 . 0 1 0 30 29
     2 31 104 . 1 . 0 1 0 31 30
     2 32 103 . 1 . 0 1 0 32 31
     2 33 105 . . . 0 1 0 33 32
     4 40   . . . . 0 0 .  .  .
     4 41 112 0 1 0 0 1 0 41 40
     4 42 102 . 3 1 0 1 0 42 41
     4 43 103 . 3 . 0 1 0 43 42
     4 44   . . 3 . 0 1 0 44 43
     4 45   . . 3 . 0 1 0 45 44
     5 41   . . . . 0 0 .  .  .
     6 32   . . . . 0 0 .  .  .
     6 33  94 0 1 2 0 1 0 33 32
     6 34  79 . 2 . 0 1 0 34 33
     6 35  93 0 2 . 0 1 0 35 34
     6 36 103 . 2 . 0 1 0 36 35
     6 37  63 . 2 . 0 1 0 37 36
     6 41  48 . . . 0 1 0 41 37
     7 58   . . . . 0 0 .  .  .
     7 59  98 . 3 . 0 1 0 59 58
     7 61 100 . 3 . 0 1 0 61 59
     7 62 106 . 1 0 0 1 0 62 61
     7 63 102 . 3 . 0 1 0 63 62
     7 64 111 . 3 . 0 1 0 64 63
     7 67  93 . . . 0 1 0 67 64
     8 20   . . . . 0 0 .  .  .
     8 21 115 0 3 1 0 1 0 21 20
     9 40   . . . . 0 0 .  .  .
     9 42  92 1 1 0 0 1 0 42 40
     9 44  80 1 1 . 0 1 0 44 42
     9 45  84 . 1 . 0 1 0 45 44
     9 46   . . 1 . 0 1 0 46 45
     9 47   . . 1 . 0 1 0 47 46
    10 21   . . . . 0 0 .  .  .
    10 23 117 0 0 0 0 1 0 23 21
    10 25 121 0 0 0 0 1 0 25 23
    10 26 115 . 0 . 0 1 0 26 25
    10 27 129 . 0 . 0 1 0 27 26
    10 28 121 . 0 . 0 1 0 28 27
    10 29 106 . . . 0 1 0 29 28
    11 25   . . . . 0 0 .  .  .
    11 26 131 0 0 0 0 1 0 26 25
    11 28 120 0 0 1 0 1 0 28 26
    11 29 132 . 0 . 0 1 0 29 28
    11 30 124 . 0 . 0 1 0 30 29
    11 31  97 . 0 . 0 1 0 31 30
    11 32 125 . . . 0 1 0 32 31
    13 38   . . . . 0 0 .  .  .
    13 39 121 0 0 . 0 1 0 39 38
    13 41 124 . 0 . 0 1 0 41 39
    13 42 106 . 0 . 0 1 0 42 41
    13 43 105 . 0 . 0 1 0 43 42
    13 44 114 . 0 . 0 1 0 44 43
    14 41   . . . . 1 0 .  .  .
    14 42 109 . 3 . 0 1 0 42 41
    14 43  95 0 1 0 0 1 0 43 42
    14 44  89 0 3 0 0 1 0 44 43
    14 45 107 . 3 0 0 1 0 45 44
    14 46 105 . 3 . 0 1 0 46 45
    14 47 103 . . . 0 1 0 47 46
    15 35   . . . . 0 0 .  .  .
    15 36 108 0 1 0 0 1 0 36 35
    15 37 108 0 1 0 0 1 0 37 36
    15 38 102 . 1 0 0 1 0 38 37
    15 39 113 . 1 0 0 1 0 39 38
    15 40 105 . 1 . 0 1 0 40 39
    15 41 104 . . . 0 1 0 41 40
    16 66   . . . . 0 0 .  .  .
    16 67 110 0 1 0 0 1 0 67 66
    16 69  97 0 3 0 0 1 0 69 67
    16 70 103 . 3 . 0 1 0 70 69
    16 71  93 . 3 . 0 1 0 71 70
    16 72   . . 3 . 0 1 0 72 71
    17 60   . . . . 1 0 .  .  .
    17 61  76 1 1 0 0 1 0 61 60
    17 62  77 1 1 0 0 1 0 62 61
    17 63  72 . 1 0 0 1 0 63 62
    17 64  64 . 1 . 0 1 0 64 63
    17 65  61 . 1 . 0 1 0 65 64
    17 66  63 . . . 0 1 0 66 65
    18 21   . . . . 0 0 .  .  .
    18 22 113 0 1 0 0 1 0 22 21
    18 23 119 0 1 . 0 1 0 23 22
    18 24 113 . 1 . 0 1 0 24 23
    18 25 122 . 1 . 0 1 0 25 24
    18 26 123 . 1 . 0 1 0 26 25
    18 27 116 . . . 0 1 0 27 26
    19 33   . . . . 1 0 .  .  .
    19 34 115 2 1 0 0 1 0 34 33
    19 35  99 2 1 0 0 1 0 35 34
    19 36 110 . 1 . 0 1 0 36 35
    end
    label values gfr_ gfrgoods
    label values wml_ wml
    label def wml 0 "No WML", modify
    label def wml 1 "Fazekas 1", modify
    label def wml 2 "Fazekas 2", modify
    label values treatment_ Treatment
    label def Treatment 0 "No treatment", modify
    label def Treatment 1 "Agalsidase alpha", modify
    label def Treatment 2 "Agalsidase beta", modify
    label def Treatment 3 "Migalastat", modify
    label values LAecho_ LA
    label def LA 0 "No dilated", modify
    label def LA 1 "Mildly Dilated", modify
    label def LA 2 "Modeately dilated", modify
    The overall survival function in this data is the following:

    Code:
         failure _d:  stroketotallong_ == 1
       analysis time _t:  meanageass_
      enter on or after:  time==.
      exit on or before:  stroketotallong_==2
                     id:  id
    
                 At           Net    Survivor      Std.
      Time     Risk   Fail   Lost    Function     Error     [95% Conf. Int.]
    ------------------------------------------------------------------------
        15        0      0     -2      1.0000         .          .         .
        16        2      0     -1      1.0000         .          .         .
        17        3      0    -10      1.0000         .          .         .
        18       13      0     -7      1.0000         .          .         .
        19       20      0     -5      1.0000         .          .         .
        20       25      0     -4      1.0000         .          .         .
        21       29      0     -7      1.0000         .          .         .
        22       36      0     -5      1.0000         .          .         .
        23       41      0      6      1.0000         .          .         .
        24       35      0      1      1.0000         .          .         .
        25       34      0     -4      1.0000         .          .         .
        26       38      1      5      0.9737    0.0260     0.8275    0.9963
        27       32      0     -3      0.9737    0.0260     0.8275    0.9963
        29       35      0      4      0.9737    0.0260     0.8275    0.9963
        31       31      1     -1      0.9423    0.0398     0.7870    0.9853
        32       31      0     -4      0.9423    0.0398     0.7870    0.9853
        33       35      1      4      0.9154    0.0469     0.7593    0.9720
        34       30      0      5      0.9154    0.0469     0.7593    0.9720
        35       25      0    -11      0.9154    0.0469     0.7593    0.9720
        36       36      2     -7      0.8645    0.0564     0.7043    0.9413
        37       41      0      2      0.8645    0.0564     0.7043    0.9413
        38       39      2     -5      0.8202    0.0616     0.6592    0.9100
        39       42      1      3      0.8006    0.0632     0.6403    0.8950
        40       38      1     -4      0.7796    0.0649     0.6190    0.8787
        41       41      3     -1      0.7225    0.0680     0.5636    0.8318
        42       39      2     -5      0.6855    0.0694     0.5280    0.7999
        43       42      2      1      0.6528    0.0698     0.4980    0.7704
        44       39      0     -2      0.6528    0.0698     0.4980    0.7704
        45       41      1     -1      0.6369    0.0699     0.4834    0.7558
        46       41      3    -10      0.5903    0.0698     0.4413    0.7120
        47       48      1      5      0.5780    0.0694     0.4308    0.6999
        48       42      1      2      0.5642    0.0691     0.4187    0.6865
        49       39      1     -5      0.5498    0.0688     0.4059    0.6724
        50       43      2     -3      0.5242    0.0680     0.3840    0.6468
        51       44      1      0      0.5123    0.0675     0.3739    0.6346
        52       43      4     -6      0.4646    0.0653     0.3337    0.5855
        53       45      4     -6      0.4233    0.0626     0.2998    0.5415
        54       47      1      1      0.4143    0.0619     0.2927    0.5316
        55       45      1      1      0.4051    0.0613     0.2853    0.5216
        56       43      1      0      0.3957    0.0605     0.2777    0.5113
        57       42      0      3      0.3957    0.0605     0.2777    0.5113
        58       39      2     -5      0.3754    0.0591     0.2612    0.4892
        59       42      4      1      0.3397    0.0561     0.2328    0.4493
        60       37      0      2      0.3397    0.0561     0.2328    0.4493
        61       35      2     -7      0.3202    0.0546     0.2172    0.4278
        62       40      1      0      0.3122    0.0538     0.2110    0.4186
        63       39      2      0      0.2962    0.0522     0.1987    0.4002
        64       37      2      3      0.2802    0.0506     0.1863    0.3817
        65       32      0     -1      0.2802    0.0506     0.1863    0.3817
        66       33      1      5      0.2717    0.0498     0.1797    0.3719
        67       27      1      3      0.2617    0.0489     0.1717    0.3606
        68       23      1      0      0.2503    0.0481     0.1623    0.3481
        69       22      2     -1      0.2275    0.0464     0.1439    0.3229
        70       21      3     -2      0.1950    0.0434     0.1184    0.2859
        71       20      1     -1      0.1853    0.0423     0.1110    0.2744
        72       20      1      0      0.1760    0.0412     0.1042    0.2633
        73       19      1     -1      0.1667    0.0400     0.0974    0.2522
        74       19      1      1      0.1580    0.0389     0.0910    0.2415
        75       17      2     -3      0.1394    0.0365     0.0776    0.2189
        76       18      2      0      0.1239    0.0340     0.0671    0.1991
        77       16      3      0      0.1007    0.0302     0.0516    0.1690
        78       13      2      3      0.0852    0.0274     0.0415    0.1487
        79        8      0      3      0.0852    0.0274     0.0415    0.1487
        81        5      0     -1      0.0852    0.0274     0.0415    0.1487
        83        6      2     -1      0.0568    0.0246     0.0212    0.1183
        84        5      0      2      0.0568    0.0246     0.0212    0.1183
        86        3      0      1      0.0568    0.0246     0.0212    0.1183
        87        2      0      2      0.0568    0.0246     0.0212    0.1183
    ------------------------------------------------------------------------
    Note: Net Lost equals the number lost minus the number who entered.
    Which looks like

    Click image for larger version

Name:	Overall survival.jpg
Views:	1
Size:	20.4 KB
ID:	1602694


    I can show you what it looked like with the assessment time (time since enrolment more or less) as time variable:

    Code:
                At           Net    Survivor      Std.
      Time     Risk   Fail   Lost    Function     Error     [95% Conf. Int.]
    ------------------------------------------------------------------------
         1      380     28    -10      0.9263    0.0134     0.8951    0.9485
         2      362      6      5      0.9110    0.0146     0.8776    0.9355
         3      351     10      0      0.8850    0.0163     0.8486    0.9131
         4      341     10     14      0.8591    0.0178     0.8200    0.8902
         5      317      7    103      0.8401    0.0188     0.7993    0.8733
         6      207      9    124      0.8036    0.0215     0.7572    0.8420
         7       74      0     45      0.8036    0.0215     0.7572    0.8420
         8       29      0     14      0.8036    0.0215     0.7572    0.8420
         9       15      1      7      0.7500    0.0555     0.6210    0.8405
        10        7      1      2      0.6428    0.1100     0.3891    0.8132
        11        4      0      2      0.6428    0.1100     0.3891    0.8132
        12        2      0      1      0.6428    0.1100     0.3891    0.8132
        16        1      1      0      0.0000         .          .         .
    ------------------------------------------------------------------------
    Note: Net Lost equals the number lost minus the number who entered.
    Click image for larger version

Name:	overall survival time since enrolment.jpg
Views:	1
Size:	19.1 KB
ID:	1602695


    So, continuing from the point where I have my data stset with age as the time variable, let me show you what I did with one of the variables to find if it is a risk factor for stroke or not. I am going to choose gender as the example variable.

    1. Kaplan Meyer analysis of gender and stroke

    Code:
    . sts list, by(sex) compare
    
             failure _d:  stroketotallong_ == 1
       analysis time _t:  meanageass_
      enter on or after:  time==.
      exit on or before:  stroketotallong_==2
                     id:  id
    
                     Survivor Function
    sex                Male     Female
    ----------------------------------
    time      15     1.0000     1.0000
              24     1.0000     1.0000
              33     0.9412     0.9143
              42     0.5686     0.7514
              51     0.3162     0.6667
              60     0.2163     0.4321
              69     0.1471     0.2807
              78     0.0525     0.1026
              87     0.0525     0.0616
    ----------------------------------
    
    . sts test sex, logrank
    
             failure _d:  stroketotallong_ == 1
       analysis time _t:  meanageass_
      enter on or after:  time==.
      exit on or before:  stroketotallong_==2
                     id:  id
    
    
    Log-rank test for equality of survivor functions
    
           |   Events         Events
    sex    |  observed       expected
    -------+-------------------------
    Male   |        34          28.93
    Female |        39          44.07
    -------+-------------------------
    Total  |        73          73.00
    
                 chi2(1) =       1.59
                 Pr>chi2 =     0.2072
    Click image for larger version

Name:	KM sex.jpg
Views:	2
Size:	25.5 KB
ID:	1602696


    The results show that it is not significant, so the number of events is not different between groups. My first question is: Can I say this is somehow adjusted by age? I mean, that despite that the curves seems horizontally separated, there is no difference in age? Because I can understand that the number of events is more or less similar, but in this graph, the median survival time age is different between males and females, and this also happens with other variables despite not being significant in the logrank test, and I do not know how to demonstrate that the median age of survival is different, as age is the time variable. Or perhaps it is no different and the logrank test accounts for that as age is the time variable.

    2-Univariate Cox analysis

    Then I proceed with each variable that was significative in the kaplan meyer + those that despite not being significant I consider clinically important to the Cox analysis. I first do a univariate cox analysis alone and then I "adjust" each variable by gender and a genetic status, called N215S (0/1, yes/no). Let me show you the process for sex and for another variable.

    Code:
     stcox sex
    
             failure _d:  stroketotallong_ == 1
       analysis time _t:  meanageass_
      enter on or after:  time==.
      exit on or before:  stroketotallong_==2
                     id:  id
    
    Iteration 0:   log likelihood = -250.79081
    Iteration 1:   log likelihood = -250.04182
    Iteration 2:   log likelihood = -250.04144
    Refining estimates:
    Iteration 0:   log likelihood = -250.04144
    
    Cox regression -- Breslow method for ties
    
    No. of subjects =          380                  Number of obs    =       1,849
    No. of failures =           73
    Time at risk    =         2093
                                                    LR chi2(1)       =        1.50
    Log likelihood  =   -250.04144                  Prob > chi2      =      0.2209
    
    ------------------------------------------------------------------------------
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             sex |   .7456718   .1779156    -1.23   0.219     .4671464    1.190262
    ------------------------------------------------------------------------------
    
    
    . stcox sex N215S
    
             failure _d:  stroketotallong_ == 1
       analysis time _t:  meanageass_
      enter on or after:  time==.
      exit on or before:  stroketotallong_==2
                     id:  id
    
    Iteration 0:   log likelihood = -250.79081
    Iteration 1:   log likelihood = -241.75783
    Iteration 2:   log likelihood = -241.52702
    Iteration 3:   log likelihood = -241.52647
    Refining estimates:
    Iteration 0:   log likelihood = -241.52647
    
    Cox regression -- Breslow method for ties
    
    No. of subjects =          380                  Number of obs    =       1,849
    No. of failures =           73
    Time at risk    =         2093
                                                    LR chi2(2)       =       18.53
    Log likelihood  =   -241.52647                  Prob > chi2      =      0.0001
    
    ------------------------------------------------------------------------------
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             sex |   .6068005   .1492187    -2.03   0.042     .3747369    .9825745
           N215S |   .3001275   .0959839    -3.76   0.000      .160355    .5617316
    ------------------------------------------------------------------------------

    My second question is: I am not adding age in the "adjusted cox model with N215S" because I think it is iplied adjusted by the time variable. Is that correct?
    My third question is: As you can see, the univariate cox analysis with gender alone has a loglikehood test not significant. Should I proceed despite that result to the adjust model with the variable sex despite that result? Because then, when I adjust by N215S, the loglikelihood ratio is significant and even the HR is significant, so I do not know if consider sex, preliminary, a risk factor for stroke or not.


    Let me show another examples with another variables: dilatation at the echography (LAechodilated 0/1 yes/no) and white matter lesions at the mri (0/1 yes/no). Here I know that I have a problem with missing data which I am thinking to resolve with mi, at least to compare a model with multiple imputation and one without.

    White matter lesions:

    Code:
    . stcox wmlyesno
    
             failure _d:  stroketotallong_ == 1
       analysis time _t:  meanageass_
      enter on or after:  time==.
      exit on or before:  stroketotallong_==2
                     id:  id
    
    Iteration 0:   log likelihood = -108.62454
    Iteration 1:   log likelihood = -106.17538
    Iteration 2:   log likelihood = -106.15317
    Iteration 3:   log likelihood = -106.15316
    Refining estimates:
    Iteration 0:   log likelihood = -106.15316
    
    Cox regression -- Breslow method for ties
    
    No. of subjects =          373                  Number of obs    =         868
    No. of failures =           41
    Time at risk    =         1057
                                                    LR chi2(1)       =        4.94
    Log likelihood  =   -106.15316                  Prob > chi2      =      0.0262
    
    ------------------------------------------------------------------------------
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        wmlyesno |   2.440238   1.028713     2.12   0.034     1.068064    5.575284
    ------------------------------------------------------------------------------
    
    . stcox wmlyesno sex N215S
    
             failure _d:  stroketotallong_ == 1
       analysis time _t:  meanageass_
      enter on or after:  time==.
      exit on or before:  stroketotallong_==2
                     id:  id
    
    Iteration 0:   log likelihood = -108.62454
    Iteration 1:   log likelihood =  -103.1311
    Iteration 2:   log likelihood = -103.07765
    Iteration 3:   log likelihood = -103.07762
    Refining estimates:
    Iteration 0:   log likelihood = -103.07762
    
    Cox regression -- Breslow method for ties
    
    No. of subjects =          373                  Number of obs    =         868
    No. of failures =           41
    Time at risk    =         1057
                                                    LR chi2(3)       =       11.09
    Log likelihood  =   -103.07762                  Prob > chi2      =      0.0112
    
    ------------------------------------------------------------------------------
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        wmlyesno |   2.206474   .9194987     1.90   0.058     .9749442    4.993649
             sex |   .5305371   .1806634    -1.86   0.063     .2721803    1.034129
           N215S |   .4525625   .1840767    -1.95   0.051     .2039193    1.004382
    ------------------------------------------------------------------------------
    
    .
    With wml yes/no I can see that the loglikehood pvalue are significant in both cases and that the HR do not reach significance probably because of the missing data. However I have the same questions: Should I add age?

    Dilatation at the echo:

    Code:
    . stcox LAechodilated
    
             failure _d:  stroketotallong_ == 1
       analysis time _t:  meanageass_
      enter on or after:  time==.
      exit on or before:  stroketotallong_==2
                     id:  id
    
    Iteration 0:   log likelihood = -84.677888
    Iteration 1:   log likelihood = -84.665632
    Iteration 2:   log likelihood = -84.665632
    Refining estimates:
    Iteration 0:   log likelihood = -84.665632
    
    Cox regression -- Breslow method for ties
    
    No. of subjects =          344                  Number of obs    =         687
    No. of failures =           34
    Time at risk    =          833
                                                    LR chi2(1)       =        0.02
    Log likelihood  =   -84.665632                  Prob > chi2      =      0.8756
    
    -------------------------------------------------------------------------------
               _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
    LAechodilated |   .9440953   .3467623    -0.16   0.876     .4595926    1.939361
    -------------------------------------------------------------------------------
    
    . stcox sex N215S LAechodilated
    
             failure _d:  stroketotallong_ == 1
       analysis time _t:  meanageass_
      enter on or after:  time==.
      exit on or before:  stroketotallong_==2
                     id:  id
    
    Iteration 0:   log likelihood = -84.677888
    Iteration 1:   log likelihood = -83.489152
    Iteration 2:   log likelihood = -83.483187
    Iteration 3:   log likelihood = -83.483187
    Refining estimates:
    Iteration 0:   log likelihood = -83.483187
    
    Cox regression -- Breslow method for ties
    
    No. of subjects =          344                  Number of obs    =         687
    No. of failures =           34
    Time at risk    =          833
                                                    LR chi2(3)       =        2.39
    Log likelihood  =   -83.483187                  Prob > chi2      =      0.4956
    
    -------------------------------------------------------------------------------
               _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
              sex |   .6688925   .2716924    -0.99   0.322     .3017263    1.482858
            N215S |   .5543998   .2375908    -1.38   0.169     .2393516    1.284132
    LAechodilated |   .7876342   .3085109    -0.61   0.542     .3655231    1.697205
    -------------------------------------------------------------------------------
    Here the loglikehood ratio is never significant. Should I drop that variable for further risk models?

    I know that this post is very long, I have not shown the PH assumption. However, I wanted to start asking by the issue of the age as the time variable, in case I had to modify it. Besides, I took the opportunity to ask about the loglikehood ratio of the stcox and if the way I am "adjusting" the HR in the second stcox analysis is correct or not.

    After this, with the variables that are significant plus sex and N215S I plan to define the risk model via stcox, check the PH assumptions and then see If it can be internally validated.

    Thank you very much for all your help and I hope this post make sense somehow.

    Best,

    David.

  • #2
    My first question is: Can I say this is somehow adjusted by age? I mean, that despite that the curves seems horizontally separated, there is no difference in age? Because I can understand that the number of events is more or less similar, but in this graph, the median survival time age is different between males and females, and this also happens with other variables despite not being significant in the logrank test, and I do not know how to demonstrate that the median age of survival is different, as age is the time variable. Or perhaps it is no different and the logrank test accounts for that as age is the time variable.
    Yes, the logrank test is adjusted for age since age is the timescale. I wouldn't say "there is no difference in age" because you are not testing for differences in the age distribution, you are testing if the survival curves differ by sex. The logrank test is constructed by comparing the number of observed to expected events in males and females at each age where an event (stroke) occurs. The test accounts for the fact that the number at risk at each age is not necessarily balanced between the sexes and thereby adjusts for age. There are three conceptually identical tests in your output:

    1. Log rank test of sex
    2. LR test from the Cox model with sex as the only covariate
    3. Wald test of the HR for sex from the Cox model with sex as the only covariate

    You will see that all tests have a very similar test statistic (although you need to square the z test statistic to get a chi square 1) and p-value. All of these are tests of the effect of sex adjusted for age.

    I wouldn't however worry too much about the unadjusted effect of sex since there is evidence confounding. Having said that, the effect of sex appears to depend on age (i.e., you have non-proportional hazards). Males have a higher risk of stroke between ages 40-55 whereas females have a higher risk than males for ages 55+.

    My second question is: I am not adding age in the "adjusted cox model with N215S" because I think it is iplied adjusted by the time variable. Is that correct?
    Yes, that's correct.

    My third question is: As you can see, the univariate cox analysis with gender alone has a loglikehood test not significant. Should I proceed despite that result to the adjust model with the variable sex despite that result? Because then, when I adjust by N215S, the loglikelihood ratio is significant and even the HR is significant, so I do not know if consider sex, preliminary, a risk factor for stroke or not.
    The LR test is a test that all covariates in your model are simultaneously zero. It is rarely of interest. Significance tests of individual covariates are more informative. If you only have 1 covariate then the LR test is a significance test of that covariate but once you have more covariates the LR test is not interesting.

    The short answer to "should I proceed" is yes.

    In your statement "Then I proceed with each variable that was significative in the kaplan meyer + those that despite not being significant I consider clinically important to the Cox analysis", I would suggest focusing on the second clause. With studies such as yours, the primary consideration in variable selection should be your clinical knowledge.

    Comment


    • #3
      Dear Paul,

      Thank you very much for your useful and detailed answer. Please, let me ask a few things regarding your answer, just to double-check I have understood everything correctly.


      I wouldn't however worry too much about the unadjusted effect of sex since there is evidence confounding. Having said that, the effect of sex appears to depend on age (i.e., you have non-proportional hazards). Males have a higher risk of stroke between ages 40-55 whereas females have a higher risk than males for ages 55+
      Thank you very much for reassuring me about the adjustment by age. I did the graph of the hazard ratio to visualize the statement about risk, age and sex.

      Code:
       sts graph, hazard by(sex) kernel(gaussian)
      Click image for larger version

Name:	sex hazard.jpg
Views:	2
Size:	25.3 KB
ID:	1603077


      As you can see, it tells exactly the same thing you spotted. While I can see that the hazard is not proportional, when I test the PH after stcox I obtain the next result

      Code:
      . stcox sex
      
               failure _d:  stroketotallong_ == 1
         analysis time _t:  meanageass_
        enter on or after:  time==.
        exit on or before:  stroketotallong_==2
                       id:  id
      
      Iteration 0:   log likelihood = -250.79081
      Iteration 1:   log likelihood = -250.04182
      Iteration 2:   log likelihood = -250.04144
      Refining estimates:
      Iteration 0:   log likelihood = -250.04144
      
      Cox regression -- Breslow method for ties
      
      No. of subjects =          380                  Number of obs    =       1,849
      No. of failures =           73
      Time at risk    =         2093
                                                      LR chi2(1)       =        1.50
      Log likelihood  =   -250.04144                  Prob > chi2      =      0.2209
      
      ------------------------------------------------------------------------------
                _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               sex |   .7456718   .1779156    -1.23   0.219     .4671464    1.190262
      ------------------------------------------------------------------------------
      
      
      . estat phtest, detail
      
            Test of proportional-hazards assumption
      
            Time:  Time
            ----------------------------------------------------------------
                        |       rho            chi2       df       Prob>chi2
            ------------+---------------------------------------------------
            sex         |      0.18194         2.45        1         0.1172
            ------------+---------------------------------------------------
            global test |                      2.45        1         0.1172
            ----------------------------------------------------------------
      This result would mean that the PH is not violated if I am not misunderstood. Quite frankly, I cannot see what I am missing here.

      In your statement "Then I proceed with each variable that was significative in the kaplan meyer + those that despite not being significant I consider clinically important to the Cox analysis", I would suggest focusing on the second clause. With studies such as yours, the primary consideration in variable selection should be your clinical knowledge
      Thank you very much for helping me to visualize how to proceed with the study. Following your advice, I now plan to:

      1. Do the Kaplan Meier analysis of all the variables plus their logrank.
      2. As I have continuous variables, I will need to do the univariate cox regression for them at least. In many publications I have read, they always present the univariate Cox coefficients and HR of all the variables, independently if they have done the KM previously. I know it gives no new information for the categorical variables, but I can put one column for the univariate and another for the ones chosen for the multivariate, which will stem from the clinical decision (as you told me) plus those significant that make clinical sense.

      However, I have some doubts of the multivariate model. I know that I have missing data and adding new variables sometimes reduce the number of cases and, therefore, that changes the p value and significance of the variables and that is why I am working on the multiple iteration model to compare it with my raw model.

      Just to show you a glimpse of the model I am working now with to understand everything (it is almost as the final model but with a couple less of variables) please, let me put here the univariate cox analysis of some variables and then a model of all of them at the same time (as all of them are clinically relevant, plus all of them minus one are significant)

      sex= gender (0-1), gfrmore90=glomerular filtration rate of more than 90mL/min/m2 (0-1) , N215S=specific genetic condition (0-1), wmlyesno= presence or not of white matter lesions on the MRI (0-1)



      Code:
      . stcox sex
      
         
      
      Iteration 0:   log likelihood = -250.79081
      Iteration 1:   log likelihood = -250.04182
      Iteration 2:   log likelihood = -250.04144
      Refining estimates:
      Iteration 0:   log likelihood = -250.04144
      
      Cox regression -- Breslow method for ties
      
      No. of subjects =          380                  Number of obs    =       1,849
      No. of failures =           73
      Time at risk    =         2093
                                                      LR chi2(1)       =        1.50
      Log likelihood  =   -250.04144                  Prob > chi2      =      0.2209
      
      ------------------------------------------------------------------------------
                _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               sex |   .7456718   .1779156    -1.23   0.219     .4671464    1.190262
      ------------------------------------------------------------------------------
      
      . stcox N215S
      
      
      
      Iteration 0:   log likelihood = -250.79081
      Iteration 1:   log likelihood = -243.70901
      Iteration 2:   log likelihood = -243.56629
      Iteration 3:   log likelihood = -243.56606
      Refining estimates:
      Iteration 0:   log likelihood = -243.56606
      
      Cox regression -- Breslow method for ties
      
      No. of subjects =          380                  Number of obs    =       1,849
      No. of failures =           73
      Time at risk    =         2093
                                                      LR chi2(1)       =       14.45
      Log likelihood  =   -243.56606                  Prob > chi2      =      0.0001
      
      ------------------------------------------------------------------------------
                _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             N215S |   .3426848   .1051817    -3.49   0.000     .1877724    .6253998
      ------------------------------------------------------------------------------
      
      . stcox gfrmore90
      
      
      
      Iteration 0:   log likelihood = -223.73882
      Iteration 1:   log likelihood = -220.88859
      Iteration 2:   log likelihood =  -220.8847
      Iteration 3:   log likelihood =  -220.8847
      Refining estimates:
      Iteration 0:   log likelihood =  -220.8847
      
      Cox regression -- Breslow method for ties
      
      No. of subjects =          366                  Number of obs    =       1,584
      No. of failures =           68
      Time at risk    =         1786
                                                      LR chi2(1)       =        5.71
      Log likelihood  =    -220.8847                  Prob > chi2      =      0.0169
      
      ------------------------------------------------------------------------------
                _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
         gfrmore90 |   .4751378   .1514096    -2.34   0.020     .2544321    .8872935
      ------------------------------------------------------------------------------
      
      . stcox wmlyesno
      
       
      
      Iteration 0:   log likelihood = -108.62454
      Iteration 1:   log likelihood = -106.17538
      Iteration 2:   log likelihood = -106.15317
      Iteration 3:   log likelihood = -106.15316
      Refining estimates:
      Iteration 0:   log likelihood = -106.15316
      
      Cox regression -- Breslow method for ties
      
      No. of subjects =          373                  Number of obs    =         868
      No. of failures =           41
      Time at risk    =         1057
                                                      LR chi2(1)       =        4.94
      Log likelihood  =   -106.15316                  Prob > chi2      =      0.0262
      
      ------------------------------------------------------------------------------
                _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          wmlyesno |   2.440238   1.028713     2.12   0.034     1.068064    5.575284
      ------------------------------------------------------------------------------
      
      . stcox sex N215S gfrmore90 wmlyesno
      
         
      
      Cox regression -- Breslow method for ties
      
      No. of subjects =          359                  Number of obs    =         815
      No. of failures =           37
      Time at risk    =          962
                                                      LR chi2(4)       =        9.95
      Log likelihood  =   -90.137307                  Prob > chi2      =      0.0413
      
      ------------------------------------------------------------------------------
                _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               sex |   .6046194    .216121    -1.41   0.159     .3000685     1.21827
             N215S |   .5783757   .2464391    -1.29   0.199     .2509129    1.333205
         gfrmore90 |   .6174489    .281266    -1.06   0.290     .2528447    1.507815
          wmlyesno |   2.180676   .9806948     1.73   0.083     .9032091    5.264947
      ------------------------------------------------------------------------------
      
      . estat phtest
      
            Test of proportional-hazards assumption
      
            Time:  Time
            ----------------------------------------------------------------
                        |                      chi2       df       Prob>chi2
            ------------+---------------------------------------------------
            global test |                      1.95        4         0.7451
            ----------------------------------------------------------------
      With this final model:

      1-I cannot say that any of these variables are risk factors, despite their significance in the univariate analysis, right? I suppose there is a huge effect by the missing data.
      2-In the case I did not have any missing data and this was the final result, this would mean that I have generated a model that is not useful to predict my outcome? Or I could still try to use it for generating a prediction score?
      3-In the case that this model had one or two of the variables that were significant, would I try to generate a prediction score only with that/those variables, or I have to use all of them?

      Let me exemplify this with an example:

      Code:
      . stcox sex N215S autoinmunitynum gfrmore90
      
      
      Iteration 0:   log likelihood = -223.73882
      Iteration 1:   log likelihood = -210.86756
      Iteration 2:   log likelihood = -210.03683
      Iteration 3:   log likelihood = -210.03215
      Iteration 4:   log likelihood = -210.03215
      Refining estimates:
      Iteration 0:   log likelihood = -210.03215
      
      Cox regression -- Breslow method for ties
      
      No. of subjects =          366                  Number of obs    =       1,584
      No. of failures =           68
      Time at risk    =         1786
                                                      LR chi2(4)       =       27.41
      Log likelihood  =   -210.03215                  Prob > chi2      =      0.0000
      
      ---------------------------------------------------------------------------------
                   _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ----------------+----------------------------------------------------------------
                  sex |   .6486877   .1701896    -1.65   0.099     .3878953    1.084818
                N215S |   .3256782   .1081536    -3.38   0.001     .1698696    .6243985
      autoinmunitynum |   4.031936   1.545896     3.64   0.000     1.901744    8.548212
            gfrmore90 |   .5788828   .1875955    -1.69   0.092     .3067229    1.092534
      ---------------------------------------------------------------------------------
      This is another model using the same variables as before but changing wmlyesno by the presence or not of an autoimmune disease (0-1 yes/no). Now we have a lot more of data, and two variables are significant. Would I use only those two to generate predictions?

      And just a last question...after seeing that this model is much better than the previous one, can I drop variables such as wmlyesno and generate this new model after seeing that they are not significant in my study and generate a more accurate model?

      I know I ask too many things, but after reading and reading, I have gathered so many questions...

      Thank you very much for all your help.

      Best,

      David.

      Comment

      Working...
      X