Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • just want to make sure I understood stpm3 correctly Q1-Q3

    Hello I would like to plot the cumulative hazard for a continuous variable - score -
    One of my colleagues advised me to look into -stpm3-

    Originally I fitted the following model into -stcox-
    Code:
    mkspline scoregrp1 44 scoregrp2=score
    
    ///Final model - after backward selection and choice of covariates (I know many people don't fancy backward selection, however I used this and clinical judgement to keep my variables, my finalised model is : 
    stcox c.scoregrp* age##age, hr
    
    As -scoregrp- is a linear spline continuous variable I am unable to graph the hazard using
    sts graph, hazard by(catvar) kernel(epan) name(h, replace)
    And I am not keen on categorising either... so a colleague recommended me to use -stpm3 -


    Code:
    stpm3 @ns(score, df(4)) scale(lncumhazard) df(5)eform
    predict lnhr, xbnotime ci
    Questions:
    May I confirm that if my finalised model from cox model had age as a predictor included in the model.

    q1. Why do the authors of the code add centre(50) to their original code? : stpm3 @ns(age, df(4) center(50)) year8594, scale(lncumhazard) df(5) eform

    q2. Should my code appear in this way - with age into the code as an additional predictor:

    Code:
    stpm3 @ns(score, df(4) age), scale(lncumhazard) df(5)eform
    --- I have not used center as I don't know what this is for so therefore not used

    q3. Should I set my covariates eg in my case age at the average level in order to work with the predict function?
    Code:
    stpm3 @ns(score, df(4)) scale(lncumhazard) df(5)eform
    predict lnhr, xbnotime at(age 68.9, obsvalues) ci   ////here my average of age was 68.9
    
    ///Finally plotting the graph were score is the continous variable vs cumulative hazard 
    twoway     (rarea lnhr_lci lnhr_uci score, sort) ///
            (line lnhr score, sort lpattern(solid)) ///
            , legend(off) ytitle("log relative hazard") name(two, replace)
    q4. And finally the code produces 4 natural spline variables
    - Am I correct to say I can infer conclusions from the c_ns_f1_score* (with score*=score 1 score 2 score 3 score 4) depedning on the p -value

    Many thanks

  • #2
    I am not sure exactly what you are asking for.
    1. You ask for the cumulative hazard but use sts graph, hazard, which gives a hazard function (not the cumulative hazard). Note that sts graph will not use your stcox model results.
    2. Your code predict lnhr, xbnotime will predict the log hazard ratio for each individual given their covariate pattern. This is also known as the prognostic index.
    3. You can predict for an individual with the average level of all covariates, but this can be awkward for factor variables. You can predict for any combination of covariate you are interested in.
    4. You could be interested in a marginal measure, i.e. the average measure in your study population. You can do this using standsurv, which will predict a function for all individuals and then take the average.
    Below is some annotated code which does various things that may help you.
    Code:
    // Load data
    use https://www.pclambert.net/data/rott2b, clear
    // stset with death due cancer as the event
    stset os, f(osi=1) exit(time 120) scale(12)
    
    stpm3 @ns(age, df(3))    /// natural cubic splines for age
          i.hormon,          /// 2 level factor vaiable
          df(3)              /// 3 df for baseline 
          scale(lncumhazard) //  model scale
    
    // Predict cumulative hazard for specific covariate pattern
    predict CH_age_50_hormon_0, at1(age 50 hormon 0)     /// covariate you want prediction for
                                cumhazard                /// prediction option
                                timevar(0 10, step(0.1)) /// time to predict at
                                frame(pred, replace)     //  frame to store predictions
    
    frame pred: line CH_age_50_hormon_0 tt
                                
          
    // if you want the hazard just change to hazard option      
    predict hazard_age_50_hormon_0, at1(age 50 hormon 0)     /// covariate you want prediction for
                                    hazard                   /// prediction option
                                    frame(pred, merge)       //  merge with existing frame
    
    // if you want the marginal cumulative hazard
    // use standsurv to get marginal survival and transform
    // using H(t) = -log[S(t)]
    // standsurv will predict the survival function for all subjects
    // and then take an average of these predictions
    // I think this is more interesting than the cumulative hazard.
    range tt 0 10 101
    standsurv, at1(.) surv timevar(tt) frame(standsurv_results, replace) ///
               atvar(S_marginal)
    // transform          
    frame standsurv_results: gen CH_marginal = -log(S_marginal)
    // plot
    frame standsurv_results: line CH_marginal tt
    
    // I see your code uses linear splines
    // I am not a fan of these, but you can do an equivalent thing using
    // linear bsplines
    // Example below with age (knot at age 55)
    stpm3 @bs(age, degree(1) knots(55)) /// linear B-splines for age
          i.hormon,                     /// 2 level factor vaiable
          df(3)                         /// 3 df for baseline 
          scale(lncumhazard)            // model scale
          
    // Again you can predict for any covariate pattern      
    predict CH_age_50_hormon_0 CH_age_50_hormon_1,    /// new variables 
                             at1(age 50 hormon 0)     /// covariates you want prediction for
                             at2(age 50 hormon 1)     /// covariates you want prediction for
                             cumhazard                /// prediction option
                             timevar(0 10, step(0.1)) /// time to predict at
                             frame(pred2, replace)    //  frame to store predictions
    frame pred2: line CH_age_50_hormon_* tt

    Comment


    • #3
      Click image for larger version

Name:	3F4C09B5-36BE-4561-9CBF-C5D72898B6FF.jpeg
Views:	1
Size:	334.3 KB
ID:	1740796
      Thank you
      this is really helpful. And thank you for your input and website.

      in terms of

      1. why does one insert center(50)) - what is the theory behind it ?
      2. Re my question 4 in the original post , I have attached output here.


      the code produces 4 natural spline variables .

      I was stuck with regards to what the ages the c_ns_f1_age1 represent…and how to interpret it for each one. As you can see all have p <0.05 except _ns_f1_age2

      I have tried:
      Code:
      tabstat age, by(_ns_f1_age1) statistics(min max)
      However, for each _ns_f1_age* have the same ranges of ages ie 12 – 77

      – Am I correct to say I cannot infer conclusions from the c_ns_f1_age1 output?

      - However I should use -standsurv- to be able to determine the overall average measure of the population ?


      Appreciate your thoughts

      ultimately I have a patient reported outcome score which is a continuous measure, I found that there is a certain threshold - score of 40 - when there is a decrease in complications. So I wanted to explore how the hazard changes as the patient reported outcome score changes. And determine if the value of 40 is significant or not.

      Comment


      • #4
        To answer you questions,

        1). The center option is not really needed here. It can be used to make the baseline hazard (when all covariates are equal to zero) meaningful. For example all spline variables will be equal to zero when using center(50). Below I give an example and show you get the same predictions when using or not using the center() option.


        Here are two models similar to yours - one with and one without the center() option.
        Code:
        use https://www.pclambert.net/data/colon, clear
        stset surv_mm, f(status=1) scale(12) exit(time 120.5)
         
        // This is a similar model to yours without the centre option
        stpm3 @ns(age, df(4)) i.year8594, scale(lncumhazard) df(5) eform
        estimate store model1
         
        // This is a similar model to yours with the centre option
        stpm3 @ns(age, df(4) center(50)) i.year8594, scale(lncumhazard) df(5) eform
        estimate store model2
        The xb equation give identical estimates
        Code:
        . // note that likelihood identical
        . // as are coefficients for xb part
        . est tab model1 model2, stats(ll)
        
        ----------------------------------------
            Variable |   model1       model2    
        -------------+--------------------------
        xb           |
         _ns_f1_age1 | -6.4221153   -6.4221153  
         _ns_f1_age2 | -.39020104   -.39020104  
         _ns_f1_age3 | -2.0156809   -2.0156809  
         _ns_f1_age4 | -2.6304365   -2.6304365  
                     |
            year8594 |
        Diagnosed..  | -.19797403   -.19797403  
        -------------+--------------------------
        time         |
                _ns1 | -12.093919   -12.093919  
                _ns2 |  2.7761433    2.7761433  
                _ns3 | -1.3629519   -1.3629519  
                _ns4 | -.89169779   -.89169779  
                _ns5 | -.37195654   -.37195654  
               _cons |  1.9917666   -.27243401  
        -------------+--------------------------
        Statistics   |                          
                  ll | -23520.095   -23520.095  
        ----------------------------------------
        In fact the only parameter that differs is the intercept, which makes sense as we have changed what the baseline refers to.

        We can now do a prediction from both models. Let's predict the hazard ratio for age. To do this we need to set a reference age to which other ages are compared. I will make age 50 the reference to be consistent with model2.

        For model 1 we can do the following

        Code:
        estimates restore model1
        gen t5 = 5 // predict HR at 5 years
                   // note that could use any timepoint as
                   // this is a PH model and so HR does not change
                   // over time.
                   // However, approach would also work when relaxing PH assumption.
        
        predict , at1(age 50, obsvalue) /// This is the reference age
                  at2(.)                /// This means use observed values 
                  hazard                /// predict hazard
                  ci                    /// calculate confidence interval
                  nogen                 /// Do not generate hazards (only ratio)
                  timevar(t5)           /// time to calculate HR
                  contrast(ratio)       /// form ratio at2/at1 
                  contrastvar(hr1_age)  /// name of contrast variable
                  merge                 //  merge with current frame
        We need to predict at a time point - I use 5 years. However this is a proportional hazards model, so the HR will be consistent at all time points. However, the syntax is useful if we fit a non-proportional hazards model.

        You can then plot the HR as a function of age with age 50 as the reference.

        Code:
        twoway line hr1_age* age, sort yscale(log)
        The nice thing about the above code is that we can change the reference age to 60 or 70 or whatever we want without having to refit the model.

        For model2 the identical code to the above will work. However, we can also extract the hazard ratio in other ways. For example, using partpred.

        Code:
        estimates restore model2
        partpred hr2_age, for(_ns_f1_age*) ci(hr2_age_lci hr2_age_uci) eform
        The above is useful as it will work in other types of models, e.g. stcox or when using logistic regression for a binary outcome.

        However, the differences between the two approaches is negligible. If using stpm3, I prefer the first way due to its flexibility.

        Code:
        . compare hr1_age hr2_age
        
                                                ---------- Difference ----------
                                    Count       Minimum      Average     Maximum
        ------------------------------------------------------------------------
        hr1_age<hr2_age              6654     -2.66e-15    -3.85e-16   -1.11e-16
        hr1_age=hr2_age              4489
        hr1_age>hr2_age              4421      1.11e-16     3.80e-16    3.55e-15
                               ----------
        Jointly defined             15564     -2.66e-15    -5.67e-17    3.55e-15
                               ----------
        Total                       15564
        
        . compare hr1_age_lci hr2_age_lci
        
                                                ---------- Difference ----------
                                    Count       Minimum      Average     Maximum
        ------------------------------------------------------------------------
        hr1_a~lci<hr2_a~lci          9273     -9.82e-10    -8.48e-12   -3.66e-15
        hr1_a~lci=hr2_a~lci            49
        hr1_a~lci>hr2_a~lci          6242      4.20e-14     4.89e-13    8.37e-12
                               ----------
        Jointly defined             15564     -9.82e-10    -4.85e-12    8.37e-12
                               ----------
        Total                       15564
        
        . compare hr1_age_uci hr2_age_uci
        
                                                ---------- Difference ----------
                                    Count       Minimum      Average     Maximum
        ------------------------------------------------------------------------
        hr1_a~uci<hr2_a~uci          6242     -1.16e-11    -5.60e-13   -4.24e-14
        hr1_a~uci=hr2_a~uci            49
        hr1_a~uci>hr2_a~uci          9273      3.11e-15     8.60e-12    9.82e-10
                               ----------
        Jointly defined             15564     -1.16e-11     4.90e-12    9.82e-10
                               ----------
        Total                       15564
        2) Your second question refers to the interpretability of the spline variables / coefficients. In general I advise not to attempt to interpret them or look at them individually. Together they enable you to model a non-linear function.

        Comment


        • #5
          Thank you so much for your kind reply. And apologies for the delay in replying, I've just been testing different things.

          I still have a question regarding choosing the 'best' reference value. In my case rather than modelling age, I am modelling a post-operative score (0-48, 48 best outcome, 0 - worst outcome) in order to determine if there is a particular number at which the risk of failure is definite.

          I tried using different reference values and plotting the graphs. in the picture (with 2 images) attached, you can see the far left (reference 0) , right (reference 28) and the one with one picture (reference 48)
          I haven't used any particular theory with regards to what reference value to use.

          As you can see, using different reference values results in creating different Y scales, which to me doesn't seem right.

          Q. So is there some sort of theory to use at which the best place to place a reference values for all the other values to be compared to this reference value?
          In your case, using hormonal data, it seems logical to use a typical 50-year-old woman.... middle-aged female at risk of breast cancer.


          Code:
          stpm3 @ns(postopscore, df(4)) c.age, scale(lncumhazaard) df(5) eform
          
          ///Predict HR at 5 years
          gen t5=5
          
          predict , at1(postopscore 1, obsvalue) /// This is the reference ag
                        at2(.) /// This means use observed values
                        hazard /// predict hazard
                        ci /// calculate confidence interval
                        nogen /// Do not generate hazards (only ratio)
                        timevar(t5) /// time to calculate HR
                        contrast(ratio) /// form ratio at2/at1
                        contrastvar(hr1_age) /// name of contrast variable
                        merge // merge with current frame
          
          twoway line hr1_postopscore* postopscore, sort yscale(log)
          Click image for larger version

Name:	Screenshot 2024-01-29 at 12.51.01.png
Views:	1
Size:	31.5 KB
ID:	1741393

          Click image for larger version

Name:	Screenshot 2024-01-29 at 12.51.14.png
Views:	1
Size:	35.8 KB
ID:	1741394


          In addition, having previously used stpm2 and it's modelling capacity with your tutorials . I also tried the following, however this plots the 5 year survival as a function of the score.
          It doesn't have any relevance to what I was trying to plot in terms of the hazard function as a function of the postopscore in order to identify how this varies over time.
          So I Would appreciate your input regarding my question above. However I have supplied this code in any case someone finds it useful in the future.

          Q 2. What is the difference between the abilities of stpm2 vs stpm3 ?

          Code:
          stpm2 c.postopscore i.proceduretype c.age, scale(h) df(5) reform 
          
          //Procedure = 1 = treatment 
          gen t5=5 
          predict t5_score, surv at(procedure 1 age 70) ci timevar(t5)
          
          ///Prcoedure = 0 = control 
          predict t5_score2, surv at(procedure 0 age 70) ci timevar(t5) 
          
          ///Graph 
          twoway (rarea t5_score_lci t5_score_uci postopscore, sort color(blue%25)) (line t5_score postopscore, sort colour(blue)) (line t5_score2 postopscorem sort colour(red)) (rarea t5_score2_lci t5_score2_uci postopscore, sort color(red%25)), title("5 year survival"))
          Last edited by Rose Matthews; 29 Jan 2024, 07:40.

          Comment

          Working...
          X