Comparing cox proportional hazard linear and non-linear (restricted cubic spline) models using likelihood ratio test

patrick handcock

Join Date: Nov 2017

Posts: 84
#1

Comparing cox proportional hazard linear and non-linear (restricted cubic spline) models using likelihood ratio test

29 Nov 2021, 08:39

Hi folks - I am trying to understand and figure out how to actually code/test non-linearity between spline (cox proportional hazards regression) and linear models using LR tests. I have seen this described in papers, but the actual mechanics of it in Stata are still a little unclear to me.

For example, in this paper here (https://www.ahajournals.org/doi/full...AHA.120.020718) they look at this for a diet variable and CVD and state:

"We computed restricted cubic splines with 4 knots to visually assess the shape of association between ADPQS as a continuous variable (both time- varying average and 13- year change) and risk of CVD. Statistical significance of nonlinearity (ie, curvature) was tested by comparing the spline model with the linear model, and P values of <0.05 were regarded as statistically significant nonlinear relationship between the exposure and the outcome. Statistical significance of linearity was tested by comparing the linear model to the model including only the covariates, both using likelihood ratio tests."

They then report p-values for these results in text and with a figure (below).

"A monotonic decrease in CVD risk with time-varying average APDQS (P-nonlinearity=0.12 and P-linearity<0.001; Figure A) and the 13- year change in APDQS (P-nonlinearity=0.54 and P- linearity=0.04; Figure B) was observed in restricted cubic splines."

I know one can compare nested models to look at prediction level with LR tests (e.g. below link)...

https://stats.idre.ucla.edu/stata/fa...test-in-stata/

Code:

logit hiwrite female read estimates store m1 logit hiwrite female read math science estimates store m2 lrtest m1 m2

... but how would one as per the above example paper compare a cox-regression linear model to a non-linear (restricted cubic spline) model using LR, and what is the relevance of the sentence "linearity was tested by comparing the linear model to the model including only the covariates"?

What would this Stata code look like in practice as a (quick/simple) example to compute these different p-values to compare the non-linear and linear models in this instance?

Thanks so much! I really appreciate your thoughts
Patrick

Last edited by patrick handcock; 29 Nov 2021, 08:57.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#2

29 Nov 2021, 15:59

If you use the -mkspline- command and specify you want 4 knots, Stata will create three cubic spline variables. If you graph each of those against the original APDQS variable, you will see that the first one is identical to the original APDQS variable itself. So the models are, indeed, nested. To get -lrtest- to recognize that fact, once you make the spline variables, I suggest you drop the original variable. Then your strictly linear model uses only the first spline variable, and the cubic-spline model uses all three. -lrtest- will be happy with that.
Comment

patrick handcock

Join Date: Nov 2017
Posts: 84

30 Nov 2021, 03:47

Thank so much for your reply Clyde Schechter - definitely puts me on the right track. Can I possibly just probe a little further to better understand how this might be coded (e.g. with an example)?

Code:

//Globals (for example)
global event = "ACM"                          
global censor = "age_death_censor"             
global time_to_event = "time_death_censor_yrs_4d_60s"     
global exposure = "apdqs"    


// Not quite sure how to do below cox analysis with mkspline aspect as well (used to using uvrs - interested though if simple?)
mkspline2 rc = apdqs, cubic nknots(4) displayknots


// UVRS attempt with cox analysis (example)

//stset
stset $censor, failure($event) enter(age) id(ri_rec_id) 
tabstat $time_to_event, stats (med p25 p75 iqr min max)

// Cubic spline model 
xi: uvrs stcox $exposure i.sex i.smokstat alcohol i.lipid i.antihyp i.antidep, strata(edlevel09) df(4) degree(3)    
* This generates 4 x cubic spline variables (apdqs_0, apdqs_1, apdqs_2, apdqs_3)
estimates store m1

// Linear model would be simply
stcox $exposure i.sex i.smokstat alcohol i.lipid i.antihyp i.antidep, strata(edlevel09)  
estimates store m2

// check relationship of original variable with *_0
scatter apdqs apdqs_0 // these are a perfect linear relationship

drop apdqs

// LR test to compare ?
lrtest m1 m2

Do I understand correctly that the above compares the linear to the cubic spline (non-linear) model? -i.e. above "non-linearity was tested by comparing the spline model with the linear model (via LR test)"

For the second part of above paper example -i.e. "linearity was tested by comparing the linear model to the model including only the covariates" - would this just be comparing two linear models (i.e. stcox...): both including the covariates, but one with and one without the exposure variable apdqs (via LR test)? Or does one need to provide a test for 'linear trend'.

e.g....

Code:

stcox $exposure i.sex i.smokstat alcohol i.lipid i.antihyp i.antidep, strata(edlevel09)  
estimates store m1

stcox i.sex i.smokstat alcohol i.lipid i.antihyp i.antidep, strata(edlevel09)  
estimates store m2

lrtest m1 m2

Thanks so much for your time.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#4

30 Nov 2021, 09:32

It is hard to understand your code, in part because you do not explain what the variables are. For example, age_death_censor, to me, sounds like it would be the name of a dichotomous variable distinguishing observed events from censored observations. This is even more strongly enforced by your storing it in a global (shame on you!) macro named censor. But in your -stset- command you actually use it as the time to event/censorship. So it isn't clear to me that your Cox models are being properly coded irrespective of the specific issues relating to the cubic splines. But let me assume that you have this right and are just using nomenclature that confuses me.

// Cubic spline model
xi: uvrs stcox $exposure i.sex i.smokstat alcohol i.lipid i.antihyp i.antidep, strata(edlevel09) df(4) degree(3)
* This generates 4 x cubic spline variables (apdqs_0, apdqs_1, apdqs_2, apdqs_3)
estimates store m1

No, this is all wrong. First of all, there is no reason to use the -xi:- prefix here, and it will just complicate things later. Drop the xi: prefix, but leav all the i.'s as they are. Factor variable notation has long displaced -xi- for handling indicator variables and interactions. There are still some relatively exotic commands that don't support factor-variable notation, but -stcox- isn't among those. For the most part, -xi- is obsolete. If you think you want to use -xi- for something, think harder and try using factor-variable notation instead. (Read -help fvvarlist- for more information.)

Turning to the spline issues, you have correctly used -mkspline- to generate the spline variables. But then you never use them. Instead you have dug up some uvrs command that is not part of official Stata. I don't know what that is, and can't advise you on how to use it. It may be perfectly fine for purpose, but there is no need for it. Instead:

Code:

// VERIFY FIRST SPLINE VARIABLE IS THE ORIGINAL VARIABLE assert float(apdqs) == float(rc1) // MODEL WITH FULL SPLINE stcox rc* i.sex i.smokstat alcohol i.lipid i.antihyp i.antidep, strata(edlevel09) estimates store full_spline // MODEL WITH ONLY LINEAR TERM stcox rc1 i.sex i.smokstat alcohol i.lipid i.antihyp i.antidep, strata(edlevel09) lrtest full_spline .
Comment
patrick handcock

Join Date: Nov 2017

Posts: 84
#5

30 Nov 2021, 10:28

Thanks again for the reply Clyde Schechter .

Huge apologies for not being clearer on the variables - I had rushed past that bit and mainly just wanted to provide a bit of context. My mistake! I went back to edit it in hindsight but was too late.

Re. The time_to_event variable (here, time_death_censor). This is a continuous variable (years) from exposure baseline to mortality (ACM) event or censor date. This allows the cox model to be run with age as the underlying time scale.

Re. uvrs vs. mkspline -- from what I understand from speaking to a statistician recently -- the 2 commands have different "rules" for choosing the default location of the knots (but both can be over-ridden) and effectivenly achieve the same purpose. uvrs just has some helpful uses when saving out the beta coefficients for later calculating and plotting of the estimated hazard ratio as a function of exposure. This is my main reason for using it with cox in Stata. I gather adjustrcspline and mfxrcspline are rather old user-written commands developed for Stata version 10 that have not been updated to allow all the standard twoway options that were introduced more recently.

All the above aside - thanks so much for your example code with mkspline - that is very helpful!

Can I push my luck and just quiz a little further on this relating back to the original paper/questions and interpretation?

1) Assuming I get a statistically significant LR test result (e.g.. Prob > chi2 = 0.02) - as I understand it this suggests that the less restrictive model (the full spline model with more variables) fits the data significantly better than the more restrictive model (i.e. linear model) - thus suggesting curvature in the relationship between exposure and outcome and that a spline would be a better option? Conversely, a non-significant LR test results (e.g. p>0.05) suggests the linear model is just as good as the spline model in terms of fit, and is thus likely the 'best' (or at least simpler) modelling option to present?

2) Just to follow up on my second question with regard to the above paper example and (A) 'testing for non-linearity (ie, curvature)' vs. (B) 'testing for linearity' using LR tests. I was a bit confused by the authors approach for (B) of "comparing the linear model to the model including only the covariates" to 'test for linearity' - my simple example application of this (i.e. below) made sense based on the description of their method, but conceptually I'm not sure that 'proves' linearity - or am I missing something perhaps in what is meant here? Would value any thoughts as these two approaches seem quite interlinked/nuanced. I would have thought presenting the spline would be sufficient in either case, with less reliance on p-values anyway (i.e. a spline will always show the shape, though I guess the interpretation of that shape can be somewhat subjective without formal tests like the LR test) - but that is a side issue.

Code:

stcox apdqs i.sex i.smokstat alcohol i.lipid i.antihyp i.antidep, strata(edlevel09) estimates store m1 stcox i.sex i.smokstat alcohol i.lipid i.antihyp i.antidep, strata(edlevel09) estimates store m2 // statistically significant p-value for LR test between models suggests a better fit for model with exposure included vs. without (as might be expected - but... // ... what does this say about linearity of association between apdqs and mortality?...) lrtest m1 m2

Thanks again so much for your time!! It is much appreciated.

Last edited by patrick handcock; 30 Nov 2021, 10:39.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#6

30 Nov 2021, 10:52

Yes, within the framework of likelihood-ratio testing, a significant p-value would support including the spline terms and using just the linear, and a non-significant p-value would support the opposite.

I believe the second question relates to whether you need to include apdqs in the model at all. And you would do it analogously to the way you handle the spline. First run the model with apdqs included. Store the estimates. Then run the same model but excluding apdqs. Then do the likelihood ratio test. A significant p-value would support including apdqs, a non-significant one the opposite. There is one subtlety here that did not arise in the spline vs linear testing. In this round, when you run the model that excludes apdqs, you have to add -if e(sample)- to the regression command to assure that the regression will run on the same set of observations as the one including apdqs. (The concern is that there may be observations with missing values for apdqs that aren't included in the estimation sample for the model including it, and those would then be added to the estimation sample when apdqs is excluded. This problem doesn't arise with the spline, because the spline variables are either all missing or all non-missing.)
Comment
patrick handcock

Join Date: Nov 2017

Posts: 84
#7

30 Nov 2021, 11:09

Thanks Clyde Schechter and for the extra tip in e(sample).

I'm not sure I quite follow, with regard to the second question, on why including vs. excluding apdqs would be informative as a 'test for linearity' ? What does this conceptually tell us beyond what was looked at when comparing the cubic spline vs. the linear model in terms of fit (A)?

i.e. the question of

(A) 'testing for non-linearity (ie, curvature)' -- by comparing spline vs. linear models with LR tests

vs.

(B) 'testing for linearity' using LR tests -- by comparing linear models including vs. excluding apdqs with LR tests ?

-- I can't quite see how (B) conceptually informs or 'proves' linearity?

Earlier quote from paper above for context:

For example, in this paper here (https://www.ahajournals.org/doi/full...AHA.120.020718) they look at this for a diet variable and CVD and state:

"We computed restricted cubic splines with 4 knots to visually assess the shape of association between ADPQS as a continuous variable (both time- varying average and 13- year change) and risk of CVD. Statistical significance of nonlinearity (ie, curvature) was tested by comparing the spline model with the linear model, and P values of <0.05 were regarded as statistically significant nonlinear relationship between the exposure and the outcome. Statistical significance of linearity was tested by comparing the linear model to the model including only the covariates, both using likelihood ratio tests."

They then report p-values for these results in text and with a figure (below).

"A monotonic decrease in CVD risk with time-varying average APDQS (P-nonlinearity=0.12 and P-linearity<0.001; Figure A) and the 13- year change in APDQS (P-nonlinearity=0.54 and P- linearity=0.04; Figure B) was observed in restricted cubic splines."
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#8

30 Nov 2021, 11:21

Well, I agree that their language is confusing, and it is possible that they themselves do not understand what they have done. Perhaps it is better to think of it in reverse order. Put (2) before (1):

The comparison of the model excluding adpqs altogether with a model including it as a linear term is a test for whether to include it in the model at all. Calling this a test of "linearity" is a bit off the actual meaning. It is really testing whether a linear term is better than no term at all. Then, one could go further and ask if a curvilnear model of the adpqs effect is better than just a linear effect. That would be tested by comparing the cubic spline model with the model that includes only linear adpqs.
1 like
Comment
patrick handcock

Join Date: Nov 2017

Posts: 84
#9

30 Nov 2021, 16:12

Thanks Clyde Schechter - I'm glad I was not the only one a bit confused by it. I think your take makes a lot more sense! Thanks again for taking the time to reply - has been very helpful
Comment

patrick handcock

Join Date: Nov 2017
Posts: 84

#10

01 Dec 2021, 03:12

Quick question Clyde Schechter - in code below, should there be an 'estimates store linear_term' line under the model/line with only the linear term, then 'lrtest full_spline linear_term' to compare models (i.e. m1 vs m2)? Or are these already nested? Just wanted to double check I hadn't misinterpreted this.

I was also thinking, if interested in linearity, another approach could also be to add "estat ic" after the full spline model and the linear term model - to examine/compare the AIC and especially BIC statistics to see if they favour linear vs. spline? Not sure if you had further thoughts?

Code:

// VERIFY FIRST SPLINE VARIABLE IS THE ORIGINAL VARIABLE
assert float(apdqs) == float(rc1)

// MODEL WITH FULL SPLINE
stcox rc* i.sex i.smokstat alcohol i.lipid i.antihyp i.antidep, strata(edlevel09)
estat ic
estimates store full_spline

// MODEL WITH ONLY LINEAR TERM
stcox rc1 i.sex i.smokstat alcohol i.lipid i.antihyp i.antidep, strata(edlevel09)
estat ic
estimates store linear_term

lrtest full_spline linear_term

Thanks!

p.s. you piqued my interest around the difference between mkspline approach vs. uvrs and trying to get them to both do the same thing. I was playing around with some example ways to make mkspline more equivalent to the uvrs spline code in terms of the knots - but yes a bit unsure whether the earlier uvrs approach is necessarily truly nested for the LR test (results are consistent in terms of significance, but not exactly the same). Sorry about the rabbit hole!

Code:

// Generate spline variables (note: the 3rd mkspline option more equivalent to uvrs in terms of knot placement, derived from mkspline pctile code before it)
mkspline rc = apdqs, cubic nknots(3) displayknots   // gives knots 1.697513   1.845864   2.025764 based on Harrell et al recommendations
drop rc*

mkspline rc 4 = apdqs, pctile displayknots  //this is equivalent to uvrs default location of knots (e.g. below)
drop rc*

mkspline rc = apdqs, cubic nknots(3) knots(1.764398   1.845864   1.938234) displayknots  //this is equivalent to uvrs default location of knots

UVRS

e.g.

Code:

//Cubic spline model (less restrictive model)
uvrs stcox apdqs i.sex i.smokstat alcohol i.lipid i.antihyp i.antidep, strata(edlevel09)  df(4) degree(3)
estimates store m1

// Linear model ('more restrictive' model)
stcox apdqs i.sex i.smokstat alcohol i.lipid i.antihyp i.antidep, strata(edlevel09)
estimates store m2

// LR test to compare (m2 nested in m1)
lrtest m1 m2

Last edited by patrick handcock; 01 Dec 2021, 03:32.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#11

01 Dec 2021, 11:44

Your use of -estimates store- in #10 is correct.

I don't want to go too far astray here, but I'll digress a bit. I am not a fan of selecting models based on likelihood ratio tests of significance. (In fact, I am not a fan of the concept of statistical significance at all--but I won't go down that rabbit hole here.) The AIC and BIC are, in my view, better than a naked LR test because at least there is some penalization for the added degrees of freedom, so that there is at least a little bit of protection against overfitting. Even so, I am generally reluctant to use any kind of single statistic to decide between models.

In general, I prefer to do model selection on the basis of prior information, estimates of goodness of fit, or graphical and summary statistical explorations of simulated data generated by the models. Cox proportional hazards models, however, are difficult to apply this preference to, as there really aren't any good fit statistics that I'm aware of--having censored observations makes the whole concept of fit kind of ambiguous. And while it is simple enough to simulate data from the models and at least rule out models that generate data that are clearly nonsensical, once you narrow the field to models that are not obviously ridiculous, the presence of censored observations makes it difficult to decide further among them. (This is partly due to the fact that both the Cox model and the parametric survival models assume that censoring is orthogonal to survival time, a mathematically handy assumption that is often demonstrably and seriously in error, and the actual data generation process for survival is often too unknown or too messy to simulate.) So in this context, reliance on LR tests or IC statistics is a fallback position. So I do it, too, but reluctantly. I should add that, in this connection, survival analysis played a larger role in my work a few decades back than it does today. So it may be that there are developments in this area that I have not kept up with that would facilitate my preferred approach to model selection.
Comment
patrick handcock

Join Date: Nov 2017

Posts: 84
#12

02 Dec 2021, 02:18

Thanks very much for the reply and for adding your perspective on this Clyde Schechter - this has been very helpful for my thinking around this as I have found it a bit tricky to determine the 'best' approach for modelling linear and non-linear relationships using cox models in Stata. I think I like the perspective of using prior information/theory and visualising the splines first and foremost, then descriptive tables and these LR and IC tests to 'complement' them as appropriate and with some numbers (while appreciating the limitations for the statistical tests / p-values, particularly in this context). I think if I had the choice I probably wouldn't use the LR tests and that reliance on p-values altogether - but it seems some readers feel more comfortable feeling like they have 'the answer' to the linear vs. non-linear question (i.e. "how do we know it is significantly non-linear", etc...) so I am just trying to navigate that as best I can.

In my particular case when I plot the splines I see what look like (to me) non-linear associations (e.g. below), but the p-value from the LR test comparing linear vs. cubic (via mkspline or uvrs) is non-significant (p=0.77), suggesting a linear model would be sufficient - but I still find it useful to see this visualisation either way despite the LR test. I am guessing in this case due to the smallish sample (n=6000 subjects and n=440 events) and wide 95%CI's. It is a bit ironic (maybe as defined incorrectly by Alanis Morrissette) that greater sample sizes make you more capable of detecting assumption violations (like non-normality) but also make the models more robust to departures from those assumptions (like non-normality).

restricted cubic spline (3 evenly space knots) of association between alpha (activity metric) with cancer incidence

Last edited by patrick handcock; 02 Dec 2021, 03:08.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#13

02 Dec 2021, 09:10

but it seems some readers feel more comfortable feeling like they have 'the answer' to the linear vs. non-linear question (i.e. "how do we know it is significantly non-linear", etc...) so I am just trying to navigate that as best I can.

Well, I can sympathize with you about being pressed by others to do the wrong thing. And not knowing your life-situation I am in no position to advise you on when to take a firm stand and when to decide that this is not the hill you will die on. Being in the senior phase of my career, I have become more aggressive about pushing back in recent years, but I was much more pliant when I was young and my career still in a vulnerable stage. That said, the important thing is to be clear in your own mind about when you are doing the right thing and when you are doing the wrong thing under duress. Yes, there are many people who are more comfortable feeling like they have "the answer" to some question. But it is critical to always remember that uncertainty is everywhere and always present and that the illusion of a certain answer does not change that fact about the world. Do not let yourself succumb to the illusion. At some point in your career, you will have sufficient autonomy to call things the way you see them and stop pleasing the crowd. It would be a shame if, by then, you lose the ability to tell the difference.
2 likes
Comment
Martin Imelda Borg

Join Date: Jan 2022

Posts: 225
#14

15 Aug 2023, 08:09

Originally posted by patrick handcock View Post

Thanks very much for the reply and for adding your perspective on this Clyde Schechter - this has been very helpful for my thinking around this as I have found it a bit tricky to determine the 'best' approach for modelling linear and non-linear relationships using cox models in Stata. I think I like the perspective of using prior information/theory and visualising the splines first and foremost, then descriptive tables and these LR and IC tests to 'complement' them as appropriate and with some numbers (while appreciating the limitations for the statistical tests / p-values, particularly in this context). I think if I had the choice I probably wouldn't use the LR tests and that reliance on p-values altogether - but it seems some readers feel more comfortable feeling like they have 'the answer' to the linear vs. non-linear question (i.e. "how do we know it is significantly non-linear", etc...) so I am just trying to navigate that as best I can.

In my particular case when I plot the splines I see what look like (to me) non-linear associations (e.g. below), but the p-value from the LR test comparing linear vs. cubic (via mkspline or uvrs) is non-significant (p=0.77), suggesting a linear model would be sufficient - but I still find it useful to see this visualisation either way despite the LR test. I am guessing in this case due to the smallish sample (n=6000 subjects and n=440 events) and wide 95%CI's. It is a bit ironic (maybe as defined incorrectly by Alanis Morrissette) that greater sample sizes make you more capable of detecting assumption violations (like non-normality) but also make the models more robust to departures from those assumptions (like non-normality).

[ATTACH=CONFIG]n1639107[/ATTACH]
restricted cubic spline (3 evenly space knots) of association between alpha (activity metric) with cancer incidence

Hi patrick handcock I'm sure it's been a long time, but I wonder if you could share your code to create this graph here....
Really helpful post
Comment

Announcement