Assessing non-linearity and threshold effects in logistic regression (using restricted cubic splines)

Megan Moreton

25 Feb 2025, 09:33

I am conducting an analysis to assess the relationship between a biomarker and the risk of poor binary patient outcomes (e.g., intensive care admission; variable outcome). I have data for ~800 patients.
I also want to understand (i) whether the relationship is linear, and (ii) whether there is a particular biomarker threshold at which risk increases.

To start, I perform a simple logistic regression:

Code:

* logit - continuous predictor
logistic outcome biomarker

To understand whether the relationship is linear, I try a couple of options. First, I split the biomarker into six equal-sized groups (sextiles) and use this categorical variable in the regression:

Code:

* logit - categorical predictor
egen cat_biomarker = cut(biomarker), group(6) label
logistic outcome i.cat_biomarker
margins, at(cat_biomarker=(0(1)5))
marginsplot

Does this seem sensible? And are the margins interpreted as the expected probability of the outcome in each biomarker category (for the average person)?
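A minimal sketch of the categorical model, run here on simulated data (the data-generating values are hypothetical; replace the first block with your own data). `margins cat_biomarker` is the usual factor-variable shorthand and gives the same predictive margins as `at(cat_biomarker=(0(1)5))`. By default, after `logit`/`logistic`, these are average predicted probabilities: every patient is assigned to a given category and the predictions are averaged over the sample, rather than evaluated at an "average person" (that would need the `atmeans` option). With no other covariates in the model, each margin equals the observed proportion with the outcome in that category.

```stata
* Simulated example data (hypothetical values; replace with your own dataset)
clear
set seed 2025
set obs 800
generate biomarker = rnormal(7, 2)
generate byte outcome = runiform() < invlogit(-4 + 0.4*biomarker)

* Sextiles of the biomarker; group(6) codes the categories 0-5
egen cat_biomarker = cut(biomarker), group(6) label
logistic outcome i.cat_biomarker
margins cat_biomarker        // average predicted Pr(outcome) in each sextile
marginsplot

* Range of the biomarker within each sextile, useful for labelling the x-axis
tabstat biomarker, by(cat_biomarker) statistics(n min max)
```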

I now want to explore using a restricted cubic spline model to see whether the relationship is non-linear. My approach is to compare the linear-only model with the full spline model using a likelihood-ratio (LR) test: if the LR test is significant (p<0.1), I have evidence of non-linearity and could try including the splines in the model. I code this as:

Code:

* cubic splines
mkspline sp_biomarker = biomarker, cubic nknots(5)
// compare linear model vs full spline model using LR test
logit outcome sp_biomarker*
est sto rcsmodel
logit outcome sp_biomarker1
lrtest rcsmodel
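Two small additions that may help at this step, sketched on simulated data (hypothetical values). The `displayknots` option of `mkspline` prints where the five knots were placed (by default at Harrell's recommended percentiles, i.e. the 5th, 27.5th, 50th, 72.5th and 95th for five knots). `testparm` then gives a Wald test that the non-linear spline terms are jointly zero, a common alternative to the LR test. Because `sp_biomarker1` is the biomarker itself, the linear-only model in the LR test is the same as `logit outcome biomarker`.

```stata
* Simulated example data with a curved log-odds (hypothetical values)
clear
set seed 2025
set obs 800
generate biomarker = rnormal(7, 2)
generate byte outcome = runiform() < invlogit(-2 + 0.15*(biomarker - 7)^2)

* Restricted cubic spline with 5 knots; displayknots lists the knot values
mkspline sp_biomarker = biomarker, cubic nknots(5) displayknots
logit outcome sp_biomarker*

* Wald test of non-linearity: are the non-linear terms jointly zero?
testparm sp_biomarker2-sp_biomarker4
```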

I can then fit the spline model, but I'm not clear how to obtain margins from it. If I do as I did above, won't the other spline terms be held at their means, which I'm not sure is correct? What is the best way to plot the margins from the spline model?

Code:

logistic outcome sp_biomarker*
margins, at(sp_biomarker1=(3(1)11))
marginsplot
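Because all the spline variables are functions of the same biomarker, setting `sp_biomarker1` alone in `at()` while the other basis variables keep unrelated values does not describe a coherent patient. One simple alternative, sketched below on simulated data (hypothetical values), is to predict on the log-odds scale for every patient, build a pointwise 95% interval there, and transform with `invlogit()`. This assumes the model contains only the spline terms; with extra covariates, each prediction uses that patient's own covariate values and the curve will not be smooth. The `///` line continuations work in a do-file, not at the command line.

```stata
* Simulated example data (hypothetical values)
clear
set seed 2025
set obs 800
generate biomarker = rnormal(7, 2)
generate byte outcome = runiform() < invlogit(-2 + 0.15*(biomarker - 7)^2)
mkspline sp_biomarker = biomarker, cubic nknots(5)

logit outcome sp_biomarker*
predict rcs_xb, xb             // linear predictor (log-odds)
predict rcs_se, stdp           // standard error of the linear predictor

* Pointwise 95% CI built on the log-odds scale, then mapped to probabilities
generate rcs_p  = invlogit(rcs_xb)
generate rcs_lo = invlogit(rcs_xb - invnormal(0.975)*rcs_se)
generate rcs_hi = invlogit(rcs_xb + invnormal(0.975)*rcs_se)

twoway (rarea rcs_lo rcs_hi biomarker, sort color(gs13)) ///
       (line rcs_p biomarker, sort),                    ///
       ytitle("Predicted probability of outcome")       ///
       xtitle("Biomarker") legend(order(2 "Fitted" 1 "95% CI"))
```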

I could plot the odds against the biomarker - but I think it would be more comparable if I could plot the predicted probabilities.

Code:

logistic outcome sp_biomarker*
predict xb, xb
gen odds = exp(xb)
line odds biomarker, sort
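To get probabilities instead of odds, `predict` with the `pr` option (the default after `logit`/`logistic`) can be used directly. A sketch on simulated data (hypothetical values) that overlays the fitted probabilities from the linear and the spline models, which makes any departure from linearity easy to see:

```stata
* Simulated example data (hypothetical values)
clear
set seed 2025
set obs 800
generate biomarker = rnormal(7, 2)
generate byte outcome = runiform() < invlogit(-2 + 0.15*(biomarker - 7)^2)
mkspline sp_biomarker = biomarker, cubic nknots(5)

* Fitted probabilities from the linear model ...
quietly logit outcome biomarker
predict p_linear, pr

* ... and from the restricted cubic spline model
quietly logit outcome sp_biomarker*
predict p_spline, pr

twoway (line p_linear biomarker, sort)                 ///
       (line p_spline biomarker, sort lpattern(dash)), ///
       ytitle("Predicted probability of outcome")      ///
       legend(order(1 "Linear" 2 "Restricted cubic spline"))
```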

Does anyone have any advice on whether I am moving in the right direction? Is there any other good/more appropriate way to determine if there are any thresholds of interest of the biomarker where the risk changes?

Thank you for your help!
Rich Goldstein

25 Feb 2025, 12:05

I'm a little confused about what you are doing. Part of my confusion is that non-linearity and the existence of a threshold are not necessarily the same thing. For non-linearity, yes, restricted cubic splines are, I think, a good way to go. If by "threshold" you mean that the slope changes from 0 (an OR of 1) to something other than 0, then I think restricted cubic splines are not a good way to go; in certain situations, linear splines (a piecewise linear model) would in fact be better. But you don't really supply enough information for me to assess what you mean here.
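A minimal sketch of the piecewise linear (linear spline) idea, on simulated data in which risk is flat below 7 and rises above it. The knot value 7 is purely hypothetical; in practice it would come from clinical reasoning rather than from these data. With the `marginal` option, the coefficient on the second variable is the change in log-odds slope at the knot; without it, each coefficient is the slope within its own segment, so testing the lower one against zero asks whether risk is flat (OR = 1) below the knot.

```stata
* Simulated example data: flat risk below 7, rising above (hypothetical values)
clear
set seed 2025
set obs 800
generate biomarker = rnormal(7, 2)
generate byte outcome = runiform() < invlogit(-2 + 0.5*max(biomarker - 7, 0))

* (a) Marginal coding: bio_above carries the CHANGE in slope at the knot
mkspline bio_below 7 bio_above = biomarker, marginal
logistic outcome bio_below bio_above
test bio_above                 // H0: no change in slope at the knot

* (b) Default coding: each coefficient is the slope within its segment
drop bio_below bio_above
mkspline bio_below 7 bio_above = biomarker
logistic outcome bio_below bio_above
test bio_below                 // H0: OR = 1 per unit below the knot
```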
Megan Moreton

26 Feb 2025, 08:24

Thank you for your response - and apologies for not being clear.

Clinically, evidence suggests that high levels of the biomarker can lead to an increased risk of poor outcomes (but we are not sure the relationship between biomarker and outcome is linear - it probably isn't). There is a new medicine that we can prescribe to reduce the likelihood of poor outcomes in this population. We are trying to understand at what cut-off of the biomarker the medicine should be prescribed - we don't want to prescribe it too early/in patients with normal biomarker levels as that too can have negative effects.

So I want to understand whether there is a threshold/cut-off of the biomarker at which the relationship between the biomarker and the outcome changes (and leads to worse outcomes).
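If the cut-off is not known in advance, one possible option is a profile-likelihood search: fit the single-knot model over a grid of candidate knots and see where the log likelihood peaks. The sketch below uses simulated data and a hypothetical candidate grid of 4 to 10. Because the knot is then chosen from the data, p-values and confidence intervals from the selected model will be too optimistic; bootstrapping the whole search is one way to account for that. A flat likelihood across the grid is itself informative, since it suggests the data cannot pin down a threshold.

```stata
* Simulated example data with a true change point at 8 (hypothetical values)
clear
set seed 2025
set obs 800
generate biomarker = rnormal(7, 2)
generate byte outcome = runiform() < invlogit(-2 + 0.6*max(biomarker - 8, 0))

* Fit a one-knot linear spline model at each candidate knot (hypothetical grid)
scalar best_ll = c(mindouble)
scalar best_knot = .
forvalues k = 4(0.5)10 {
    capture drop knot_lo knot_hi
    mkspline knot_lo `k' knot_hi = biomarker, marginal
    quietly logit outcome knot_lo knot_hi
    display "knot = " %5.2f `k' "   log likelihood = " %10.3f e(ll)
    if e(ll) > best_ll {
        scalar best_ll = e(ll)
        scalar best_knot = `k'
    }
}
display "Knot with the highest log likelihood: " best_knot
```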

I will try using linear splines, with knots at percentiles of the data, in the first instance:

Code:

mkspline sp_gttfasting 5 = gttfasting, pctile
logistic outc_pc90 sp_gttfasting*
predict p
line p gttfasting, sort
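Building on the quintile-knot model, a sketch on simulated data (the values and the `msp_gtt` stub name are hypothetical) that re-codes the same linear spline with the `marginal` option, so that each coefficient after the first is the change in slope at a knot. `displayknots` prints the quintile cut-points, and `testparm` then asks whether any slope change is present at all. Knots at percentiles are placed by the distribution of the data, not at clinically meaningful values, so they need not coincide with a true threshold.

```stata
* Simulated example data mimicking the variable names above (hypothetical values)
clear
set seed 2025
set obs 800
generate gttfasting = rnormal(5.5, 1)
generate byte outc_pc90 = runiform() < invlogit(-2.5 + 1.5*max(gttfasting - 6, 0))

* Linear spline with knots at the quintiles; marginal coding means
* msp_gtt2-msp_gtt5 are the changes in slope at each knot
mkspline msp_gtt 5 = gttfasting, pctile marginal displayknots
logistic outc_pc90 msp_gtt*

* Joint test that the slope never changes (i.e. a linear relationship)
testparm msp_gtt2-msp_gtt5

* Fitted probabilities against the biomarker
predict p_gtt, pr
line p_gtt gttfasting, sort
```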