I am conducting an analysis to assess the relationship between a biomarker and the risk of an adverse binary outcome in patients (e.g., intensive care admission; the variable outcome). I have data for ~800 patients.
I also want to understand: i) whether the relationship is linear, and ii) whether there is a particular threshold of the biomarker at which risk increases.
To start, I perform a simple logistic regression:
Code:
* logit - continuous predictor
logistic outcome biomarker
To understand if the relationship is linear, I try a couple of options. First, I split the biomarker into six equal-sized groups and use this categorical variable in the regression:
Code:
* logit - categorical predictor
egen cat_biomarker = cut(biomarker), group(6) label
logistic outcome i.cat_biomarker
margins, at(cat_biomarker=(0(1)5))
marginsplot
Does this seem sensible? And are the margins interpreted as the expected probability of the outcome in each biomarker category (for the average person)?
I now want to explore using a restricted cubic spline model to see if the relationship is non-linear. My approach here is to compare the linear-only model with the full spline model using an LR test. If the LR test is significant (p<0.1), I have evidence of non-linearity and I could try including the splines in the model. I code this as:
Code:
* cubic splines
mkspline sp_biomarker = biomarker, cubic nknots(5)
// compare linear model vs full spline model using LR test
logit outcome sp_biomarker*
est sto rcsmodel
logit outcome sp_biomarker1
lrtest rcsmodel
I can then run the spline model, but I'm not clear on how to run margins for it. If I do as I did above, won't the other spline terms be held at their means? I'm not sure that is correct. What is the best way to plot the margins from the spline model?
Code:
logistic outcome sp_biomarker*
margins, at(sp_biomarker1=(3(1)11))
marginsplot
I could plot the odds against the biomarker, but I think it would be more comparable if I could plot the predicted probabilities.
Code:
logistic outcome sp_biomarker*
predict xb, xb
gen odds = exp(xb)
line odds biomarker, sort
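For the predicted-probability plot, one possible sketch is below. It is an assumption on my part rather than a confirmed recommendation. Because the model contains only the spline terms, each observation's fitted probability depends on the biomarker alone, so predict might be used in place of margins. The names p_hat, xb_hat, se_xb, p_lo and p_hi are made up for illustration, and the interval is an approximate 95% band built on the log-odds scale.

```stata
* Sketch only: assumes outcome, biomarker and sp_biomarker* exist as created above
logistic outcome sp_biomarker*

* Fitted probability, linear predictor (log odds) and its standard error
predict p_hat, pr
predict xb_hat, xb
predict se_xb, stdp

* Approximate 95% band: build it on the log-odds scale, then back-transform
gen p_lo = invlogit(xb_hat - 1.96*se_xb)
gen p_hi = invlogit(xb_hat + 1.96*se_xb)

* Predicted probability against the original biomarker scale
twoway (rarea p_lo p_hi biomarker, sort) (line p_hat biomarker, sort)
```

This would only trace the curve at biomarker values that occur in the data, and it would stop being a function of the biomarker alone if other covariates were added to the model.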
Does anyone have any advice on whether I am moving in the right direction? Is there any other good/more appropriate way to determine if there are any thresholds of interest of the biomarker where the risk changes?
Thank you for your help!