Dear Statalist members,
I am currently working on a dataset comprising over 1000 patients to develop a clinical nomogram to predict muscle invasiveness (binary outcome: yes/no) at final histology following a specific surgical procedure. The study population involves patients undergoing the same surgical procedure within a specified risk group.
My objective is to develop a nomogram by comparing different predictive models built from various, mainly categorical, variables. The model will be selected on the basis of the best AUC and decision curve.
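Since all the candidate models are fitted on the same patients, their AUCs can also be compared with a formal test, not only read side by side from lroc. A minimal sketch with simulated data (the data, the seed and the two model specifications are made up for illustration; only the variable names echo the post): roccomp, an official Stata command, tests whether the areas under the ROC curves of several predicted-probability variables are equal.

```stata
* Simulated data for illustration only (not the study data)
clear
set seed 2024
set obs 800
generate byte grade_bio       = rbinomial(1, 0.6)
generate byte clinicalt_high  = rbinomial(1, 0.4)
generate byte multifocal      = rbinomial(1, 0.3)
generate byte muscle_invasive = rbinomial(1, invlogit(-2 + 1.2*grade_bio + 1.5*clinicalt_high + 0.3*multifocal))

* Two hypothetical candidate models, each saved as a predicted-probability variable
quietly logistic muscle_invasive grade_bio clinicalt_high
predict double p_staging, pr
quietly logistic muscle_invasive grade_bio clinicalt_high multifocal
predict double p_tumor, pr

* Test H0: the two models have equal areas under the ROC curve
roccomp muscle_invasive p_staging p_tumor
```

Models that include the size variables will have missing predictions for patients with missing size, so it is worth checking how many observations roccomp reports; adding the graph option overlays the ROC curves.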
I encountered a syntax error in the decision curve analysis that I am struggling to resolve. I would like the final decision curve graph to show a curve for each predictive model. I suspect the problem is that the variable I intend to use does not exist in my original dataset, and I am unsure how to generate it correctly, since it needs to contain each model's predictions as probabilities.
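The variable for each model can be the predicted probability that predict creates after logistic; listing one such variable per model in a single dca call should give one curve per model on the same graph. As far as I understand the dca help, prob(no) is meant for predictors that are not probabilities (such as the raw marker in the tutorial), so it should not be needed here. A minimal sketch with simulated data and hypothetical variable names (endoscopic_pred, staging_pred); the dca call assumes the user-written dca command is installed and is skipped otherwise.

```stata
* Simulated data for illustration only
clear
set seed 7
set obs 600
generate byte grade_bio         = rbinomial(1, 0.6)
generate byte clinicalt_high    = rbinomial(1, 0.4)
generate byte variant_histology = rbinomial(1, 0.1)
generate byte muscle_invasive   = rbinomial(1, invlogit(-2 + 1.2*grade_bio + 1.5*clinicalt_high + 0.8*variant_histology))

* One predicted-probability variable per model (pr = Pr(muscle_invasive == 1))
quietly logistic muscle_invasive grade_bio variant_histology
predict double endoscopic_pred, pr
label variable endoscopic_pred "Endoscopic model"
quietly logistic muscle_invasive grade_bio clinicalt_high
predict double staging_pred, pr
label variable staging_pred "Staging model"

* All prediction variables in one dca call: one curve per model on the same graph
capture which dca
if _rc == 0 {
    dca muscle_invasive endoscopic_pred staging_pred, xstart(0.05) xstop(0.35)
}
else {
    display "dca is not installed; the prediction variables are ready for it"
}
```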
Below is a sample dataset generated with dataex, followed by the code I am using. In the part of the code after the syntax error, I have replaced the actual variable names with generic ones, because I do not yet know which model will perform best and be used to develop the nomogram.
I would be grateful if anyone could find a solution and, if possible, complete the code after the line "* Calculate the increase net benefit with different cut-off (5% increase) of the predictive model with the best AUC and net benefit" with the correct variable names. I also need to correct the DCA for overfitting; could you please add the command for that to the code?
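One common way to correct the decision curve for overfitting is to run dca on cross-validated (out-of-fold) predicted probabilities instead of the apparent, in-sample ones, so that each patient's probability comes from a model fitted without that patient. A minimal sketch of 10-fold cross-validation for one model, assuming simulated data and hypothetical variable names (staging_app, staging_cv, fold); the same loop would be repeated for each candidate model.

```stata
* Simulated data for illustration only (replace with the study dataset)
clear
set seed 20240101
set obs 1000
generate byte grade_bio       = rbinomial(1, 0.6)
generate byte clinicalt_high  = rbinomial(1, 0.4)
generate byte muscle_invasive = rbinomial(1, invlogit(-2 + 1.2*grade_bio + 1.5*clinicalt_high))

* Apparent (in-sample) predicted probabilities: optimistic
quietly logistic muscle_invasive grade_bio clinicalt_high
predict double staging_app, pr

* Randomly assign each patient to one of K folds of equal size
local K 10
generate double u = runiform()
sort u
generate byte fold = mod(_n, `K') + 1

* Out-of-fold predictions: fit on K-1 folds, predict the held-out fold
generate double staging_cv = .
forvalues k = 1/`K' {
    quietly logistic muscle_invasive grade_bio clinicalt_high if fold != `k'
    quietly predict double ptemp if fold == `k', pr
    quietly replace staging_cv = ptemp if fold == `k'
    drop ptemp
}
drop u
label variable staging_app "Staging model (apparent)"
label variable staging_cv  "Staging model (10-fold CV)"

* DCA on apparent vs cross-validated predictions (only if dca is installed)
capture which dca
if _rc == 0 {
    dca muscle_invasive staging_app staging_cv, xstart(0.05) xstop(0.35)
}
else {
    display "dca is not installed"
}
```

Because the result depends on the random split into folds, repeating the loop with different seeds and averaging the net benefit across repetitions should give a more stable curve; the gap between the apparent and cross-validated curves is a rough measure of optimism.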
Your assistance is highly appreciated. Thank you in advance!
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float muscle_invasive byte grade_bio float(clinicalt_high size_tumor_high_EAU size_tumor_high_NCCN) byte(preop_cyto_result multifocal) float previous_cystectomy byte variant_histology
1 1 0 . . . 1 0 .
0 1 1 0 0 . 0 0 0
0 1 0 0 0 . 0 0 0
0 1 0 1 1 1 0 0 0
0 1 0 . . . 0 0 .
0 1 1 1 1 0 0 0 .
0 1 1 1 1 . 0 0 0
0 1 0 1 1 . 0 0 0
0 1 0 1 1 1 0 0 0
1 1 0 1 1 . 1 0 .
1 1 1 1 1 1 0 0 0
0 1 0 1 1 . 0 0 0
1 1 1 1 1 1 0 0 0
0 1 1 1 1 3 0 . 0
0 0 0 1 1 0 1 0 0
0 0 1 0 1 . 0 0 0
1 1 0 1 1 . 1 1 0
1 1 1 1 1 . 0 0 0
1 1 1 1 1 . 0 0 0
1 1 0 1 1 . 0 0 0
1 1 0 1 1 . 0 0 0
0 1 0 1 1 . 1 0 0
1 1 0 . . . 1 0 0
1 1 1 0 0 1 0 0 0
1 1 0 1 1 1 0 0 0
0 1 0 . . . 1 0 0
0 0 0 1 1 . 0 0 0
0 1 1 1 1 . 1 0 .
0 0 1 1 1 1 1 0 0
1 1 0 . . 1 0 0 0
1 1 1 0 0 . 0 0 0
1 1 0 . . . 0 0 0
0 1 1 . . . 1 0 0
1 1 0 1 1 . 0 0 0
1 1 0 . . . 0 0 0
1 0 0 0 1 . 0 0 0
1 1 0 1 1 . 0 0 0
1 1 1 . . . 0 0 0
1 1 0 1 1 1 0 0 0
0 1 0 . . 2 0 0 0
0 0 0 1 1 0 0 0 0
1 1 0 . . . 0 0 0
0 0 1 . . . 0 0 0
0 0 0 . . 2 1 0 0
0 1 0 . . 2 0 0 0
1 1 1 1 1 1 1 0 0
1 1 0 0 1 1 0 0 0
1 1 1 . . 2 0 0 .
1 0 1 . . . 0 0 0
0 1 0 0 0 . 0 0 0
1 1 0 1 1 1 0 0 0
1 1 1 1 1 . 0 0 0
0 1 0 1 1 1 0 0 0
1 1 1 0 0 1 0 0 0
0 1 0 1 1 1 0 0 0
0 1 0 1 1 0 1 0 0
1 1 0 1 1 3 0 0 0
0 1 0 . . 1 0 0 0
0 1 0 . . 1 0 0 0
1 1 0 1 1 1 1 0 0
0 0 1 . . . 0 0 0
0 0 0 0 1 . 0 0 0
1 1 0 . . . 0 0 0
1 1 0 1 1 . 0 0 0
0 0 0 1 1 0 1 0 0
1 1 1 1 1 0 0 0 .
0 1 0 1 1 2 0 0 0
0 1 0 . . 1 0 0 0
1 1 0 0 0 2 0 0 0
0 0 0 1 1 2 0 0 0
1 0 0 1 1 . 0 0 0
1 1 0 1 1 1 1 0 0
1 1 0 0 0 1 0 0 0
1 1 1 1 1 . 0 1 0
1 1 0 1 1 2 0 0 .
1 1 1 1 1 1 0 0 0
1 1 1 1 1 . 0 0 0
0 0 0 1 1 2 0 0 0
1 1 1 1 1 . 0 1 0
0 1 0 1 1 1 0 0 0
1 1 1 1 1 . 0 0 0
1 0 0 0 0 . 1 0 0
1 0 0 . . 1 1 0 0
0 1 0 1 1 . 1 0 0
0 1 0 . . . 0 0 0
1 1 1 1 1 . 1 0 0
1 1 1 1 1 . 0 0 0
1 0 1 1 1 2 0 0 0
1 1 1 1 1 . 0 0 0
1 1 0 1 1 . 0 0 0
1 1 0 1 1 . 0 0 0
1 1 1 1 1 . 0 0 0
1 1 1 1 1 1 0 0 0
1 1 0 0 1 . 0 0 0
0 1 0 1 1 1 0 0 0
0 0 1 1 1 1 0 0 0
1 1 0 . . . 0 0 0
1 1 0 1 1 2 0 0 0
1 1 0 0 0 . 0 0 0
end
label values grade_bio grade_bio_
label def grade_bio_ 0 "Low Grade", modify
label def grade_bio_ 1 "High Grade", modify
label values preop_cyto_result preop_cyto_result_
label def preop_cyto_result_ 0 "Negative", modify
label def preop_cyto_result_ 1 "Positive", modify
label def preop_cyto_result_ 2 "Atypia/Suspicious", modify
label def preop_cyto_result_ 3 "Not diagnostic", modify
label values multifocal multifocal_
label def multifocal_ 0 "No", modify
label def multifocal_ 1 "Yes", modify
These are the commands I used:
Code:
*population setting
keep if grade_bio==1 | clinicalt_high==1 | size_tumor_high_EAU==1 | size_tumor_high_NCCN==1 | preop_cyto_result==1 | multifocal==1 | previous_cystectomy==1 | variant_histology==1
keep if type_surg==2
drop if pt_path==.

//PREDICTIVE MODELS
*univariate analysis
*note: preop_cyto_result has four unordered categories (0-3) and is entered below as a linear term;
*i.preop_cyto_result would treat it as categorical (check first that looclass and nomolog accept factor variables)
logistic muscle_invasive grade_bio
logistic muscle_invasive clinicalt_high
logistic muscle_invasive size_tumor_high_EAU
logistic muscle_invasive size_tumor_high_NCCN
logistic muscle_invasive preop_cyto_result
logistic muscle_invasive multifocal
logistic muscle_invasive previous_cystectomy
logistic muscle_invasive variant_histology

*multivariable analysis
//possible addition of lsens to compute sensitivity and specificity

//clinical model (based on variables only obtainable at CT and anamnestic evaluation)
logistic muscle_invasive clinicalt_high size_tumor_high_EAU multifocal previous_cystectomy preop_cyto_result, coef
lroc
looclass muscle_invasive clinicalt_high size_tumor_high_EAU multifocal previous_cystectomy preop_cyto_result, model(logit) fig
capture drop clinical_model_EAU_prediction
predict clinical_model_EAU_prediction, pr
label variable clinical_model_EAU_prediction "Clinical model EAU"

logistic muscle_invasive clinicalt_high size_tumor_high_NCCN multifocal previous_cystectomy preop_cyto_result, coef
lroc
looclass muscle_invasive clinicalt_high size_tumor_high_NCCN multifocal previous_cystectomy preop_cyto_result, model(logit) fig
capture drop clinical_model_NCCN_prediction
predict clinical_model_NCCN_prediction, pr
label variable clinical_model_NCCN_prediction "Clinical model NCCN"

//endoscopic model (based on variables only verifiable after URS)
logistic muscle_invasive grade_bio variant_histology, coef
lroc
looclass muscle_invasive grade_bio variant_histology, model(logit) fig
capture drop endoscopic_model_prediction
predict endoscopic_model_prediction, pr
label variable endoscopic_model_prediction "Endoscopic model"

//tumor-related model (based only on tumor features)
logistic muscle_invasive grade_bio clinicalt_high size_tumor_high_EAU multifocal variant_histology
lroc
looclass muscle_invasive grade_bio clinicalt_high size_tumor_high_EAU multifocal variant_histology, model(logit) fig
capture drop tumor_model_EAU_prediction
predict tumor_model_EAU_prediction, pr
label variable tumor_model_EAU_prediction "Tumor model EAU"

logistic muscle_invasive grade_bio clinicalt_high size_tumor_high_NCCN multifocal variant_histology
lroc
looclass muscle_invasive grade_bio clinicalt_high size_tumor_high_NCCN multifocal variant_histology, model(logit) fig
capture drop tumor_model_NCCN_prediction
predict tumor_model_NCCN_prediction, pr
label variable tumor_model_NCCN_prediction "Tumor model NCCN"

//staging model (based only on clinical tumor grade and stage, which are the strongest predictors of worse prognosis)
logistic muscle_invasive grade_bio clinicalt_high
lroc
looclass muscle_invasive grade_bio clinicalt_high, model(logit) fig
capture drop staging_model_prediction
predict staging_model_prediction, pr
label variable staging_model_prediction "Staging model"

*Run the decision curve with the dca command (https://www.danieldsjoberg.com/dca-t...ial-stata.html): one curve per model on the same graph
*the predictors are already probabilities, so prob(no) is not used
dca muscle_invasive clinical_model_EAU_prediction clinical_model_NCCN_prediction endoscopic_model_prediction ///
    tumor_model_EAU_prediction tumor_model_NCCN_prediction staging_model_prediction, ///
    xstart(0.05) xstop(0.35) xlabel(0.05(0.05)0.35) smooth

*placeholders: set both to the model with the best AUC and net benefit once it has been chosen
*(the staging model is used here only as an example; run the rest of the file in one go, because locals are lost between separate runs)
local best_pred staging_model_prediction
local best_vars grade_bio clinicalt_high

*nomogram of the chosen model: nomolog uses the most recent estimation results, so re-fit that model first
logistic muscle_invasive `best_vars'
nomolog

* Calculate the increase net benefit with different cut-off (5% increase) of the predictive model with the best AUC and net benefit
preserve
dca muscle_invasive `best_pred', xstart(0.05) xstop(0.35) xby(0.05) nograph saving("DCA Output best model.dta", replace)
use "DCA Output best model.dta", clear
*as in the tutorial, the saved file names the net-benefit variable after the predictor; "all" is the treat-all strategy
g advantage = `best_pred' - all
label var advantage "Increase in net benefit from using the best model"
list
restore

*Calculate the interventions avoided of the predictive model with the best AUC and net benefit
*(this needs the original data, hence preserve/restore above; prob(no) is dropped because the predictor is a probability)
dca muscle_invasive `best_pred', intervention xstart(0.05) xstop(0.35)
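The net-benefit quantities in the last part of the code above can also be computed directly from the standard definitions, which avoids depending on the variable names in the file written by saving(). At threshold probability p_t, with a patient classified as positive when the predicted probability is at least p_t, net benefit is TP/n - (FP/n)*p_t/(1 - p_t); treating all patients gives prevalence - (1 - prevalence)*p_t/(1 - p_t); and the net number of interventions avoided per 100 patients is 100*(NB_model - NB_all)/(p_t/(1 - p_t)). A minimal sketch with simulated data, where p_model is a hypothetical placeholder for the chosen model's prediction variable; unlike dca with smooth, no smoothing is applied.

```stata
* Simulated data for illustration only
clear
set seed 99
set obs 1000
generate byte grade_bio       = rbinomial(1, 0.6)
generate byte clinicalt_high  = rbinomial(1, 0.4)
generate byte muscle_invasive = rbinomial(1, invlogit(-2 + 1.2*grade_bio + 1.5*clinicalt_high))
quietly logistic muscle_invasive grade_bio clinicalt_high
predict double p_model, pr    // hypothetical stand-in for the chosen model's prediction

* Sample size and prevalence among patients with a prediction
quietly count if !missing(muscle_invasive, p_model)
local n = r(N)
quietly count if muscle_invasive == 1 & !missing(p_model)
local prev = r(N) / `n'

* Net benefit at thresholds 5% to 35% in 5% steps
display "threshold   NB model     NB all   advantage   avoided/100"
forvalues t = 5(5)35 {
    local pt = `t' / 100
    local w  = `pt' / (1 - `pt')    // odds at the threshold: weight of a false positive
    quietly count if p_model >= `pt' & muscle_invasive == 1 & !missing(p_model)
    local tp = r(N)
    quietly count if p_model >= `pt' & muscle_invasive == 0 & !missing(p_model)
    local fp = r(N)
    local nb_model = `tp'/`n' - (`fp'/`n') * `w'
    local nb_all   = `prev' - (1 - `prev') * `w'
    local adv      = `nb_model' - `nb_all'
    local avoided  = 100 * `adv' / `w'
    display %6.2f (`pt') %11.4f (`nb_model') %11.4f (`nb_all') %12.4f (`adv') %13.1f (`avoided')
}
```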
Francesco Ditonno