  • lroc after logit

    Dear Forum members,

    I have a dataset containing about 1000 observations with a binary reference variable (gold standard: cult_pos = 1/0) and two continuous predictor variables of a test, each between 0 and 1000 (UF_lc, UF_bact). I want to define cut-off values c1 and c2 for the test that maximise the Youden index.

    If I run the following commands
    Code:
    logistic cult_pos UF_lc UF_bact
    lroc
    I get a very nice ROC curve with Area under ROC curve = 0.9217.

    My questions are:
    i) How is Stata drawing this curve? If there is only one prediction parameter (T) in the logit model, it is clear to me that the ROC curve plots the false positive rate against the true positive rate parametrically, with the cut-off on T as the varying parameter. With two prediction parameters I don't understand the interpretation. Is it possible to label some points on the curve with the corresponding "cut-off values" for UF_lc and UF_bact?

    ii) Is there an easy way to get the combination c1, c2 of parameters such that a positive test, defined as "UF_lc <= c1 and UF_bact <= c2", has the maximum Youden index (sens + spec - 1)?

    Thanks in advance for your thoughts on this.
    Best wishes
    Martin

  • #2
    i) How is Stata drawing this curve? If there is only one prediction parameter (T) in the logit model, it is clear to me that the ROC curve plots the false positive rate against the true positive rate parametrically, with the cut-off on T as the varying parameter. With two prediction parameters I don't understand the interpretation. Is it possible to label some points on the curve with the corresponding "cut-off values" for UF_lc and UF_bact?
    -lroc-, regardless of how many predictors are in the logistic model, calculates the ROC curve using the predicted probability generated by the model as the varying parameter. So there are no separate "cut-off values" for UF_lc and UF_bact: instead it is the predicted probability calculated from both of them by the logistic regression equation.
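    To illustrate this point outside Stata: however many predictors feed the model, the ROC curve is traced by sweeping a single cut-off over the predicted probabilities. A minimal Python sketch with toy data (all names and numbers are illustrative, not from the original dataset):

```python
# Sketch: an ROC curve has ONE varying parameter -- a cut-off on the
# model's predicted probability -- regardless of how many predictors
# went into producing that probability.
def roc_points(y_true, p_hat, thresholds):
    """Return (fpr, tpr) pairs, one per probability cut-off."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for y, p in zip(y_true, p_hat) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, p_hat) if p >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Toy data: true labels and model-predicted probabilities.
y = [1, 1, 0, 1, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(roc_points(y, p, [0.0, 0.5, 1.0]))
```

Sweeping the threshold from 1 down to 0 moves the curve from (0, 0) to (1, 1); each predicted probability in the sample marks one potential corner of the curve.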

    ii) Is there an easy way to get the combination c1, c2 of parameters such that a positive test, defined as "UF_lc <= c1 and UF_bact <= c2", has the maximum Youden index (sens + spec - 1)?
    I don't think there is any way to do this short of calculating it for a range of values of c1 and c2 and picking the largest one. That said, in my opinion, the Youden index is somewhere on the spectrum between misleading and worthless, so I wouldn't invest much effort in this.
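    The brute-force search described above could be sketched like this (Python, toy data; the grids, variable names, and the "positive if both values are at or below the cut-offs" rule follow the definition in the question):

```python
# Grid search over candidate cut-off pairs (c1, c2): a "positive" test
# is UF_lc <= c1 AND UF_bact <= c2; keep the pair with the largest
# Youden index = sensitivity + specificity - 1.
def best_youden(uf_lc, uf_bact, cult_pos, c1_grid, c2_grid):
    n_pos = sum(cult_pos)
    n_neg = len(cult_pos) - n_pos
    best = (None, None, -1.0)
    for c1 in c1_grid:
        for c2 in c2_grid:
            tp = sum(1 for a, b, y in zip(uf_lc, uf_bact, cult_pos)
                     if a <= c1 and b <= c2 and y == 1)
            tn = sum(1 for a, b, y in zip(uf_lc, uf_bact, cult_pos)
                     if not (a <= c1 and b <= c2) and y == 0)
            j = tp / n_pos + tn / n_neg - 1
            if j > best[2]:
                best = (c1, c2, j)
    return best

# Toy data: low values in cases, high values in non-cases.
print(best_youden([10, 20, 500, 600], [5, 15, 400, 700],
                  [1, 1, 0, 0], [50, 1000], [50, 1000]))
```

In practice one would take the candidate grids from the observed values of each predictor, since only observed values can change sensitivity or specificity.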



    • #3
      Dear Clyde,

      thank you very much for your clarifying and helpful response! It was a great help!

      Do you have any recommendation better than Youden's index for finding the "best" cut-off values?

      I would like to find different cut-offs for different scenarios:

      i) Cut-offs for UF_lc and UF_bact that make me really sure a patient does not need a work-up (thus, high sensitivity; a rule-out test)

      ii) A good trade-off between sensitivity and specificity to find the test's "best" cut-off

      I read some literature about the (sadly, non-existent) "best" cut-off. In my opinion, even for a single parameter, the choice is unclear and somewhat arbitrary. The approaches I found are: i) maximize Youden's index, ii) minimize the distance to the top-left corner of the ROC curve, iii) minimize the negative likelihood ratio (LR-), iv) maximize the diagnostic odds ratio (DOR), and v) cost-based considerations (not possible in my scenario). I don't see an advantage of one over the others.
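      For reference, all of the listed criteria can be computed from a (sensitivity, specificity) pair. A Python sketch (note that the negative likelihood ratio is minimised, not maximised, and the formulas break down when sensitivity or specificity is exactly 0 or 1):

```python
import math

def cutoff_criteria(sens, spec):
    """Common single-cutoff selection criteria from (sens, spec).

    Assumes 0 < sens < 1 and 0 < spec < 1; at the boundaries some
    of these ratios divide by zero.
    """
    return {
        "youden": sens + spec - 1,                          # maximise
        "corner_distance": math.hypot(1 - sens, 1 - spec),  # minimise
        "lr_neg": (1 - sens) / spec,                        # minimise
        "dor": (sens / (1 - sens)) * (spec / (1 - spec)),   # maximise
    }

print(cutoff_criteria(0.8, 0.9))
```

As the thread notes, these criteria can rank candidate cut-offs differently, which is part of why no single "best" cut-off exists.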

      In addition, I have two parameters (independently predicting the outcome - see the attachment below), which makes the problem even more complex and inconclusive (as I can vary both parameters), and I cannot find any good literature or "algorithms" to solve this problem. Do you know some references or have some thoughts about it?

      Thank you very much!

      Best wishes
      Martin



      [Attachment: Untitled.png]



      • #4
        You're not going to like my answer, I think. I regard any attempt to "optimize" using statistics that derive only from sensitivity and specificity, or from likelihood ratios, as not even an admissible candidate. I subscribe to the Bayesian decision-theoretic approach to optimization. So, first, that means the relevant test operating statistics are the positive and negative predictive values; and, much more important, to me the very word "optimize" means maximizing expected utility.

        So you have to have utilities to assign to the consequences of correctly and incorrectly identifying both true cases and true non-cases. And you need positive and negative predictive values, which you might either calculate directly from your data, or obtain by applying Bayes' theorem to the sensitivity and specificity together with a known or estimated prevalence of true cases in the population to which these tests will be applied. Without those ingredients, in my view, anything that is called an "optimal cutpoint" is just a fraud. It's not science. It's not statistics. It's "mathiness."
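        The recipe above can be sketched in code: convert sensitivity and specificity into predictive values via Bayes' theorem at a given prevalence, and score a cut-off by its expected utility. A Python sketch; the utility numbers passed in are placeholders that the analyst must supply for their own setting:

```python
def predictive_values(sens, spec, prev):
    """Bayes' theorem: (PPV, NPV) from sens/spec and prevalence."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

def expected_utility(sens, spec, prev, u_tp, u_fp, u_tn, u_fn):
    """Expected utility per patient of acting on the test result.

    Weights the utility of each of the four outcomes (true/false
    positive/negative) by its probability at the given prevalence.
    """
    return (prev * sens * u_tp
            + (1 - prev) * (1 - spec) * u_fp
            + (1 - prev) * spec * u_tn
            + prev * (1 - sens) * u_fn)

# Placeholder utilities: treating a true case is worth +1, a missed
# case costs -1, an unnecessary work-up costs -0.5, and so on.
print(predictive_values(0.9, 0.8, 0.2))
print(expected_utility(0.9, 0.8, 0.2,
                       u_tp=1.0, u_fp=-0.5, u_tn=0.5, u_fn=-1.0))
```

An "optimal" cut-off in this framework is simply the one whose (sens, spec) pair maximizes expected utility at the prevalence of the target population, so the answer changes when the prevalence or the utilities change.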



