Score after binary logistic regression

Alex Moschovas

Join Date: Mar 2021

Posts: 13
#1


11 Apr 2022, 18:41

Hello,

I have a binary logistic regression model for the probability of death after cardiac surgery. It consists of continuous and binary (0/1) variables. How can I use Stata to express my logistic regression as a score table that translates into a predicted probability of mortality?

Best regards,

Alex Moschovas

Alex Moschovas

Join Date: Mar 2021
Posts: 13

11 Apr 2022, 19:21

Here is an example of my data. LDHASAT and MELDXIScore are continuous variables; VHF and HLM are binary.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte verstorben double LDHASAT byte(MELDXIScore VHF HLM)
0              7.98 9 0 0
0 2.132978723404255 9 0 0
0 7.914285714285715 9 0 0
0 9.605263157894736 9 0 0
0 9.043478260869565 9 0 0
end

Comment

David Radwin

Join Date: Mar 2014
Posts: 368

13 Apr 2022, 15:53

I don't think your example data is sufficient to estimate a logistic regression model because, among other things, the outcome verstorben (death) does not vary.

In any case, are you looking for predicted probabilities? If so, you might try predict like this:

Code:

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. logit married age grade collgrad south wage hours

Iteration 0:   log likelihood = -1459.9347  
Iteration 1:   log likelihood = -1433.9283  
Iteration 2:   log likelihood = -1433.7744  
Iteration 3:   log likelihood = -1433.7744  

Logistic regression                                     Number of obs =  2,240
                                                        LR chi2(6)    =  52.32
                                                        Prob > chi2   = 0.0000
Log likelihood = -1433.7744                             Pseudo R2     = 0.0179

------------------------------------------------------------------------------
     married | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |  -.0124664   .0145931    -0.85   0.393    -.0410684    .0161357
       grade |   .0334173   .0300351     1.11   0.266    -.0254505    .0922851
    collgrad |  -.0198933   .1759544    -0.11   0.910    -.3647575     .324971
       south |   .0348279    .092213     0.38   0.706    -.1459062    .2155619
        wage |   -.011135   .0081047    -1.37   0.169    -.0270199    .0047499
       hours |  -.0305355   .0046682    -6.54   0.000    -.0396849   -.0213861
       _cons |   1.868139   .7118118     2.62   0.009     .4730136    3.263265
------------------------------------------------------------------------------

. predict yhat
(option pr assumed; Pr(married))
(6 missing values generated)

. list married age grade collgrad south wage hours yhat in 1/10, clean noobs

    married   age   grade           collgrad       south       wage   hours       yhat  
     Single    37      12   Not college grad   Not south   11.73913      48   .5526719  
     Single    37      12   Not college grad   Not south   6.400963      40   .6260239  
     Single    42      12   Not college grad   Not south   5.016723      40   .6149762  
    Married    43      17       College grad   Not south   9.033813      42    .621802  
    Married    42      12   Not college grad   Not south   8.083731      48   .5473188  
    Married    39      12   Not college grad   Not south    4.62963      30   .6932468  
     Single    37      12   Not college grad   Not south   10.49114      40   .6153015  
    Married    40      18       College grad   Not south   17.20612      45   .5951784  
    Married    40      14   Not college grad   Not south   13.08374       8    .809591  
    Married    40      15   Not college grad   Not south   7.745568      50   .5640763

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him

Comment

Alex Moschovas

Join Date: Mar 2021

Posts: 13
#4

14 Apr 2022, 00:56

Thank you for your answer.
I would actually like to create a sum-score table by assigning points to each variable, with the total translated into a probability of death. For example, for a newly admitted patient, I could add up the points and then look up the corresponding probability.
Thank you in advance.
Comment
David Radwin

Join Date: Mar 2014

Posts: 368
#5

14 Apr 2022, 11:00

I see. Unlike ordinary least squares (linear) regression, there isn't a simple way to assign probabilities in logistic regression based on a single covariate because the predicted outcome also depends on all the other covariates.

I think the closest approach to what you want is to use the margins command. If there isn't enough information in the help file, the following article is very good.

Williams, R. (2012). Using the margins command to estimate and interpret adjusted predictions and marginal effects. The Stata Journal, 12(2), 308-331. https://journals.sagepub.com/doi/pdf...867X1201200209
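
As a minimal sketch of what margins could look like here, the following reuses the nlsw88 example from the earlier post rather than the cardiac surgery data. The chosen values of hours and the use of factor-variable notation (i.) are illustrative assumptions, not part of the original example.

```stata
* Sketch only: adjusted predictions with -margins- after -logit-
sysuse nlsw88, clear

* i. marks collgrad and south as categorical so that -margins- can
* report one prediction per level
logit married age grade i.collgrad i.south wage hours

* Average predicted Pr(married) with hours fixed at 20, 40, and 60,
* leaving the other covariates at their observed values
margins, at(hours=(20 40 60))

* Average predicted Pr(married) for each level of collgrad
margins collgrad

* The predictions from the last -margins- call are stored in r(b)
matrix P = r(b)
matrix list P
```

Each number that margins reports is still an average over the other covariates, so it is not yet a per-patient score.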

Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4414
#6

14 Apr 2022, 12:02

I'm not sure I fully understand what you want but my guess is that you want something like the Framingham risk score - see Sullivan, LM, et al., (2004), "Presentation of multivariate data for clinical use: the Framingham Study risk score functions," Statistics in Medicine, 23: 1631-1660; note that some substantive knowledge/expertise makes this process much easier
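
For readers who want to see the general idea in Stata, here is a rough sketch that loosely follows the point-based approach in Sullivan et al. It is a simplification under stated assumptions: it uses the nlsw88 data instead of the cardiac surgery data, keeps hours continuous instead of categorizing it, and the reference profile (hours = 40, south = 0, collgrad = 0), the constant B, and all variable names are hypothetical choices made for illustration only.

```stata
* Sketch only: turn logit coefficients into integer points, then map
* the point total back to an approximate probability
sysuse nlsw88, clear
logit married hours i.south i.collgrad

* B = size of one point on the log-odds scale; assumed here to be the
* effect of a 10-hour difference in hours
scalar B = abs(_b[hours]) * 10

* Points per risk factor = distance from the reference value in
* log-odds units, divided by B and rounded to the nearest integer
generate pts_hours = round(_b[hours] * (hours - 40) / B)
generate pts_south = round(_b[1.south] * south / B)
generate pts_coll  = round(_b[1.collgrad] * collgrad / B)
generate points    = pts_hours + pts_south + pts_coll

* Linear predictor for the assumed reference profile
scalar xb_ref = _b[_cons] + _b[hours] * 40

* Approximate probability implied by each point total
generate p_points = invlogit(xb_ref + B * points)

* Compare with the equation-based prediction from -predict-
predict p_model
tabstat p_points p_model, by(points) statistics(mean)
```

The final table is the kind of lookup a score sheet would contain: one row per point total with its approximate probability. Choosing the categories, reference values, and B is where substantive knowledge comes in.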
Comment
Alex Moschovas

Join Date: Mar 2021

Posts: 13
#7

14 Apr 2022, 18:03

Thank you very much for the answers. margins is a spectacular tool, and I appreciate the link and the suggestion for further reading. I think I can also make great graphs with marginsplot and obtain a lot of information for future patients by using the predict command with my logit model. That would be one way to solve the problem.
I think the Framingham Study risk score functions paper comes closer to answering my question, as it lets me translate a sum score into a probability. I recently read the paper "Development of scoring system for risk stratification in clinical medicine: a step-by-step tutorial" (Zhang, Annals of Translational Medicine). The authors perform the steps in R. Is there an equivalent tutorial in Stata?
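
A minimal sketch of the predict idea for a future patient, assuming the nlsw88 example from earlier in the thread stands in for the real data. The covariate values for the new observation and the variable name p_new are made up for illustration.

```stata
* Sketch only: equation-based prediction for one new observation
sysuse nlsw88, clear
logit married age grade collgrad south wage hours

* Append one empty row to hold the hypothetical new case
set obs `=_N + 1'

* Fill in assumed covariate values for the new case (last observation)
replace age      = 40 in l
replace grade    = 12 in l
replace collgrad = 0  in l
replace south    = 0  in l
replace wage     = 8  in l
replace hours    = 40 in l

* -predict- works out of sample: it uses the stored coefficients and
* only needs the covariates, not the outcome
predict p_new in l
list age grade collgrad south wage hours p_new in l
```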
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4414
#8

14 Apr 2022, 20:09

first, I went to that journal and was unable to find that article using their search function - please provide a full citation

second, I do not know of any such tutorial in Stata; however, I have done a number of these and can help with specific questions you may have if you supply the appropriate information
Comment
Alex Moschovas

Join Date: Mar 2021

Posts: 13
#9

15 Apr 2022, 05:39

Regarding the first point, the R tutorial is available here: https://atm.amegroups.com/article/view/16442/html
Regarding the second point, I could send another dataex example with more information.
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4414
#10

15 Apr 2022, 07:04

thank you for the link
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4414
#11

15 Apr 2022, 12:13

for anyone who has been following this and might be interested in scoring systems (of which there are many, not just those cited above), you might also be interested in the following viewpoint: Gordon, WA, et al. (2010), "Coronary risk assessment by point-based vs. equation-based Framingham models: significant implications for clinical care," Journal of General Internal Medicine, 25(11): 1145-1151

finally, I note that some of my clients very much prefer scoring (point-based) systems since (1) they think it is much easier to follow and especially to relate to clinical knowledge and (2) on occasion I have found that it generalized better than equation-based (e.g., using -predict-) systems
Comment
Alex Moschovas

Join Date: Mar 2021

Posts: 13
#12

20 Apr 2022, 01:28

Hello Mr. Goldstein,
Regarding the articles we exchanged, should I base my scoring on the Framingham article that you sent me, or on the Zhang paper that I sent you, which uses the LOWESS method for continuous variables? Which one would be more suitable?
And if we choose the latter method, how can I run it in Stata?
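
As a rough illustration of what a LOWESS step for a continuous predictor might look like in Stata, the sketch below smooths a binary outcome against a continuous variable and then bins that variable. It uses the nlsw88 data as a stand-in; the bandwidth, the cut points, and the variable names sm_p, sm_logit, and hours_cat are assumptions for illustration, and this is not a reproduction of the tutorial's procedure.

```stata
* Sketch only: LOWESS smooth of a binary outcome against a continuous
* predictor, used to judge where to place cut points
sysuse nlsw88, clear

* Smoothed proportion married as a function of hours; nograph
* suppresses the plot and generate() stores the smoothed values
lowess married hours, bwidth(0.8) generate(sm_p) nograph

* Log-odds version of the smooth, defined only where the smoothed
* value lies strictly between 0 and 1
generate sm_logit = logit(sm_p) if sm_p > 0 & sm_p < 1

* Assumed cut points; in practice they would be chosen by inspecting
* the smooth together with clinical knowledge
egen hours_cat = cut(hours), at(0 20 40 60 200)

* Mean smoothed log odds within each category
tabstat sm_logit, by(hours_cat) statistics(mean n)
```

The categorized variable could then enter the logit model as i.hours_cat, and its coefficients could be converted to points.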

Last edited by Alex Moschovas; 20 Apr 2022, 01:52.
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4414
#13

20 Apr 2022, 07:10

Alex Moschovas, sorry, but (1) I don't know enough about your situation to give that level of advice and (2) each of these scoring systems, and the others that I know of, requires substantive knowledge to implement, and I don't have that knowledge. My suggestion is that, whichever you choose, you break the task into steps or sub-tasks and implement each of them (or at least those that don't require substantive knowledge). Ask for help here with the steps that don't require substantive knowledge, and ask at the appropriate place about the steps that do.
Comment
