Can predict be used after logit regression with a response scale of 1/0?

Nicola Shelley Williams

#1

31 Aug 2022, 09:17

I am using predict in Stata 17.0 after a logit regression. I am trying to create predictions of 1 or 0 instead of predicted probabilities that the outcome is 1. Is there an option to predict, perhaps one not covered in the documentation, that changes the response scale? Is there another program for this purpose?

logit outcome_decisive rebstrength_1 lmtnest thirdparty popcategory polity2 efindex i.Oil i.ConflictIntensity i.ethnic lgdp_percap i.terrcont_1, or vce(cluster conflictep_id)

predict pr1

summarize outcome_decisive pr1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
outcome_de~e |        383     .464752    .4994084          0          1
         pr1 |      1,582     .414112    .2158658   .0391781   .9871149
Weiwen Ng

#2

31 Aug 2022, 10:04

I struggle to understand why you would want to do this. Logistic regression fundamentally models a function of the probability that something happens. So, naturally, with predict you would get a probability, which is going to be between 0 and 1.

I guess you could do something like

Code:

predict prob_outcome_decisive
gen predicted = runiform() < prob_outcome_decisive if !missing(prob_outcome_decisive)

If you summarize that predicted variable, its mean should be very close to the mean of the outcome, at least within the estimation sample. But why is this a meaningful goal?
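A minimal sketch of that idea. Because the data from #1 are not available, it assumes Stata's shipped auto dataset as a stand-in: `foreign` plays the role of outcome_decisive, and `mpg` and `weight` are arbitrary illustrative predictors. The names p_hat and y_draw are hypothetical. Setting the seed makes the random 0/1 draws reproducible, and restricting to e(sample) keeps the comparison of means to the observations the model was actually fit on.

```stata
* Stand-in data (assumption): auto replaces the data from #1,
* with foreign (0/1) playing the role of outcome_decisive
sysuse auto, clear
logit foreign mpg weight

* Predicted probabilities, restricted to the estimation sample
predict double p_hat if e(sample), pr

* Fix the seed so the random 0/1 draws can be reproduced
set seed 2022

* Draw 1 with probability p_hat, else 0; missing p_hat stays missing
generate byte y_draw = runiform() < p_hat if !missing(p_hat)

* In the estimation sample the mean of p_hat matches the mean of foreign
* (a property of logit with a constant); y_draw only approximates it
summarize foreign p_hat y_draw if e(sample)
```

Changing the seed changes which observations receive a 1, so any single set of draws is just one of many equally likely outcomes.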

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Andrew Musau

#3

31 Aug 2022, 10:11

I do not understand the question. Can you illustrate what you want with an example? If you are thinking about something along the lines of calculating the hit ratio (percentage correctly classified), then it relies on some rule about the predicted probabilities, namely \(\hat{p}_i \geq 0.5 \rightarrow \hat{y}_i = 1\) and \(\hat{p}_i < 0.5 \rightarrow \hat{y}_i = 0\). In any case, you can obtain this statistic directly by running

Code:

estat classification

after logit.
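A sketch of what estat classification reports, again assuming Stata's auto dataset as a stand-in (`foreign` for the outcome; `mpg` and `weight` as illustrative predictors). The cutoff() option sets a threshold other than the default 0.5, and the last lines rebuild the default table by hand with the rule \(\hat{p}_i \geq 0.5 \rightarrow \hat{y}_i = 1\). The variable names p_hat and y_hat are hypothetical.

```stata
* Stand-in data (assumption): auto replaces the data from #1
sysuse auto, clear
logit foreign mpg weight

* Classification table at the default cut-off of 0.5
estat classification

* The same table using a different cut-off
estat classification, cutoff(0.4)

* By hand: classify as 1 when the predicted probability is at least 0.5,
* using only the estimation sample
predict double p_hat if e(sample), pr
generate byte y_hat = p_hat >= 0.5 if !missing(p_hat)

* Cross-tabulate predicted against observed classes
tabulate y_hat foreign
```

The diagonal of the cross-tabulation, divided by the number of observations, gives the percentage correctly classified that estat classification reports.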
Nick Cox

#4

31 Aug 2022, 10:25

Note that round() alone would map probabilities to whichever of 0 or 1 is nearer.

Code:

. di round(0.4999)
0

. di round(0.5001)
1

But note also that people don't always use 0.5 as a cut-off!
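If a cut-off other than 0.5 is under consideration, one way to see its consequences is to loop over a few candidate values and compare the percentage correctly classified. This sketch assumes Stata's auto dataset as a stand-in, and the candidate cut-offs are arbitrary. The lsens command, which graphs sensitivity and specificity against the cut-off after logit, serves a similar purpose.

```stata
* Stand-in data (assumption): auto replaces the data from #1
sysuse auto, clear
logit foreign mpg weight

* Percent correctly classified at several arbitrary candidate cut-offs
foreach c in 0.3 0.4 0.5 0.6 {
    quietly estat classification, cutoff(`c')
    display "cut-off `c': " %5.1f r(P_corr) "% correctly classified"
}

* Graph sensitivity and specificity against the cut-off
lsens
```

Percentage correctly classified is only one criterion for choosing a cut-off; which errors matter more depends on the application.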
Leonardo Guizzetti

#5

31 Aug 2022, 19:11

Strictly speaking, it is impossible for logistic regression to yield predicted probabilities of exactly 0 or 1, for the reasons already given here.

Once you have fit your model and obtained predicted probabilities, it is entirely up to you to decide how to interpret or use them. It is always better to use the raw probabilities than to collapse them to a binary response, because collapsing discards information. You may decide to use a cut-off, as Nick suggests, for example, but that forces you to decide what does and does not constitute an event. Even using the average predicted probability as the cut-off may not be the most sensible choice, so there is no single best answer here.
Nick Cox

#6

01 Sep 2022, 02:20

There is a nuance here. Some researchers want to use a cut-off to classify predictions as nearer 0 or nearer 1 relative to that cut-off. My contribution is not to suggest the use of cut-offs, but rather to suggest code to do that if that is what you want. Given some other cut-off, chosen for whatever reason, a different rounding is easy with, say,

Code:

gen wanted = predicted > 0.4 if !missing(predicted)

while

Code:

gen wanted1 = predicted >= 0.5 if !missing(predicted)
gen wanted2 = round(predicted)

are equivalent for predicted probabilities.

So, in terms of #1 there is no need for another program, as all that is needed is to get the predicted probabilities and then coarsen them with a single statement after predict.
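One caution when coarsening with a comparison: Stata treats missing (.) as larger than any number, so a comparison such as predicted > 0.4 evaluates to 1 when predicted is missing, whereas round() of a missing value is missing. A qualifier such as if !missing(predicted) avoids coding missing predictions as 1. The sketch below shows the difference, assuming Stata's auto dataset as a stand-in; rep78 is used as a predictor only because it has missing values, and the variable names naive, guarded and rounded are hypothetical.

```stata
* Missing sorts above every number, so this displays 1 ...
display . > 0.4
* ... while round() of missing is missing
display round(.)

* Stand-in data (assumption): auto replaces the data from #1
sysuse auto, clear
* rep78 has 5 missing values in auto, so 5 predictions will be missing
logit foreign mpg rep78
predict double p_hat, pr

* Unguarded: observations with missing p_hat are coded 1
generate byte naive = p_hat > 0.4
* Guarded: observations with missing p_hat stay missing
generate byte guarded = p_hat > 0.4 if !missing(p_hat)
* round() propagates missing on its own
generate byte rounded = round(p_hat)

list p_hat naive guarded rounded if missing(p_hat)
```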
Nicola Shelley Williams

#7

01 Sep 2022, 15:37

Thank you for all the responses. It makes good sense to use the predicted probabilities and generate new variables for analysis. Thank you once again!