  • Can predict be used after logit regression with a response scale of 1/0?

    I am using predict in Stata 17.0 after a logit regression. I would like predictions of 1 or 0 rather than predicted probabilities of 1. Does predict have options, perhaps ones not covered in the documentation, that change the response scale? If not, is there another command for this purpose?


    logit outcome_decisive rebstrength_1 lmtnest thirdparty popcategory polity2 efindex i.Oil i.ConflictIntensity i.ethnic lgdp_percap i.terrcont_1, or vce(cluster conflictep_id)

    predict pr1

    summarize outcome_decisive pr1



        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    outcome_de~e |        383     .464752    .4994084          0          1
             pr1 |      1,582     .414112    .2158658   .0391781   .9871149


  • #2
    I struggle to understand why you would want to do this. Logistic regression fundamentally models a function of the probability that something happens. So, naturally, with predict you would get a probability, which is going to be between 0 and 1.

    I guess you could do something like

    Code:
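    * restrict to the estimation sample so that out-of-sample and missing predictions are excluded
    predict prob_outcome_decisive if e(sample)
    * draw 1 with probability equal to the predicted probability
    gen predicted = runiform() < prob_outcome_decisive if e(sample)
    If you summarize that predicted variable over the estimation sample, its mean should be very close to the mean of the outcome. But why is this a meaningful goal?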
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


    • #3
      I do not understand the question. Can you illustrate what you want with an example? If you are thinking about something along the lines of calculating the hit ratio (percentage correctly classified), then it relies on some rule about the predicted probabilities, namely \(\hat{p}_i \geq 0.5 \rightarrow \hat{y}_i = 1\) and \(\hat{p}_i < 0.5 \rightarrow \hat{y}_i = 0\). In any case, you can obtain this statistic directly by running

      Code:
      estat classification
      after logit.
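
      If 0.5 is not the cut-off you have in mind, a minimal sketch of the same idea with a different threshold is below. The value 0.4 is purely illustrative, and I am assuming the cutoff() option of estat classification behaves as documented for logit.

      Code:
      * classification table using 0.4 instead of the default 0.5 threshold
      estat classification, cutoff(0.4)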


      • #4
        Note that round() alone would map probabilities to whichever of 0 or 1 is nearer.


        Code:
        . di round(0.4999)
        0
        
        . di round(0.5001)
        1
        But note also that people don't always use 0.5 as a cut-off!
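
        One caveat if you swap round() for an inequality: as far as I know, Stata treats missing values as larger than any number, so the two behave differently when the predicted probability is missing. A quick check:

        Code:
        . di round(.)
        .

        . di (. >= 0.5)
        1
        An inequality therefore usually needs an if !missing() qualifier so that missing probabilities are not classified as 1.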


        • #5
          Strictly speaking, a logistic regression cannot yield a prediction of exactly 0 or 1: as already noted above, what it predicts is a probability, which lies strictly between the two.

          Once you have fit your model and obtained predicted probabilities, it is entirely up to you to decide how to interpret or use those probabilities. It is always better to use the raw probabilities rather than collapse them to a binary response because of the loss of information involved. You may decide to use a cut-off, as suggested by Nick for example, but it forces you to decide what does and does not constitute an event. Even choosing the marginal average probability may not be the most sensible, so there is no single best answer here.


          • #6
            There is a nuance here. Some researchers want to use a cut-off to classify predictions as nearer 0 or nearer 1 compared with that cut-off. My contribution is not to suggest the use of cut-offs, but rather to suggest code to do that if that is what you want. Given some other cut-off for some reason, a different rounding is easy with say

            Code:
            gen wanted = predicted > 0.4 if !missing(predicted)
            while

            Code:
            gen wanted1 = predicted >= 0.5 if !missing(predicted)
            gen wanted2 = round(predicted)
            are equivalent for predicted probabilities.

            So, in terms of #1 there is no need for another program, as all that is needed is to get the predicted probabilities and then coarsen them with a single statement after predict.
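
            Putting the pieces together, a minimal sketch of the whole workflow might look like the following. The names phat and yhat are placeholders of mine, 0.5 is only one possible cut-off, and the code assumes it is run directly after the logit command in #1.

            Code:
            * predicted probabilities for the estimation sample only
            predict phat if e(sample)
            * coarsen to 0/1; the if qualifier keeps missing probabilities missing
            gen byte yhat = phat >= 0.5 if !missing(phat)
            * cross-tabulate observed against predicted outcomes
            tabulate outcome_decisive yhat

            With a cut-off of 0.5 the cross-tabulation should agree with the counts reported by estat classification.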


            • #7
              Thank you for all the responses. That makes good sense to use the predicted probabilities and generate new vars for analysis. Thank you once again!
