
  • Predicted Values are all zero

    I am using firthlogit and trying to calculate predicted values for my DV. After firthlogit, I typed the following commands.

    estimates store full
    predict double pr, xb                // linear predictor, log-odds scale
    quietly replace pr = invlogit(pr)    // convert to predicted probabilities

    summarize pr

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
              pr |    161,007    .0006949    .0005993   .0001676   .1027197

    I need to create a new variable that is coded 1 if the predicted value is greater than 0.5 and 0 if it is less than 0.5, in order to compare predicted vs. observed values. But all the predicted values are less than 0.5 (the maximum is 0.1027197). My DV is a rare event with excess zeros.
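
    For reference, this is the recode I have in mind (a sketch; DV stands for my actual outcome variable):

    generate byte pr_class = pr > 0.5 if !missing(pr)   // 1 if predicted probability > 0.5, else 0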

    Can you help me understand what is wrong and what to do?
    Last edited by April Kimm; 12 Mar 2022, 02:33.

  • #2
    Nothing is wrong. Your maximum predicted probability is less than 0.5, so it makes sense that no value of the new variable is, or would be, 1. The new variable would be useless, but the calculation was correct.

    A plot of predicted versus observed or vice versa should use the predicted probabilities.

    In any case, consider that ideally observed = (0 or 1) and predicted_is_greater_than_0.5 = (0 or 1). The corresponding scatter plot is at most 4 blobs, and although you can jitter points to give an impression of frequency, you're still better off with a 2 x 2 table.
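
    A sketch of both ideas, assuming pr holds the predicted probabilities, DV is the observed 0/1 outcome, and pr_class is the indicator defined in #1:

    tabulate DV pr_class          // the 2 x 2 table
    graph box pr, over(DV)        // predicted probabilities split by observed outcome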
    Last edited by Nick Cox; 12 Mar 2022, 03:22.



    • #3
      Using the cutpt command in Stata, I calculated an optimal cutpoint, but only 53 of the 77,307 cases predicted to be 1 actually have a DV value of 1.

                 |      predicted
        observed |        0          1 |      Total
      -----------+---------------------+-----------
               0 |   83,193     77,254 |    160,447
               1 |       52         53 |        105
      -----------+---------------------+-----------
           Total |   83,245     77,307 |    160,552
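
      The commands were along these lines (a sketch: I am assuming cutpt's refvar-then-classvar syntax and that it leaves the chosen cutpoint in r(cutpoint)):

      ssc install cutpt                                  // user-written command
      cutpt DV pr                                        // empirical cutpoint estimate
      generate byte pr_class2 = pr > r(cutpoint) if !missing(pr)
      tabulate DV pr_class2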



      • #4
        The conclusion is that your model is just not a very good predictor of the event. At this cutpoint, sensitivity and specificity are both only around 50% (53 of the 105 DV = 1 cases and 83,193 of the 160,447 DV = 0 cases are classified correctly), and the positive predictive value is dismal: only 53 of the 77,307 cases predicted to be 1 actually are.

        You can always trade off specificity for more sensitivity by moving the cutpoint still closer to zero. Do not believe that any automagically (that's not a typo, hat tip Maarten Buis) generated cutpoint from some software program is ever actually optimal, unless it comes from a decision analysis that takes into account the disutilities of false positive and false negative results and the prevalence of the condition being modeled. The -cutpt- program you describe uses three types of "optimal" cutpoints: the Liu method, the Youden method, and the "closest to (0, 1) on the ROC curve." But none of these are utility based, so none of them is optimal in any useful sense of the word. They are pretenses at optimality deriving from schools of thought that shun the "subjectivity" that is inherent in the very word "optimal." They are, in other words, category errors.
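
        To make that concrete: if a false positive carries disutility c_fp and a false negative carries disutility c_fn, the rule that minimizes expected disutility calls a case positive when (1 - p)*c_fp < p*c_fn, that is, when p > c_fp/(c_fp + c_fn); prevalence enters through p itself when the predicted probabilities are well calibrated. A sketch with purely hypothetical disutilities:

        local c_fp 1        // hypothetical disutility of a false positive
        local c_fn 50       // hypothetical disutility of a false negative
        local cut = `c_fp'/(`c_fp' + `c_fn')
        display "utility-based cutpoint = " %6.4f `cut'
        generate byte call_pos = pr > `cut' if !missing(pr)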

        All of that said, this is also a good lesson in the extreme difficulty of coming up with models that predict rare events. You are hardly the first person to encounter this difficulty, and I am sure you will not be the last. If you experiment, you may find a cutpoint that provides a combination of sensitivity and specificity that is more to your liking, as in the sketch below. It is possible, but unlikely, that you will find one that provides really high levels of both. (And even if you do, the positive predictive value will almost surely be disappointing.) There's nothing wrong with how you're analyzing the data. You are stumbling over some unpleasant aspects of reality.
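
        A sketch of that experimentation, assuming pr holds the predicted probabilities and DV is the observed 0/1 outcome:

        foreach c of numlist 0.0005 0.001 0.005 0.01 0.05 {
            quietly count if DV == 1 & pr > `c'
            local tp = r(N)                      // true positives at this cutpoint
            quietly count if DV == 1
            local sens = `tp'/r(N)               // sensitivity
            quietly count if DV == 0 & pr <= `c'
            local tn = r(N)                      // true negatives
            quietly count if DV == 0
            display "cutpoint `c': sens = " %5.3f `sens' "  spec = " %5.3f `tn'/r(N)
        }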
        Last edited by Clyde Schechter; 13 Mar 2022, 16:07.
