
  • Percent correctly predicted in probit estimation

    Dear Statalisters,
    I checked the goodness of fit of my probit model with the percent-correctly-predicted method, using the following commands:

    Code:
    qui probit atrisk_only age_mth female javanese wi1 muslim i.region logfoodconpc year14 lognonfoodexppc percentagefruits percstarch smallhh bmi_fa2 bmi_mo secondary_fa  underweight_fa if age_mth>=24, cl(cl_id)
    estat class, cutoff(0.5)
    Here is my Stata output:

    Code:
    Probit model for atrisk_only

                  -------- True --------
    Classified |         D            ~D  |      Total
    -----------+--------------------------+-----------
         +     |         1             1  |          2
         -     |       245          2474  |       2719
    -----------+--------------------------+-----------
       Total   |       246          2475  |       2721

    Classified + if predicted Pr(D) >= .5
    True D defined as atrisk_only != 0
    --------------------------------------------------
    Sensitivity                     Pr( +| D)    0.41%
    Specificity                     Pr( -|~D)   99.96%
    Positive predictive value       Pr( D| +)   50.00%
    Negative predictive value       Pr(~D| -)   90.99%
    --------------------------------------------------
    False + rate for true ~D        Pr( +|~D)    0.04%
    False - rate for true D         Pr( -| D)   99.59%
    False + rate for classified +   Pr(~D| +)   50.00%
    False - rate for classified -   Pr( D| -)    9.01%
    --------------------------------------------------
    Correctly classified                        90.96%

    It seems that my model is good at predicting when y=0 (which is no surprise, since y=0 for most cases in my sample), but fails to predict when y=1.
    Is it correct to take this as a sign that my model does not have a good fit? What could I do to improve my model? Are there other goodness-of-fit measures that might be more suitable for my case?

    Thank you in advance!
    Best, Jan

  • #2
    Note that this command only tests predictive accuracy at a single threshold, predicted probability = 0.5; with a different threshold things may look better. I would begin by getting the area under the ROC curve (-lroc-). If that area is poor (say, < 0.7), I would consider the model inadequately discriminating. But if the area is >= 0.7, it would be worth trying different thresholds to see how the classification statistics work out; there should be some threshold at which the results look fairly reasonable.
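
    For instance (a sketch; the 0.1 cutoff below is just an illustrative value, not a recommendation):

    Code:
    * area under the ROC curve for the model fit in #1
    lroc

    * sensitivity and specificity plotted against every possible cutoff
    lsens

    * the classification table again, at a lower threshold such as 0.1
    estat classification, cutoff(0.1)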

    By the way, discriminatory power is only one aspect of fit in these models. Don't forget to test calibration as well (-estat gof-).
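
    A sketch of that calibration check, run after the model in #1 (one caveat: the test assumes independent observations, so with cl(cl_id) its p-value should be read cautiously):

    Code:
    * Pearson chi-squared goodness-of-fit test
    estat gof

    * Hosmer-Lemeshow version: compare observed and expected counts
    * within 10 groups of predicted probabilities
    estat gof, group(10) table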

    • #3
      Thank you very much, this helps me a lot!

      • #4
        I find that the -estat clas- command is often worthless, especially when one of the outcomes is much rarer than the other. For example, if 90% of the cases are 1s, then 100% of the cases may get classified as 1s, giving you 90% accuracy, which really doesn't mean much.

        The Adjusted Count R^2 may be better. It basically tells you how much better the model does than just predicting that every case falls into the majority category. See pp. 11-12 of

        https://www3.nd.edu/~rwilliam/stats3/L05.pdf

        That handout also discusses other measures of fit.
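
        For the table in #1 this can be worked out by hand: the model classifies 1 + 2474 = 2475 cases correctly, and always guessing the majority category (~D) would also get 2475 right, so the Adjusted Count R^2 is (2475 - 2475)/(2721 - 2475) = 0; the model does no better than a no-model guess. A sketch of the general computation (run right after the probit in #1; phat and yhat are illustrative names, not part of the original code):

        Code:
        * predicted probabilities and 0/1 classifications at cutoff 0.5
        predict double phat if e(sample)
        gen byte yhat = (phat >= .5) if e(sample)

        * number of cases classified correctly
        count if yhat == atrisk_only & e(sample)
        local ncorrect = r(N)

        * size of the larger observed category
        qui tab atrisk_only if e(sample), matcell(f)
        local nmax = max(f[1,1], f[2,1])

        * Adjusted Count R^2
        qui count if e(sample)
        di "Adjusted Count R^2 = " (`ncorrect' - `nmax')/(r(N) - `nmax')
        (Long and Freese's user-written -fitstat-, part of the SPost package, reports the Count R^2 and Adjusted Count R^2 directly.)
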
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam
