
  • Interpretation of classification table in stata for a logistic regression and ROC curve

    Dear Everyone,

    I ran some diagnostics for my logistic regression: lroc; estat class, cutoff(0.15); and estat gof, group(10).


    My results are as follows:

    1. lroc

    Logistic model for phdv

    number of observations = 10051
    area under ROC curve = 0.6266


    2. estat class, cutoff(0.15)

                    -------- True --------
    Classified |         D            ~D  |      Total
    -----------+--------------------------+-----------
         +     |       625          2615  |       3240
         -     |       699          6112  |       6811
    -----------+--------------------------+-----------
       Total   |      1324          8727  |      10051

    Classified + if predicted Pr(D) >= .15
    True D defined as phdv != 0

    Sensitivity                    Pr( +| D)   47.21%
    Specificity                    Pr( -|~D)   70.04%
    Positive predictive value      Pr( D| +)   19.29%
    Negative predictive value      Pr(~D| -)   89.74%
    False + rate for true ~D       Pr( +|~D)   29.96%
    False - rate for true D        Pr( -| D)   52.79%
    False + rate for classified +  Pr(~D| +)   80.71%
    False - rate for classified -  Pr( D| -)   10.26%
    Correctly classified                       67.03%
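
    Every percentage in this output follows from the four cell counts in the 2x2 table. A quick Python check (the cell counts are copied from the output above; nothing else is assumed):

```python
# Cell counts from the classification table at cutoff 0.15
tp, fp = 625, 2615   # classified +: true D, true ~D
fn, tn = 699, 6112   # classified -: true D, true ~D

total = tp + fp + fn + tn                 # 10051 observations
sensitivity = tp / (tp + fn)              # Pr( +| D)
specificity = tn / (tn + fp)              # Pr( -|~D)
ppv = tp / (tp + fp)                      # Pr( D| +)
npv = tn / (tn + fn)                      # Pr(~D| -)
accuracy = (tp + tn) / total              # correctly classified

print(f"Sensitivity {sensitivity:.2%}, Specificity {specificity:.2%}")
print(f"PPV {ppv:.2%}, NPV {npv:.2%}, Accuracy {accuracy:.2%}")
# → Sensitivity 47.21%, Specificity 70.04%
# → PPV 19.29%, NPV 89.74%, Accuracy 67.03%
```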

    3. estat gof, group(10)

    Logistic model for phdv, goodness-of-fit test

    (Table collapsed on quantiles of estimated probabilities)

    number of observations = 10051
    number of groups = 10
    Hosmer-Lemeshow chi2(8) = 4.36
    Prob > chi2 = 0.8228
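
    The reported p-value can be reproduced from the chi-square statistic alone: for even degrees of freedom the chi-square survival function has a closed form. A pure-Python sketch (the small discrepancy with Stata's 0.8228 comes from the statistic being rounded to 4.36 in the display):

```python
import math

def chi2_sf_even_df(x, df):
    """Survival function P(X > x) for a chi-square variable with even df."""
    k = df // 2
    half = x / 2.0
    # Closed form for even df: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

p = chi2_sf_even_df(4.36, 8)
print(f"Prob > chi2 = {p:.4f}")  # → 0.8233 (Stata shows 0.8228, from the unrounded statistic)
```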



    Please advise whether I can go ahead with the model, that is, whether the model is good or not. I am unable to draw a conclusion.

    Thanking you in advance

    S. Rinchen




  • #2
    You do not say what you are studying, what question you are trying to answer, or why you think logistic regression might be helpful.
    You have a very large dataset, and a logistic regression model that passes some basic tests: specifically, the Hosmer-Lemeshow test
    suggests that the predicted probabilities broadly match the event rates.


    There is nothing here to cause alarm.



    • #3
      Dear Paul,

      Thank you for your response.

      I am examining physical domestic violence (dependent variable, coded 1 if physical violence and 0 otherwise) against the independent variables: women's age (wb2); no education (noedu12); primary education (w_pedu); wealth quintiles (poorest, second, middle, fourth); area (urban); region (central, western); pregnant women (cpreg); woman's age at first marriage (funion1); living children (tchild); children who died (cdead). Most of the independent variables are dummies, except women's age, age at first marriage, and number of children who died.

      My concern is that, after running the logistic regression, the pseudo R2 is just 0.033, which is extremely low, even though most of the independent variables are significant. At the same time, the ROC area is low (0.62), as are the classification results (command used: estat class, cutoff(0.15)). The cutoff was a judgement call, made after considering the output of the lsens command.

      Now, with all these results, should I consider my results worthwhile? Should I publish this analysis? All of this bothers me. What should I do next? Could you please help me?

      Thanking you a lot

      S. Rinchen



      • #4
        What you are asking is, for the most part, not a statistical question.

        Your area under the ROC curve is mediocre, suggesting that your predictive model doesn't provide a whole lot of discrimination between domestic violence cases and non-cases, but it's better than flipping a coin. Passing the Hosmer-Lemeshow test suggests, as Paul Seed says, that your predicted probabilities broadly match the observed probabilities in each decile of predicted risk. Given the pretty large sample, that's actually fairly impressive: small discrepancies are easily picked up in large samples. But if the predicted probability in the lowest decile and that in the highest don't differ by very much, then that isn't terribly useful. (And the mediocre ROC area suggests this is the case.)
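
        The ROC area has a concrete reading: it is the probability that a randomly chosen case receives a higher predicted probability than a randomly chosen non-case (with ties counted as half). A toy Python illustration of that pairwise definition (the scores below are invented, not the poster's data):

```python
import itertools

# Hypothetical predicted probabilities, invented purely for illustration
cases    = [0.20, 0.35, 0.15]          # predicted Pr(D) for true D
noncases = [0.10, 0.25, 0.05, 0.30]    # predicted Pr(D) for true ~D

# AUC = P(score_case > score_noncase), ties counted as 1/2
pairs = list(itertools.product(cases, noncases))
auc = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c, n in pairs) / len(pairs)
print(f"AUC = {auc:.3f}")  # → AUC = 0.667
```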

        But the real "paydirt" is the -estat class- output. And this is where we leave the realm of statistics. Assuming you stick with the 0.15 cutoff, if you use your model to predict domestic violence, 19.29% of your positive predictions will be correct, and 89.74% of your negative predictions will be correct. The rest will be wrong. The question is: what are the utilities of the real-world consequences of making correct and incorrect predictions in each case? How will you act on the predictions? If the consequences of getting things wrong are serious (people will die, or be wrongly incarcerated), then these error rates look disturbingly high. But if the predictions will only lead to a slightly more intrusive investigation into things, then these error rates could be perfectly acceptable. The point is that the usefulness of a model with these properties cannot be assessed solely on the basis of its statistical performance: you must weigh that performance in light of the consequences of the predictions and the actions the predictions will precipitate.
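
        One way to make this weighing concrete is an expected-cost calculation. The unit costs below are entirely hypothetical placeholders; only the cell counts come from the thread:

```python
# Cell counts at the 0.15 cutoff, from the classification table
tp, fp, fn, tn = 625, 2615, 699, 6112
total = tp + fp + fn + tn

# Hypothetical unit costs (placeholders, not from the thread): a missed
# violence case (false negative) is assumed 10x as costly as an unnecessary
# follow-up (false positive); correct predictions cost nothing.
cost_fp, cost_fn = 1.0, 10.0

expected_cost = (fp * cost_fp + fn * cost_fn) / total
print(f"Expected cost per prediction: {expected_cost:.3f}")  # → 0.956
```

Changing the cost ratio (or the cutoff) changes the answer, which is exactly the point: the decision depends on consequences, not on the statistics alone.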



        • #5
          Dear Schechter,

          Thank you very much for clarifying my doubts. I would welcome similar guidance in future.

          Thanking you
