  • Interpretation of classification table in Stata for a logistic regression

    Hi

    To check whether my logistic model makes any sense, I looked at the classification table with the command "estat class".

    I received the following result (see picture)

    But now I have trouble understanding what it says and how well my model represents the use of earnouts. In particular, I don't understand why I get a sensitivity of 0% and what I can conclude from it.

    In the logistic model, the dependent variable earnout takes the value 1 if there was an earnout and 0 if not.

    I could really use some help.

    Thanks in advance.

    Markus

  • #2
    Welcome to the Forum. Your model was only able to correctly classify the true negatives.
    Best regards,

    Marcos

    • #3
      Thanks for your answer. So I guess this is not good news for my model. But how can I change this? I can't change the dependent variable, which is a dummy with the values 0 and 1. Do I need other independent variables? Or did I enter the wrong command for the logistic regression? I entered the following command: "logistic Earnout CrossBoarderTranaktion IFRSvorRevision2008 year AnteilEquitiy LogDealValue Age TargetinHighTechoderService LogAcquirorTotalAssets TargetQuotiert"
      Earnout is the dependent variable and the others are independent or control variables. Most of them are dummies.

      In my data sample there are exactly 123 earnouts, which means the variable Earnout takes the value 1 in 123 observations.
      Last edited by Markus Mueck; 08 Jan 2016, 06:55.

      • #4
        Markus,

        Forgive me if you already know this, but here goes. Your logistic regression model is predicting a probability of having earnout = 1 for each observation. In estat class, the program calculated sensitivity and specificity as if your cutoff were 0.5, i.e. a predicted probability of earnout of 0.5. This may or may not be the optimal cutoff point. Clearly, for every single person, the predicted probability is under 0.5.
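To see how a 0.5 cutoff can produce 0% sensitivity, here is a minimal Python sketch (not Stata) of how a classification table is built from predicted probabilities. The outcomes and probabilities are made-up toy values, not Markus's data; the rule "classify as positive if the predicted probability is at least the cutoff" mirrors the Pr(D) >= cutoff rule that estat classification states in its output.

```python
# Toy data (hypothetical, not from the thread): 1 = earnout, 0 = no earnout,
# plus a predicted probability for each observation from some fitted model.
y = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
p = [0.05, 0.10, 0.08, 0.35, 0.12, 0.42, 0.31, 0.07, 0.25, 0.15]


def classification_table(y, p, cutoff=0.5):
    """Count (TP, FN, TN, FP), classifying as positive when p >= cutoff."""
    tp = fn = tn = fp = 0
    for yi, pi in zip(y, p):
        predicted_positive = pi >= cutoff
        if yi == 1 and predicted_positive:
            tp += 1
        elif yi == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def sensitivity_specificity(y, p, cutoff=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fn, tn, fp = classification_table(y, p, cutoff)
    return tp / (tp + fn), tn / (tn + fp)


for cutoff in (0.5, 0.3):
    sens, spec = sensitivity_specificity(y, p, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# cutoff 0.5: sensitivity 0.00, specificity 1.00
# cutoff 0.3: sensitivity 0.67, specificity 0.86
```

Because fewer than 10% of the deals in this thread have an earnout, it can easily happen that no predicted probability reaches 0.5; lowering the cutoff trades some specificity for sensitivity.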

        Is estat class the only diagnostic you ran? Personally, I would do this:

        1) Calculate the area under the ROC curve, aka the c-statistic. The command is lroc.

        The c-statistic is a summary measure of how well a model discriminates between cases and non-cases. You're looking for a c-statistic of at least 0.7. If your c-statistic is 0.5, your model does no better than random chance, i.e. it's worthless.

        2) Try this command:
        estat gof, table(10)

        This assesses how well the model fits. Like I said, every person has a predicted probability of the event, which I'll call a risk score. Say that among the top 10% of risk scores, the average score was 0.4. Do 40% of those people actually get the event? Is it more, or less? estat gof tries to assess that. If the chi-square statistic it calculates is significant (p < 0.05), then your model fit is poor (this is a simplification).

        3) Do you actually want to use the model to predict who is at risk of having the event of interest? If so, try lsens, which graphs sensitivity and specificity against the risk score used as the cutoff. A good starting cutoff is where the sensitivity and specificity curves cross. For example, in example 1 in the Stata manual, they cross at a risk score of about 0.3.

        http://www.stata.com/manuals13/rlsens.pdf

        You could then rerun estat class with a different cutoff. For example, try:
        estat class, cutoff(0.3)
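The three diagnostics above can be sketched in plain Python to show what each number measures. This is an illustration under stated assumptions, not a reimplementation of lroc, estat gof, or lsens: the Hosmer-Lemeshow groups here are equal-sized chunks of the sorted predictions (Stata's own grouping, based on quantiles of the predicted probabilities, differs in detail), the p-value uses the closed-form chi-square tail that holds only for even degrees of freedom (10 groups give df = 8), the lsens-style scan uses a hypothetical grid of 101 cutoffs, and the data are simulated.

```python
import math
import random


def c_statistic(y, p):
    """Area under the ROC curve: share of (event, non-event) pairs in which
    the event has the higher predicted probability; ties count one half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for a in pos:
        for b in neg:
            score += 1.0 if a > b else 0.5 if a == b else 0.0
    return score / (len(pos) * len(neg))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """Compare observed and expected events within groups of sorted risk."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)
        expected = sum(pi for pi, _ in chunk)
        pbar = expected / size
        stat += (observed - expected) ** 2 / (size * pbar * (1 - pbar))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


def crossing_cutoff(y, p, steps=100):
    """lsens-style scan: cutoff where sensitivity and specificity are closest."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    best = None
    for k in range(steps + 1):
        c = k / steps
        sens = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= c) / n_pos
        spec = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < c) / n_neg
        if best is None or abs(sens - spec) < best[0]:
            best = (abs(sens - spec), c, sens, spec)
    return best[1:]


# Simulated data (hypothetical): outcomes drawn from known probabilities.
rng = random.Random(2016)
p = [rng.uniform(0.01, 0.6) for _ in range(1000)]
y = [1 if rng.random() < pi else 0 for pi in p]

print(f"c-statistic: {c_statistic(y, p):.3f}")
stat, df, pval = hosmer_lemeshow(y, p)
print(f"Hosmer-Lemeshow chi2({df}) = {stat:.2f}, p = {pval:.3f}")
cutoff, sens, spec = crossing_cutoff(y, p)
print(f"curves cross near cutoff {cutoff:.2f} (sens {sens:.2f}, spec {spec:.2f})")
```

On real data these numbers may differ slightly from what lroc, estat gof, group(10), and lsens report, mainly because of the grouping rule.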
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

        When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

        • #5
          Hi
          Thanks for your answer. I calculated the ROC curve and got an area under the ROC curve of 0.7187, so I guess my model should be OK. I also tried your second command, but Stata always says "option table() incorrectly specified". Without the option "table(10)", I got a p-value of 0.0000, which is good.

          And you were also right about the third point. If I select another cutoff point (as you recommended, 0.3), the classification looks much better.

          So I'm really thankful for your post! Before reading it, I thought I would have to throw away my model.

          • #6
            Markus Mueck: I believe the most important issues in modeling are, first, the rationale. After that comes the selection of the variables (you said most of them are dummies; maybe they don't discriminate much in your model, who knows). Last but not least comes the background literature (whether the model is in line with the benchmark papers), not to mention power and sample size issues. I fear you didn't share much information about these aspects. You might get further help by presenting the main commands as well as the results, as suggested in the FAQ.

            By the way, the correct command to obtain a Hosmer-Lemeshow test is:

            Code:
            . estat gof, group(10)

            Please keep in mind that what you got in #5 was the Pearson goodness-of-fit test. Such a low p-value means the opposite of what you believed: it indicates poor fit, not good fit. I suspect the Hosmer-Lemeshow test won't differ much.
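For reference, the Pearson statistic is computed over covariate patterns: observations with identical covariate values are pooled, and each pattern contributes its squared gap between observed and expected events, scaled by the binomial variance, with degrees of freedom equal to the number of patterns minus the number of estimated parameters. The minimal Python sketch below uses made-up patterns and probabilities. When nearly every observation is its own pattern, as with the 1443 patterns among 1446 observations reported later in the thread, each term rests on a single observation, and the chi-square reference distribution is generally considered unreliable.

```python
from collections import defaultdict


def pearson_gof(patterns, y, p):
    """Pearson chi-square summed over covariate patterns.

    patterns: one hashable covariate tuple per observation; observations that
    share a pattern share the same fitted probability.
    Returns (statistic, number of covariate patterns).
    """
    pooled = defaultdict(lambda: [0, 0, 0.0])  # pattern -> [m_j, events_j, p_j]
    for key, yi, pi in zip(patterns, y, p):
        entry = pooled[key]
        entry[0] += 1
        entry[1] += yi
        entry[2] = pi
    stat = 0.0
    for m, events, pj in pooled.values():
        stat += (events - m * pj) ** 2 / (m * pj * (1 - pj))
    return stat, len(pooled)


# Hypothetical example: two dummy covariates, so at most four patterns.
patterns = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
y = [0, 1, 0, 0, 1, 0, 1, 1]
p = [0.5, 0.5, 0.2, 0.2, 0.4, 0.4, 0.8, 0.8]
stat, n_patterns = pearson_gof(patterns, y, p)
print(f"Pearson chi2 = {stat:.3f} over {n_patterns} covariate patterns")
```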

            Hopefully that helps.

            Best,

            Marcos
            • #7
              Hi Marcos

              I see my mistake in interpreting the Pearson gof test. Fortunately, the Hosmer-Lemeshow test differs: there I got a p-value of .41. Maybe that's because the number of covariate patterns (1443) is almost the number of observations (1446).

              My command for the model was the following:

              logit Earnout TargetinHighTechoderService CrossBoarderTranaktion AnteilEquitiy LogAcquirorTotalAssets TargetQuotiert Alter18 IFRSvorRevision2008 LogDealValue Jahr

              You can see the result in the picture. There are 1446 observations (126 earnouts). I prepared the data in Excel because of my lack of knowledge of Stata.

              Best
              Markus

              • #8
                Markus,

                It sounds like you re-ran the Hosmer-Lemeshow goodness-of-fit test using the group(10) option (thanks, Marcos, for the correction) and got a p-value of 0.41. That's good. You have a c-statistic of 0.71, which is acceptable. You do have a pseudo R-squared of about 0.08, which is not good, but you may be unable to get better than that.
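The pseudo R-squared that Stata's logit prints is McFadden's: one minus the ratio of the model's log likelihood to that of an intercept-only model, which predicts the sample share of events for everyone. A minimal Python sketch of that ratio follows; the outcomes and probabilities are hypothetical toy values, not fitted estimates from the thread.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log likelihood of outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi)
               for yi, pi in zip(y, p))


def mcfadden_r2(y, p):
    """1 - LL(model) / LL(intercept only)."""
    share = sum(y) / len(y)
    ll_null = log_likelihood(y, [share] * len(y))
    return 1 - log_likelihood(y, p) / ll_null


# Hypothetical toy values, not the thread's data.
y = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
p = [0.05, 0.10, 0.08, 0.35, 0.12, 0.42, 0.31, 0.07, 0.25, 0.15]
print(f"McFadden pseudo R2 = {mcfadden_r2(y, p):.3f}")
```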

                I think that if you wanted to show the effect of those variables on the odds of having earnout, then you've got what you need. If you want to build an actual prediction model, then you may be out of luck (but you haven't shown your classification table with a different cutoff, so I don't know). If you want the marginal effects expressed in percentage terms, you should Google the margins command. Remember your model is estimating odds ratios, not relative risks.
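To illustrate why an odds ratio is not a relative risk: a logit coefficient b multiplies the odds by exp(b), but how much that changes the probability depends on the baseline probability. A minimal Python sketch with a hypothetical odds ratio of 2:

```python
import math


def odds(prob):
    return prob / (1 - prob)


def prob_from_odds(o):
    return o / (1 + o)


def relative_risk_for_odds_ratio(odds_ratio, baseline):
    """Relative risk implied by an odds ratio at a given baseline probability."""
    p1 = prob_from_odds(odds_ratio * odds(baseline))
    return p1 / baseline


b = math.log(2.0)            # hypothetical logit coefficient
odds_ratio = math.exp(b)     # the odds ratio that logistic would report
for baseline in (0.01, 0.10, 0.50):
    rr = relative_risk_for_odds_ratio(odds_ratio, baseline)
    print(f"baseline {baseline:.2f}: OR {odds_ratio:.2f}, RR {rr:.2f}")
```

With a rare outcome the two measures are close; as the baseline grows they diverge, so reading an odds ratio of 2 as "twice as likely" can overstate the effect.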

                FYI, you have 1,443 covariate patterns because you have some continuous variables in your model. Actually, I think you do want to run estat gof, group(10) table; the plain old GOF test won't work well with continuous covariates, I believe. It's a long explanation that I'm not sure how to articulate.
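A quick way to see why continuous covariates inflate the number of covariate patterns is to count the distinct rows of the covariate matrix. The data below are simulated and hypothetical (only the sample size of 1446 comes from the thread): five dummies can produce at most 32 patterns, but adding a single continuous variable makes almost every row unique, which is the situation that grouping the observations, as estat gof, group(10) does, is meant to handle.

```python
import random


def count_covariate_patterns(rows):
    """Number of distinct covariate combinations among the observations."""
    return len({tuple(row) for row in rows})


rng = random.Random(1)
n = 1446
dummies = [[rng.randint(0, 1) for _ in range(5)] for _ in range(n)]
with_continuous = [row + [rng.lognormvariate(0, 1)] for row in dummies]

print(count_covariate_patterns(dummies))          # at most 2**5 = 32
print(count_covariate_patterns(with_continuous))  # close to n
```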
                Last edited by Weiwen Ng; 11 Jan 2016, 13:18.
                • #9
                  Hi, I know this is a post from 2015, but I ran into the same problem and have done what everyone suggested (thank you all for the answers!). My reviewer has asked me to include specificity and sensitivity, so I need to, but is it inappropriate to report the values from "estat class, cutoff(0.15)" instead of the default "estat classification"? Also, I do understand specificity and sensitivity, and I know that values of 0 or 100% are no good, but in logistic regression and in general, what values are considered acceptable? I know this may not be a good question to ask, but I'm just wondering.

                  Also, how do you include BIC, specificity, and sensitivity in a logistic regression results table? If someone has an example, I would appreciate it.
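On BIC: after logistic, Stata's estat ic reports AIC and BIC, with AIC = -2 ln L + 2k and BIC = -2 ln L + k ln N, where k counts the estimated parameters (including the constant) and N is the number of observations. The arithmetic is simple enough to check by hand; a minimal Python sketch with a hypothetical log likelihood, borrowing k = 10 (nine regressors plus a constant) and N = 1446 from the model in #7 purely for concreteness:

```python
import math


def aic_bic(log_likelihood, k, n):
    """AIC = -2 lnL + 2k and BIC = -2 lnL + k ln(N)."""
    aic = -2 * log_likelihood + 2 * k
    bic = -2 * log_likelihood + k * math.log(n)
    return aic, bic


# Hypothetical log likelihood; k and N only borrowed for illustration.
aic, bic = aic_bic(-400.0, k=10, n=1446)
print(f"AIC = {aic:.1f}, BIC = {bic:.1f}")
```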

                  Last, can someone tell me how to present the logistic regression results as figures in Stata? My model has both numeric and categorical predictors (e.g., logistic canceroutcome age sex country...).

                  Thanks for reading!
