
  • Trying to get the same results as the book Logistic Regression Models (Hilbe J, 2009)

    Hi all,

    I am trying to replicate some results from the book Logistic Regression Models (Hilbe J, 2009), pages 202-204, using the Titanic dataset.
    Unfortunately, my results are not the same as the book's.

    Below I have attached some results from the book and my own results.

    I would be grateful for any comments.
    Code:
    
    . logit sur  i.sex##i.class,  nolog
    
    Logistic regression                               Number of obs   =       1316
                                                      LR chi2(5)      =     515.16
                                                      Prob > chi2     =     0.0000
    Log likelihood = -615.79775                       Pseudo R2       =     0.2949
    
    --------------------------------------------------------------------------------
          survived |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
               sex |
              man  |  -4.206016   .5307502    -7.92   0.000    -5.246267   -3.165764
                   |
             class |
        2nd class  |  -1.594815   .5871695    -2.72   0.007    -2.745646   -.4439844
        3rd class  |  -3.726095    .526913    -7.07   0.000    -4.758825   -2.693365
                   |
         sex#class |
    man#2nd class  |   .4202889    .644876     0.65   0.515    -.8436449    1.684223
    man#3rd class  |   2.801977   .5621158     4.98   0.000      1.70025    3.903703
                   |
             _cons |   3.562466   .5070426     7.03   0.000      2.56868    4.556251
    --------------------------------------------------------------------------------
    
    
    . logit sur  i.sex##i.class, or nolog
    
    Logistic regression                               Number of obs   =       1316
                                                      LR chi2(5)      =     515.16
                                                      Prob > chi2     =     0.0000
    Log likelihood = -615.79775                       Pseudo R2       =     0.2949
    
    --------------------------------------------------------------------------------
          survived | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
               sex |
              man  |   .0149056   .0079112    -7.92   0.000     .0052671    .0421819
                   |
             class |
        2nd class  |    .202946   .1191637    -2.72   0.007     .0642068    .6414754
        3rd class  |   .0240867   .0126916    -7.07   0.000     .0085757    .0676529
                   |
         sex#class |
    man#2nd class  |   1.522401   .9817601     0.65   0.515     .4301398    5.388261
    man#3rd class  |   16.47718   9.262086     4.98   0.000     5.475316    49.58575
                   |
             _cons |      35.25   17.87325     7.03   0.000     13.04859    95.22579
    --------------------------------------------------------------------------------
    [Attached images: three page excerpts from Hilbe, Logistic Regression Models]

    Last edited by Rodrigo Badilla; 21 Dec 2018, 11:24.

  • #2
    The data can be downloaded from https://www.crcpress.com/Logistic-Re.../9781138106710
    Last edited by Rodrigo Badilla; 21 Dec 2018, 12:23.



    • #3
      Hi Rodrigo,

      I entered those values into Excel and was able to match the book's results. Remember that with logs and exponentiation, exp(a + b) != exp(a) + exp(b); rather, exp(a + b) = exp(a) * exp(b). In Excel, exp() is the exponential function.

      But one reason for your difference may be that in your regression output (and the book's) the coefficient on man#2nd class is 0.4202889, whereas the book uses .4203889 in its calculations.


      So, for your 1st example:
      [Attached image: Exponentiation.png (Excel calculation)]




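      (1) -1.594815 + 0.4203889 = -1.1744261, and exp(-1.1744261) = 0.30899626

      (2) Alternatively, exp(-1.594815) = 0.20294607 and exp(0.4203889) = 1.5225536, and
      0.20294607 * 1.5225536 ≈ 0.308996. (Note that 1.522401314 is exp(0.4202889), the value Stata reports; using it instead gives 0.20294607 * 1.522401314 ≈ 0.308965.)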
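The same arithmetic can be checked in a few lines of Python, used here purely as a stand-in for the Excel calculation (the variable names are illustrative, not from the book). It confirms the product rule and shows how much the book's 0.4203889 typo shifts the result; note that exp(0.4203889) is about 1.52255, while 1.522401 is exp(0.4202889).

```python
import math

b_2nd = -1.594815         # 2nd class main effect, from the logit output
b_int_book = 0.4203889    # man#2nd class value the book used in its arithmetic
b_int_stata = 0.4202889   # man#2nd class coefficient actually reported

# exp(a + b) equals exp(a) * exp(b), not exp(a) + exp(b)
lhs = math.exp(b_2nd + b_int_book)
rhs = math.exp(b_2nd) * math.exp(b_int_book)
print(f"exp(a + b) = {lhs:.8f}, exp(a) * exp(b) = {rhs:.8f}")

# With the coefficient Stata actually reports, the odds ratio is slightly smaller
or_stata = math.exp(b_2nd + b_int_stata)
print(f"using 0.4202889: {or_stata:.7f}")
```

The difference only shows up in the fifth significant digit, which is why the book's figure looks almost, but not quite, like yours.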


      For those coming along later (who may want to use -margins-)
      • The Titanic survival dataset can be downloaded from Stanford as a CSV here (the link is to a Stanford computer science class that uses it)
      • If it's easier, I've also attached the actual .CSV file to this post
      Last edited by David Benson; 21 Dec 2018, 12:44.



      • #4
        Originally posted by Rodrigo Badilla
        Thanks for showing results in code delimiters. If you go over this set of results, your output appears identical to the output in the first attachment, except that they were using manually created interaction terms (which I guess was what they had to do in 2009; you correctly used factor variable syntax). For example, their term class2 appears to be the main effect of being in second class if one is female, and sc2 is the male*2nd class interaction effect.

        If I'm right that your output is identical (and I think I am), then any conclusions about the log odds of survival will be identical to the book's if you do the math correctly. To get the log odds ratio of survival for men in 2nd class relative to the base category (which I believe is women in 1st class), you just add the coefficients for male, 2nd class, and male*2nd class.

        However, that's not necessarily what your book was calculating. For the first equation you put a question mark over, the book is trying to show the odds ratio of 2nd class male passengers vs 1st class male passengers. So, you just add the coefficients on 2nd class and male*2nd class, then exponentiate.
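Both sums can be sketched in Python (used only as a calculator; the dictionary keys are labels, and women in 1st class are assumed to be the omitted category, as in the output in post #1):

```python
import math

# Coefficients (log odds scale) from the first -logit- output in post #1
b = {
    "man": -4.206016,
    "2nd": -1.594815,
    "man#2nd": 0.4202889,
}

# Men in 2nd class vs the base (women in 1st class): add all three terms
log_or_vs_base = b["man"] + b["2nd"] + b["man#2nd"]

# Men in 2nd class vs men in 1st class: the "man" term is common to both and cancels
log_or_within_men = b["2nd"] + b["man#2nd"]

print(f"vs women in 1st class: {math.exp(log_or_vs_base):.7f}")
print(f"vs men in 1st class:   {math.exp(log_or_within_men):.7f}")
```

The second number is the within-men comparison the book is after; the first answers a different question, which is why it matters which comparison you have in mind before adding coefficients.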
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

        When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



        • #5
          Thanks Weiwen Ng and David Benson for your replies.

          But according to my odds ratio table, could the conclusion be:

          Among males, the odds of surviving are 1.522401 times as great for second-class passengers as they are for first-class passengers?

          In the book:

          exp(−3.726095 + 2.801977) = 0.39688131
          Among males, the odds of surviving are .4 as great for third-class passengers as they are for first-class passengers.
          Last edited by Rodrigo Badilla; 21 Dec 2018, 13:37.



          • #6
            EDITED TO ADD:
            1. My post #3 crossed with your post #2 (where you provide a link to the Titanic data).
            2. This post crossed with some edits you made to the 2nd half of post #5.


            So, I think the difference between some of the answers in your 2nd logit (where you specify the -or- option) and the calculations they give in the book (exponentiating the results from the log odds in the 1st logit output) is probably an issue of rounding.

            So, I'm pasting an image of your actual logit output (sorry for the image, but when I tried to use code delimiters or quote your output, it lost the tabs).
            [Attached image: screenshot of the -logit- output from post #1]



            So the odds ratios in the 2nd table should just be exp(coef) of the coefficients in the 1st. For example, the coefficient on "man" is -4.206016, and exp(-4.206016) = 0.014905634, which matches the .0149056 in the odds-ratio table.
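To see this for every row at once, here is a small Python sketch that exponentiates each coefficient from the first table and compares it with the odds-ratio column of the second (the dictionary keys simply mirror the row labels in the output):

```python
import math

# Coefficients from the first -logit- output (log odds scale)
coefs = {
    "man": -4.206016,
    "2nd class": -1.594815,
    "3rd class": -3.726095,
    "man#2nd class": 0.4202889,
    "man#3rd class": 2.801977,
    "_cons": 3.562466,
}

# Odds ratios as printed by -logit ..., or-
# (_cons exponentiates to the baseline odds for women in 1st class, not a ratio)
reported_or = {
    "man": 0.0149056,
    "2nd class": 0.202946,
    "3rd class": 0.0240867,
    "man#2nd class": 1.522401,
    "man#3rd class": 16.47718,
    "_cons": 35.25,
}

# Exponentiating each coefficient reproduces the odds-ratio column
odds_ratios = {name: math.exp(value) for name, value in coefs.items()}
for name, value in odds_ratios.items():
    print(f"{name:>14}: {value:.7g} (reported {reported_or[name]})")
```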

            We saw in post #3 that the book has a typo: even though the coefficient they get on man#2nd class is 0.4202889, they used 0.4203889 in their calculations (hence your difference).

            To use an example that didn't have a typo, but for which rounding might have been an issue, let's do the calculation for MEN x CLASS 3
            [Attached image: Excel calculation for men in 3rd class]



            So in the 2nd table (with OR), the table lists exp(-3.726095) = 0.0240867, and exp(2.801977) = 16.47718. So 7 decimal places in the 1st instance and 5 in the 2nd.
            But 0.0240867 * 16.47718 = 0.396880892 which is close to the answer the book gives, but not exactly the same (but rounded to 6 decimal places they would match to 0.396881).

            But in Excel if I do exp(-3.726095 + 2.801977) = 0.396881314. And that's because exp(-3.726095) = 0.024086711 and exp(2.801977) = 16.477189996 and 0.024086711 * 16.477189996 = 0.396881314.
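The rounding effect can be reproduced with a short Python sketch (a calculator stand-in for the Excel steps above): exponentiate the unrounded sum, then multiply the odds ratios as printed, and compare.

```python
import math

b_3rd = -3.726095     # 3rd class main effect
b_man_3rd = 2.801977  # man#3rd class interaction

# Exact route: exponentiate the sum of the coefficients
exact = math.exp(b_3rd + b_man_3rd)

# Rounded route: multiply the odds ratios as printed (7 significant digits each)
from_printed = 0.0240867 * 16.47718

print(f"exact        = {exact:.9f}")
print(f"from printed = {from_printed:.9f}")
print(f"rounded to 6 decimals: {round(exact, 6)} and {round(from_printed, 6)}")
```

The two routes disagree in the seventh decimal place but agree once rounded to six, which is the pattern described above.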

            Hope that helps!
            Last edited by David Benson; 21 Dec 2018, 14:09.



            • #7
              Originally posted by Rodrigo Badilla
              exp(−1.594815 + .4203889*sex), with sex = 1
              exp(−1.594815 + .4203889) = 0.30899626
              Among males, the odds of surviving are .3 times as great for second-class passengers as they are for first-class passengers,



              • #8
                You just always have to remember what the omitted categories are. In this case the omitted category is 1st class female passengers.

                So in your question:
                exp(−1.594815 + .4203889*sex), with sex = 1
                exp(−1.594815 + .4203889) = 0.30899626
                Among males, the odds of surviving are .3 times as great for second-class passengers as they are for first-class passengers,
                Yes, your statement is true. Although it's easier for me to understand when re-stated as "odds of surviving are .3 times as great for male second-class passengers as they are for male first-class passengers." Or alternatively, "male second-class passengers have approximately 1/3 the odds of surviving as male first-class passengers." (Yikes!)



                • #9
                  Thanks, David Benson, for your time and replies!

                  Your last explanation made it clear, thanks. I am still not sure what I am doing wrong in trying to replicate the book's results in Stata, but I have the weekend to check.

                  Thanks again.
                  Best



                  • #10
                    I understand that this is all being done for the specific purpose of replicating a textbook example.

                    I just want to point out that were this a real-world problem, the sensible way to do these calculations would be with the -margins- command, which saves you from the tedious and error-prone process of figuring out what coefficients to add and when to exponentiate. It is good to do the calculations a few times while you're learning to gain an understanding, but for production purposes, go straight to -margins-.



                    • #11
                      Hi Clyde,

                      I agree with you. I tried it and used -margins- to get the odds and odds ratios, but it gave me the same results (for the odds ratios).

                      Code:
                      *to get odds:
                      
                      margins i.sex#i.class, expression(exp(predict(xb)))
                      
                      Expression   : exp(predict(xb))
                      
                      ----------------------------------------------------------------------------------
                                       |            Delta-method
                                       |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -----------------+----------------------------------------------------------------
                             sex#class |
                      women#1st class  |      35.25   17.87325     1.97   0.049     .2190705    70.28093
                      women#2nd class  |   7.153846   2.118261     3.38   0.001     3.002132    11.30556
                      women#3rd class  |   .8490566      .1217     6.98   0.000      .610529    1.087584
                        man#1st class  |   .5254237   .0824155     6.38   0.000     .3638922    .6869552
                        man#2nd class  |   .1623377   .0350038     4.64   0.000     .0937314    .2309439
                        man#3rd class  |   .2085308   .0244376     8.53   0.000     .1606341    .2564275
                      ----------------------------------------------------------------------------------
                      
                      *to get odds ratio:
                      
                      . mat b = r(b)
                      
                      . scalar base = b[1,1]
                      
                      . margins i.sex#i.class, expression((exp(predict(xb))/base))
                      
                      Adjusted predictions                              Number of obs   =       1316
                      Model VCE    : OIM
                      
                      Expression   : (exp(predict(xb))/base)
                      
                      ----------------------------------------------------------------------------------
                                       |            Delta-method
                                       |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -----------------+----------------------------------------------------------------
                             sex#class |
                      women#1st class  |          1   .5070426     1.97   0.049     .0062148    1.993785
                      women#2nd class  |    .202946   .0600925     3.38   0.001     .0851669    .3207251
                      women#3rd class  |   .0240867   .0034525     6.98   0.000       .01732    .0308535
                        man#1st class  |   .0149056    .002338     6.38   0.000     .0103232    .0194881
                        man#2nd class  |   .0046053    .000993     4.64   0.000      .002659    .0065516
                        man#3rd class  |   .0059158   .0006933     8.53   0.000      .004557    .0072745
                      ----------------------------------------------------------------------------------
                      
                      .
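The two -margins- calls above amount to exponentiating each cell's linear predictor and then dividing by the odds of the base cell. A Python sketch of that arithmetic, using the rounded coefficients from post #1 (so the last printed digit can differ from Stata's; the cell labels are illustrative):

```python
import math

# Coefficients from the first -logit- output; women and 1st class are the base levels
b_man, b_2nd, b_3rd = -4.206016, -1.594815, -3.726095
b_man_2nd, b_man_3rd = 0.4202889, 2.801977
b_cons = 3.562466

# Linear predictor (log odds) for each sex#class cell
xb = {
    ("women", "1st"): b_cons,
    ("women", "2nd"): b_cons + b_2nd,
    ("women", "3rd"): b_cons + b_3rd,
    ("man", "1st"): b_cons + b_man,
    ("man", "2nd"): b_cons + b_man + b_2nd + b_man_2nd,
    ("man", "3rd"): b_cons + b_man + b_3rd + b_man_3rd,
}

# Odds = exp(xb), as in margins ..., expression(exp(predict(xb)))
odds = {cell: math.exp(v) for cell, v in xb.items()}

# Odds ratio relative to women in 1st class, as in the second -margins- call
base = odds[("women", "1st")]
odds_ratio = {cell: o / base for cell, o in odds.items()}

for cell in xb:
    print(cell, f"odds = {odds[cell]:.7g}", f"OR = {odds_ratio[cell]:.7g}")
```

Because every cell is divided by the same base odds, the odds-ratio column just reproduces the single-cell odds ratios from -logit, or-, relative to women in 1st class; to get the book's within-men comparisons you need a different reference cell.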



                      • #12
                        Finally, I was able to get the same results as Hilbe's book just by changing the base level of comparison.

                        Thanks to everyone for your replies. My final table is:

                        Code:
                        . logit sur  b1.sex##i.class, or nolog
                        
                        Logistic regression                               Number of obs   =       1316
                                                                          LR chi2(5)      =     515.16
                                                                          Prob > chi2     =     0.0000
                        Log likelihood = -615.79775                       Pseudo R2       =     0.2949
                        
                        ----------------------------------------------------------------------------------
                                survived | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -----------------+----------------------------------------------------------------
                                     sex |
                                  women  |   67.08871   35.60735     7.92   0.000     23.70686    189.8562
                                         |
                                   class |
                              2nd class  |   .3089652   .0823826    -4.40   0.000     .1832082    .5210439
                              3rd class  |   .3968812   .0777086    -4.72   0.000     .2703939    .5825379
                                         |
                               sex#class |
                        women#2nd class  |   .6568571   .4235914    -0.65   0.515     .1855886    2.324825
                        women#3rd class  |     .06069   .0341148    -4.98   0.000     .0201671    .1826379
                                         |
                                   _cons |   .5254237   .0824155    -4.10   0.000     .3863618    .7145378
                        ----------------------------------------------------------------------------------
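Changing the base level does not change the model, only which combinations of coefficients are reported. A Python sketch deriving each row of this table from the original coefficients, assuming (as the output suggests) that b1.sex makes men the base sex; the dictionary keys are labels only:

```python
import math

# Coefficients from the original fit (women and 1st class as base levels)
b_man, b_2nd, b_3rd = -4.206016, -1.594815, -3.726095
b_man_2nd, b_man_3rd = 0.4202889, 2.801977
b_cons = 3.562466

# With men as the base sex, each new term is a combination of the old ones
new_terms = {
    "women": -b_man,                   # women vs men, within 1st class
    "2nd class": b_2nd + b_man_2nd,    # 2nd vs 1st class, among men
    "3rd class": b_3rd + b_man_3rd,    # 3rd vs 1st class, among men
    "women#2nd class": -b_man_2nd,     # the interactions flip sign
    "women#3rd class": -b_man_3rd,
    "_cons": b_cons + b_man,           # log odds for men in 1st class
}

for name, coef in new_terms.items():
    print(f"{name:>16}: OR = {math.exp(coef):.7g}")
```

The "2nd class" and "3rd class" rows are exactly the book's within-men odds ratios (about .31 and .40), which is why switching the base level made the output match.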



                        • #13
                          Rodrigo,

                          Not to add more work for you, but could you upload the .dta or .csv file that you used for your calculations? (I went to the textbook site that you listed, but didn't want to sift through all the zip files.) I am always interested in calculating and understanding relative risk ratios and how they compare to the (far less intuitive to me) odds and odds ratios.

                          My data below won't match yours, because the file I listed in post #3 only had info on 887 passengers, not the 1316 in your dataset. But in the table below I break out the survival rate by sex and passenger class. The way to read it is that 96.8% of women in 1st class survived (91/94), 92.1% in 2nd class survived (70/76), and 50% in 3rd class survived (72/144), giving a 74.2% overall survival rate for women vs a 19.0% overall survival rate for men (233/314 women survived vs 109/573 men).

                          So the relative risk of survival for women vs men is (74.2% / 19.0%) = 3.9. This is how I think most people interpret odds ratios, even though the two are not the same (see below).

                          Code:
                          . table pclass male , c(n survived mean survived) row col
                          
                          -------------------------------------
                                    |           male          
                             Pclass |       0        1    Total
                          ----------+--------------------------
                                  1 |      94      122      216
                                    | .968085  .368852   .62963
                                    |
                                  2 |      76      108      184
                                    | .921053  .157407  .472826
                                    |
                                  3 |     144      343      487
                                    |      .5  .137026  .244353
                                    |
                              Total |     314      573      887
                                    | .742038  .190227  .385569
                          -------------------------------------
                          
                          . table pclass sex lived, c(n age) row col scol
                          
                          --------------------------------------------------------------------------------------
                                    |                               lived and Sex                              
                                    | -------- Died --------    ------ Survived ------    -------- Total -------
                             Pclass | female    male   Total    female    male   Total    female    male   Total
                          ----------+---------------------------------------------------------------------------
                                  1 |      3      77      80        91      45     136        94     122     216
                                  2 |      6      91      97        70      17      87        76     108     184
                                  3 |     72     296     368        72      47     119       144     343     487
                                    |
                              Total |     81     464     545       233     109     342       314     573     887
                          --------------------------------------------------------------------------------------

                          However, to get the odds ratio, we would just use the summary table below (it's easily derivable from the info above)
                          Code:
                          . tabulate sex lived
                          
                                     |         lived
                                 Sex |      Died   Survived |     Total
                          -----------+----------------------+----------
                              female |        81        233 |       314
                                male |       464        109 |       573
                          -----------+----------------------+----------
                               Total |       545        342 |       887
                          Calculating odds of survival, and odds ratio of survival for women vs men
                          Odds of survival = p / (1 - p), where p = probability of survival

                          Odds of survival for women = [(233 / 314) / (81 / 314)] = 74.2% / 25.8%, which simplifies to 233 / 81 = 2.8765, because 2.8765 times as many women lived as died. (NOTE: If this were a horse race and the horse had a 75% chance of winning, we would say the horse had odds of 3:1 (75% / 25%) of winning.)

                          Odds of survival for men = [(109 / 573) / (464 / 573)] = 19.02% / 80.98%, which simplifies to 109 / 464 = 0.2349, because 0.2349 times as many men lived as died (if you do 464 / 109, or roughly 1 / 0.2349, you get 4.2569, because about 4.26 times as many men died as lived). (Using the horse race example again, if the horse had a 20% chance of winning, we would say the horse had 1:4 odds of winning, or that the odds were 4:1 against him winning.)

                          Odds ratio of survival for women vs men = 2.8765 / 0.2349 = 12.246
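The whole 2x2 calculation fits in a short Python sketch (Python is just a calculator here; the counts are those from the -tabulate sex lived- table above, and the variable names are illustrative):

```python
# Counts from the -tabulate sex lived- table (887-passenger CSV)
women_lived, women_died = 233, 81
men_lived, men_died = 109, 464

# Probabilities (risks) of survival
p_women = women_lived / (women_lived + women_died)
p_men = men_lived / (men_lived + men_died)

# Odds of survival = p / (1 - p), which reduces to lived / died
odds_women = women_lived / women_died
odds_men = men_lived / men_died

relative_risk = p_women / p_men
odds_ratio = odds_women / odds_men

print(f"risk: women {p_women:.4f}, men {p_men:.4f}, RR = {relative_risk:.4f}")
print(f"odds: women {odds_women:.4f}, men {odds_men:.4f}, OR = {odds_ratio:.4f}")
```

Dividing the unrounded odds gives an odds ratio of about 12.245; the 12.246 above comes from dividing the rounded values 2.8765 and 0.2349, the same kind of rounding drift discussed earlier in the thread.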
                          Last edited by David Benson; 22 Dec 2018, 18:08.



                          • #14
                            Sorry for the long post above. It was mostly me talking "out loud" for my own understanding. If you Google "Odds ratio" you will see a lot of posts on "The difference between relative risk and odds ratios" as well as "How do i interpret odds ratios in logistic regression." For example, see here.

                            I've often wondered whether we end up reporting odds ratios in journals (especially medical journals) because that is what logistic regression reports.
                            Last edited by David Benson; 22 Dec 2018, 18:20.



                            • #15
                              I've often wondered whether we end up reporting odds ratios in journals (especially medical journals) because that is what logistic regression reports.
                              While the fact that logistic regressions lend themselves to reporting odds ratios is surely part of the reason, it is worth noting that in, for example, case-control study designs, whether analyzed by logistic regression or just with contingency tables, the odds ratio is estimable but the risk ratio is not. Case-control designs, for all their weaknesses, play an important role in clinical research because they are relatively quick and inexpensive to carry out, and for rare diseases they are often the only feasible approach in the real world.

                              And although it doesn't play that large a role in medical research, odds ratios also arise naturally in some aspects of Bayesian analysis.

                              In my opinion, the difference between reporting odds ratios and risk ratios is overblown. It isn't clear to me that probabilities are any more intuitive than odds, so it isn't clear to me that odds ratios are any less intuitive than risk (probability) ratios. Yes, they are different, and you have to be clear which one you are talking about. It's distressing when an odds ratio of 3 is misinterpreted as three times the risk. But properly used, both are equally informative. In fact, risk ratios have some very unintuitive properties of their own. For example, if the baseline risk is, say, 20%, then the maximum possible risk ratio is 5 (= 1 / 0.20). By contrast, odds ratios range from 0 to infinity regardless of the baseline odds.
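The ceiling on the risk ratio is easy to see numerically. A minimal Python sketch, with function names chosen for illustration, holding the baseline risk at 20% and pushing the comparison group's risk toward 1:

```python
def risk_ratio(p0: float, p1: float) -> float:
    """Risk (probability) ratio of group 1 vs baseline group 0."""
    return p1 / p0


def odds_ratio(p0: float, p1: float) -> float:
    """Odds ratio of group 1 vs baseline group 0."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


baseline = 0.20  # baseline risk of 20%, as in the example above
for p1 in (0.40, 0.80, 0.99, 0.999):
    print(f"p1={p1}: RR={risk_ratio(baseline, p1):.3f}, OR={odds_ratio(baseline, p1):.1f}")

# The risk ratio can never exceed 1 / baseline = 5, however close p1 gets to 1
print(risk_ratio(baseline, 1.0))
```

The risk ratio creeps up to 5 while the odds ratio grows without bound as p1 approaches 1.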

                              People who are conversant with statistics usually adapt easily enough to working with both, using either depending on which best suits the purpose at hand, and translating from one to the other easily (in those situations where both are identifiable).
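When the baseline risk p0 is known, the translation is a one-line formula, RR = OR / (1 - p0 + p0 * OR), which follows from converting the odds back to probabilities. A minimal Python sketch, assuming p0 is the risk in the reference group; the function names are illustrative, and the example reuses the 887-passenger counts from earlier in the thread with men as the reference group:

```python
def or_to_rr(odds_ratio: float, p0: float) -> float:
    """Convert an odds ratio to a risk ratio, given the reference-group risk p0."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)


def rr_to_or(risk_ratio: float, p0: float) -> float:
    """Convert a risk ratio to an odds ratio, given the reference-group risk p0."""
    p1 = risk_ratio * p0
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# 887-passenger counts: men are the reference group
p_men = 109 / 573
or_women = (233 / 81) / (109 / 464)
rr_women = or_to_rr(or_women, p_men)
print(f"OR = {or_women:.4f} -> RR = {rr_women:.4f}")
print(f"back to OR = {rr_to_or(rr_women, p_men):.4f}")
```

This recovers the relative risk of about 3.9 from the odds ratio of about 12.2, and shows why the two can be so far apart when the baseline risk is not small.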

