  • Interpreting the intercept in a logistic regression with multiple factor variables

    Hi all,

    I am looking to calculate the odds ratios in a logistic regression where the dependent variable is a dummy for digital financial service usage. I am using multiple independent variables, including 3 continuous variables, 2 dummy variables, and 2 factor variables. I'm not totally sure how to interpret the constant in the output, as both factor variables have omitted a category. The odds ratio outputs are:

    logistic digital income bankacct sendfreq sendtime rec_bankacct i.recipient i.birthplace

    digital        | Odds Ratio
    ---------------+------------
    income         |   3.412442
    bankacct       |    1.67583
    sendfreq       |   .3388592
    sendtime       |   .3928593
    rec_bankacct   |   5.432351
    _Irecipient_1  |   .3849204
    _Irecipient_2  |   1.121452
    _Irecipient_3  |   1.668443
    _Ibirthplace_2 |   .8439282
    _Ibirthplace_3 |   .6452834
    _Ibirthplace_4 |   1.152431
    _cons          |    .006926
    ---------------+------------


    In this case, would the _cons variable be interpreted as one of the omitted factor variables? And if so, can I still interpret the other variables as is? For example, can the bankacct dummy variable (0 = no account, 1 = account) be interpreted as: having a bank account increases the odds of using digital financial services by ~68% (OR = 1.676) as compared to having no account? Or would it be ~68% as compared to having no account and the characteristics of the omitted categories of the factor variables?

    Hopefully this makes sense, I am happy to clarify if not. Thank you in advance, Statalist!

    -DB



  • #2
    I am concerned that the output you presented looks like the dummies were generated with the -xi:- prefix (i.e. interaction expansion), but you didn't refer to this in your post. That said, that should not make a difference here.

    I'm not sure what you mean by

    would the _cons variable be interpreted as one of the omitted factor variables?
    To be clear, with factor variables, one category is always omitted as the base, so each difference in log odds, or each odds ratio, should be interpreted relative to that base category. The constant is then the log odds of the outcome with all factor variables at their base categories and all other variables (continuous and dummy) at 0. In odds ratio output such as yours, the _cons row shows the corresponding baseline odds rather than an odds ratio.
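
    As a rough illustration of that reading, the arithmetic below uses only the numbers reported in the output above. It is a sketch rather than anything run on the original data; the scalar names are made up for the example. The reported _cons of .006926 corresponds to a baseline probability of about 0.7 percent, and the bankacct odds ratio multiplies the odds by the same factor whichever categories the factor variables take, because each odds ratio holds the other covariates fixed.

    ```stata
    * Baseline odds, as reported in the _cons row of the odds ratio output
    scalar base_odds = .006926

    * Convert odds to a probability: p = odds / (1 + odds)
    display "baseline probability  = " base_odds / (1 + base_odds)

    * The underlying coefficient is the log of the reported odds
    display "log odds (_b[_cons])  = " ln(base_odds)

    * Odds ratios taken from the output (hypothetical scalar names)
    scalar or_bank = 1.67583
    scalar or_rec1 = .3849204

    * Odds at the base categories, without and with a bank account
    display "base, no account      = " base_odds
    display "base, with account    = " base_odds * or_bank

    * Odds for the category flagged by _Irecipient_1, without and with a bank account
    display "rec_1, no account     = " base_odds * or_rec1
    display "rec_1, with account   = " base_odds * or_rec1 * or_bank
    ```

    In both pairs the second figure is 1.67583 times the first, so the ~68% increase in odds is a comparison with having no account at any fixed values of the other variables, not only at the omitted categories.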
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      As Weiwen notes, the intercept is the predicted value (in this case, of the log odds) when all independent variables equal 0. That is often an impossible value (e.g. nobody has a score of 0 on a scale that runs from 400 to 1200), so people usually don't pay much attention to it.

      Put another way, if you have race (1 = white, 0 = nonwhite) and gender (1 = male, 0 = female) in the model, the constant would be the expected value for a nonwhite female who had a score of 0 on all the other variables in the model.
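
      If a more readable quantity than the constant is wanted, one option is to ask for predicted probabilities at covariate values that actually occur in the data. The lines below are a minimal sketch, not something run on the poster's data. They assume the model is refit with factor-variable notation (so that -margins- recognizes the categorical terms) and that the factor variable is named recipient, as the output labels suggest.

      ```stata
      * Refit with factor-variable notation instead of xi: (variable names assumed from the thread)
      logistic digital income i.bankacct sendfreq sendtime i.rec_bankacct i.recipient i.birthplace

      * Predicted probability of digital use with all covariates held at their sample means
      margins, atmeans

      * The same prediction, separately for bankacct = 0 and bankacct = 1
      margins bankacct, atmeans
      ```

      Entering the 0/1 dummies as i.bankacct and i.rec_bankacct leaves their odds ratios unchanged; it only tells Stata they are categorical.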
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Thanks so much to you both!!

        Weiwen, I indeed used the -xi- prefix, which appears not to have made it when I pasted the command in the post. Apologies for any confusion that this may have caused!





        • #5
          No big deal. I intended to say that the -xi- prefix is not needed on almost all estimation commands. Logistic is no exception. You can use the factor variable syntax directly.
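
          For reference, the factor-variable version of the command might look like the sketch below. It assumes the variable is named recipient (as in the output labels) and that level 2 exists, which the _Irecipient_2 row suggests. Factor-variable notation requires Stata 11 or later.

          ```stata
          * Same model without xi:, using factor-variable notation
          logistic digital income bankacct sendfreq sendtime rec_bankacct i.recipient i.birthplace

          * Pick a different base category explicitly, here level 2 of recipient
          logistic digital income bankacct sendfreq sendtime rec_bankacct ib2.recipient i.birthplace

          * List the base (omitted) categories in the results table
          logistic digital income bankacct sendfreq sendtime rec_bankacct i.recipient i.birthplace, baselevels
          ```

          Changing the base category changes the factor-variable odds ratios and the constant, but not the odds ratios on the other covariates.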