  • Unexplainable large Odds ratio in logistic regression

    Hi,

    When I run a logistic regression, one of my control variables, the company's return on assets, has an insanely large odds ratio. (I omitted the rest of the regression output and will post it on request.)


    Code:
    --------------------------------------------------------------------------------------------------------------------------------------------
                                                                               |               Robust
                                                             AcquirerInitiated | Odds ratio   std. err.      z    P>|z|     [95% conf. interval]
    ---------------------------------------------------------------------------+----------------------------------------------------------------
                                                                   Acq_ROA_WWU |   650097.5    4809464     1.81   0.070     .3278942    1.29e+12

    Code:
    . sum Acq_ROA_WWU if !missing(p2) //The variable p2 tags the observations that were considered in the regression
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
     Acq_ROA_WWU |        265    .0452223    .0816598  -.3572705    .244479

    So far, I have not found any issue that might cause this behavior. I include Acq_ROA_WWU as a control in other regressions, where it doesn't cause any problems. Is there some obvious reason for this?
    I tried to keep the post as simple as possible without getting lost in details. Please let me know if you need more specific information on my model.

    Thanks.


  • #2
    The OR is for a 1-unit difference in the value of the predictor/control; however, a 1-unit difference is impossible given your data, which, as you show, only range from -0.36 to +0.24. Try rescaling your predictor in a way that makes substantive sense (maybe multiply by 100?).
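
To make the rescaling point concrete, here is a minimal arithmetic sketch in Python (not Stata). It only uses the odds ratio reported in the first post; the function name `rescale_odds_ratio` is made up for illustration. It assumes the usual logit relationship, where the odds ratio is the exponentiated coefficient, so multiplying the predictor by 100 divides the coefficient by 100.

```python
import math


def rescale_odds_ratio(or_per_unit: float, factor: float) -> float:
    """Odds ratio after multiplying the predictor by `factor`.

    The logit coefficient is log(OR); rescaling the predictor by `factor`
    divides that coefficient by `factor`.
    """
    beta = math.log(or_per_unit)
    return math.exp(beta / factor)


# Odds ratio per 1-unit change in Acq_ROA_WWU, taken from the posted output.
or_per_unit = 650097.5

# Odds ratio per 0.01 change (one percentage point of ROA).
or_per_point = rescale_odds_ratio(or_per_unit, 100)

print(f"log odds ratio per unit: {math.log(or_per_unit):.2f}")
print(f"odds ratio per 0.01:     {or_per_point:.3f}")
```

On this scale the odds ratio comes out at roughly 1.14 per percentage point of ROA, which is far easier to interpret. The model fit, z statistic, and p-value are unchanged by the rescaling.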



    • #3
      Thanks for the helpful advice, Rich



      • #4
        Another possibility:

        Recall the 2x2 table example where we learned how to compute the simplest odds ratio (OR):

                         Outcome yes   Outcome no
        Exposure yes          A             B
        Exposure no           C             D

        The OR is computed as AD / BC. When B or C approaches 0, the OR becomes huge. In fact, when both B and C are zero (that is, the exposure is 100% associated with the outcome), the OR is not defined. It's kind of an interesting dilemma with the OR.
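
The AD / BC behaviour can be sketched in a few lines of Python. The cell counts below are invented purely for illustration, and `odds_ratio` is a hypothetical helper, not a library function; it returns `None` when BC is zero, since the ratio is undefined there.

```python
from typing import Optional


def odds_ratio(a: int, b: int, c: int, d: int) -> Optional[float]:
    """Odds ratio AD / BC for a 2x2 table; None when B*C is zero."""
    if b * c == 0:
        return None  # division by zero: the OR is not defined
    return (a * d) / (b * c)


# Hypothetical tables: the off-diagonal cells B and C shrink toward zero.
print(odds_ratio(20, 10, 10, 20))  # moderate association
print(odds_ratio(20, 1, 1, 20))    # B and C nearly empty: OR blows up
print(odds_ratio(20, 0, 0, 20))    # B and C empty: OR undefined
```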

        This can also happen with a continuous variable, where it is known as "complete separation". In some subgroup the outcome is probably predicted nearly perfectly, either because i) the exposure really is that effective and essential, or ii) the cell counts are small.

        It could be checked by plotting the outcome (1/0) against the variable to see if there is a good "overlap" between the two parallel lines of points. If there is a big gap, then separation could be the issue. Generally, if it is not the OR of your main interest, it may not need to be remediated. It could just mean that this particular predictor (alone, or when present with some other predictors) may not be very "helpful" because it's doing the job too well.
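
A numeric version of the same overlap check might look like the Python sketch below. The data are made up and `overlap_interval` is a hypothetical helper. It only looks at one predictor at a time, so it would not catch separation that arises from a combination of predictors.

```python
from typing import Optional, Sequence, Tuple


def overlap_interval(
    x: Sequence[float], y: Sequence[int]
) -> Optional[Tuple[float, float]]:
    """Range of x shared by the outcome-0 and outcome-1 groups.

    Returns None when the two groups do not overlap at all, which is
    what complete separation on this single predictor looks like.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    lo = max(min(x0), min(x1))
    hi = min(max(x0), max(x1))
    return (lo, hi) if lo <= hi else None


# Hypothetical predictor values on roughly the scale of the posted variable.
x = [-0.3, -0.1, 0.0, 0.1, 0.2, 0.24]

print(overlap_interval(x, [0, 1, 0, 1, 0, 1]))  # groups interleave
print(overlap_interval(x, [0, 0, 0, 1, 1, 1]))  # gap between groups
```

An overlap interval that is empty or very narrow relative to the variable's full range is the numeric counterpart of the "big gap" in the plot.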
        Last edited by Ken Chui; 01 Sep 2022, 06:38.
