  • Confusion about odds ratios (comparing the OR from an epidemiological 2x2 table with logistic regression)

    Hello.

    Suppose I have a dataset in which I want to know the association between Sex and Social Media Disorder (SMD). My codebook for Sex and SMD is as follows:
    Variable  Code  Label
    Sex       0     Female
    Sex       1     Male
    SMD       0     SMD
    SMD       1     Not SMD
    and my epidemiological 2x2 table looks like this:
    Sex      SMD   NOT SMD
    Female    90       198
    Male      38        93
    The odds ratio (Female/Male) should be (90/198)/(38/93) = 1.112, right? (calculated by hand). This means the odds of having SMD are 1.112 times higher for females than for males.
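    As a quick cross-check of the hand calculation, here is a small Python sketch (Python is used only for the arithmetic; the analysis itself is in Stata). It computes the odds ratio from the 2x2 table with SMD as the outcome and male as the reference. The function name `odds_ratio` is a hypothetical helper for this illustration.

```python
def odds_ratio(case_exposed, noncase_exposed, case_ref, noncase_ref):
    """Odds ratio of the outcome for the exposed group vs. the reference group."""
    odds_exposed = case_exposed / noncase_exposed  # odds of SMD among females
    odds_ref = case_ref / noncase_ref              # odds of SMD among males
    return odds_exposed / odds_ref


# Counts from the 2x2 table: females 90 SMD / 198 not SMD, males 38 SMD / 93 not SMD
or_smd = odds_ratio(90, 198, 38, 93)
# The cross-product form a*d / (b*c) gives the same number
cross_product = (90 * 93) / (198 * 38)
print(round(or_smd, 5), round(cross_product, 5))  # prints 1.11244 1.11244
```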

    However, when I run a logistic regression with the following command, something does not seem right:
    Code:
    logistic RECODE_KATSMD ib1.SEX
    where RECODE_KATSMD is my variable for SMD

    which yields the following output:
    Code:
    logistic RECODE_KATSMD ib1.SEX
    
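    Logistic regression                             Number of obs     =        419
                                                    LR chi2(1)        =       0.21
                                                    Prob > chi2       =     0.6432
    Log likelihood =  -257.7637                     Pseudo R2         =     0.0004
    
    ------------------------------------------------------------------------------
    RECODE_KAT~D | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             SEX |
       Perempuan |   .8989249   .2073968    -0.46   0.644     .5719225    1.412894
           _cons |   2.447368    .471196     4.65   0.000     1.678093    3.569296
    ------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.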
    This is confusing, because I used male as the reference, yet the odds ratio for female (labelled "Perempuan") is 0.899, which differs from the value I calculated by hand from the 2x2 table.

    I also tried:
    Code:
    adjust, by(SEX) exp
    it returns:
    Code:
       
    ----------------------
          SEX |    exp(xb)
    ----------+-----------
       Female |        2.2
         Male |    2.44737
    ----------------------
         Key:  exp(xb)  =  exp(xb)
    This means the odds are 2.2 for women and 2.447 for men, so the odds ratio female/male is 2.2/2.44737 = 0.899, which again differs from the OR I calculated by hand.
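    The numbers Stata reported can be reproduced directly from the 2x2 table if the modeled outcome is the category coded 1 in the codebook, "Not SMD". This Python sketch does only that arithmetic; the variable names are illustrative, not taken from the dataset.

```python
# Counts from the 2x2 table
female_smd, female_not = 90, 198
male_smd, male_not = 38, 93

# Odds of the category coded 1 in the codebook ("Not SMD") within each sex
odds_not_female = female_not / female_smd     # 2.2, the adjust value for Female
odds_not_male = male_not / male_smd           # about 2.447368, the logistic _cons
or_not_smd = odds_not_female / odds_not_male  # about 0.8989, the reported OR

print(round(odds_not_female, 5), round(odds_not_male, 5), round(or_not_smd, 5))
```

    Under this reading, every number in the output is an odds of "Not SMD", which is why the ratio comes out as the reciprocal of the hand-calculated 1.112.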

    I am now confused and feeling dumb.

    Thank you very much
    Last edited by Zhianni Yang; 20 Feb 2024, 07:28.

  • #2
    I agree with your hand calculation; the output does not match it. However, there isn't enough Stata code here for me to tell whether something was coded incorrectly, but the code is where I would look first.

    Check this yourself.

    Code:
    clear
    input nosmd male freq
    0 1 38
    1 1 93
    0 0 90
    1 0 198
    end
    generate smd = 1 - nosmd
    
    label define l_male 1 "Male" 0 "Female"
    label define l_nosmd 1 "No SMD" 0 "Yes SMD"
    label define l_smd 1 "Yes SMD" 0 "No SMD"
    
    foreach x in male nosmd smd {
        label values `x' l_`x'
    }
    
    * NOSMD as dependent
    * Female to male
    logistic nosmd ib1.male [fweight=freq], base
    
    * Male to female
    logistic nosmd ib0.male [fweight=freq], base
    
    
    * SMD as dependent
    * Female to male
    logistic smd ib1.male [fweight=freq], base
    
    * Male to female
    logistic smd ib0.male [fweight=freq], base
    My guess is that this comes from careless recoding when "RECODE_KATSMD" was created. You didn't show that part of the code, so I can't be sure.
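    The same check can be mirrored outside Stata. Below is a minimal Python sketch, assuming a plain Newton-Raphson fit (a hand-rolled illustration, not how Stata implements -logistic-), that fits the one-predictor model with frequency weights on the four collapsed rows above, with male as the reference as in ib1.male. It fits once with the outcome coded 1 = no SMD and once with 1 = SMD. The function `fit_logit_fweight` is a hypothetical helper written for this example.

```python
import math

# Collapsed data from the check above: (nosmd, male, freq)
ROWS = [(0, 1, 38), (1, 1, 93), (0, 0, 90), (1, 0, 198)]


def fit_logit_fweight(rows, outcome, tol=1e-10, max_iter=50):
    """Newton-Raphson logistic fit of outcome on a female indicator
    (male = reference), with each row counted freq times.
    Returns (exp(_cons), exp(b_female)) = (baseline odds, odds ratio)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for nosmd, male, w in rows:
            y = outcome(nosmd)
            female = 1 - male
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * female)))
            # Weighted score vector
            g0 += w * (y - p)
            g1 += w * (y - p) * female
            # Weighted information matrix (symmetric 2x2)
            v = w * p * (1 - p)
            h00 += v
            h01 += v * female
            h11 += v * female * female
        # Solve the 2x2 Newton step explicitly
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return math.exp(b0), math.exp(b1)


# Outcome coded like the original data: 1 = no SMD
cons_nosmd, or_nosmd = fit_logit_fweight(ROWS, lambda nosmd: nosmd)
# Outcome coded the intuitive way: 1 = SMD
cons_smd, or_smd = fit_logit_fweight(ROWS, lambda nosmd: 1 - nosmd)

print(f"nosmd: OR = {or_nosmd:.7f}, _cons = {cons_nosmd:.6f}")
print(f"smd:   OR = {or_smd:.5f}, _cons = {cons_smd:.7f}")
```

    Under these assumptions, the first fit lands on the values in the original output (OR about 0.899, _cons about 2.447) and the second on about 1.112 and 0.409. Recoding the outcome inverts both numbers while the predictor coding stays the same.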

    A few tips:
    • Don't put yourself through mental gymnastics. If you want SMD = yes to be the outcome, code it as 1, not 0. Don't even entertain the idea of a "no SMD" variable where no = 1 and yes = 0; double negatives like that can be extremely confusing.
    • Don't name your variable "sex". Instead, call it either male (male = 1 and female = 0) or female (female = 1 and male = 0). That way you will not get confused.
    • Use value labels, and then add the ", base" option to the regression command so that the reference group is printed as well.
    • Beware of any self-referencing recoding, such as:
    Code:
    generate nosmd = smd
    replace nosmd = 1 - nosmd
    In this case, if the analyst carelessly reran just the second line, the coding would flip back by accident. Use a safer recoding that does not refer back to its own result, such as:

    Code:
    recode smd (1 = 0)(0 = 1), gen(nosmd)
    * OR:
    generate nosmd = 1 - smd
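    The danger of self-referencing recodes can be mimicked in a few lines of Python, with lists standing in for Stata variables (the values are made up for illustration). Running the in-place flip twice silently restores the original coding, while deriving from the untouched source variable gives the same result however many times it is rerun.

```python
smd = [1, 0, 0, 1]  # hypothetical example values, 1 = SMD

# Risky pattern: copy, then recode the copy from itself
nosmd = list(smd)
nosmd = [1 - v for v in nosmd]  # first run: correct
nosmd = [1 - v for v in nosmd]  # accidental rerun: back to the SMD coding
risky = nosmd

# Safer pattern: always derive from the untouched source variable
nosmd = [1 - v for v in smd]
nosmd = [1 - v for v in smd]    # rerunning changes nothing
safe = nosmd

print(risky == smd, safe == [0, 1, 1, 0])  # prints True True
```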
    Last edited by Ken Chui; 20 Feb 2024, 08:40.



    • #3
      I think the problem is not with your predictor, but with your response. You have "1" corresponding to NOT SMD, so -logistic- is modeling the log odds for "not SMD" instead of the log odds of "SMD". That's my guess in any case. -- P
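      Paul's point can be written out. Let p_f and p_m be the probabilities of SMD among females and males. Recoding the response so that 1 means "not SMD" replaces each probability by its complement, which inverts the odds ratio and flips the sign of the logit coefficient:

```latex
\mathrm{OR}_{\text{SMD}} = \frac{p_f/(1-p_f)}{p_m/(1-p_m)}, \qquad
\mathrm{OR}_{\text{not SMD}} = \frac{(1-p_f)/p_f}{(1-p_m)/p_m} = \frac{1}{\mathrm{OR}_{\text{SMD}}}, \qquad
\beta_{\text{not SMD}} = \ln \mathrm{OR}_{\text{not SMD}} = -\beta_{\text{SMD}}
```

      With the counts from the table, the OR for SMD is (90 × 93)/(198 × 38) ≈ 1.112, and its reciprocal, ≈ 0.899, is the value -logistic- reported.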



      • #4
        Hello Ken and Paul, thanks for pointing this out. I recoded the SMD variable again as follows:
        Code:
        recode 1=0 0=1
        and that gives me a codebook of:
        Code:
                                   128         0  NOT SMD
                                   291         1  SMD
        I ran the logistic regression again, but the results are identical to those from my previous code:
        Code:
        logistic RECODE_KATSMD ib1.Male, base
        
        Logistic regression                             Number of obs     =        419
                                                        LR chi2(1)        =       0.21
                                                        Prob > chi2       =     0.6432
        Log likelihood =  -257.7637                     Pseudo R2         =     0.0004
        
        ------------------------------------------------------------------------------
        RECODE_KAT~D | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                Male |
              Female |   .8989249   .2073968    -0.46   0.644     .5719225    1.412894
                Male |          1  (base)
                     |
               _cons |   2.447368    .471196     4.65   0.000     1.678093    3.569296
        ------------------------------------------------------------------------------
        Note: _cons estimates baseline odds.
        Again, there is no difference.

        I recoded RECODE_KATSMD in SPSS and then moved the data into Stata, so I don't have Stata code for that step.



        • #5
          I just ran some more code, and I thought the problem was that Stata uses 0 as the baseline for the dependent variable.

          I tried to run this:
          Code:
          mlogit RECODE_KATSMD ib1.Male, base baseoutcome(1)
          
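          Iteration 0:   log likelihood = -257.87097
          Iteration 1:   log likelihood = -257.76372
          Iteration 2:   log likelihood =  -257.7637
          
          Multinomial logistic regression                 Number of obs     =        419
                                                          LR chi2(1)        =       0.21
                                                          Prob > chi2       =     0.6432
          Log likelihood =  -257.7637                     Pseudo R2         =     0.0004
          
          ------------------------------------------------------------------------------
          RECODE_KAT~D |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          NOT_SMD      |
                  Male |
             Perempuan |   .1065557   .2307165     0.46   0.644    -.3456402    .5587517
             Laki-Laki |          0  (base)
                       |
                 _cons |  -.8950131   .1925317    -4.65   0.000    -1.272368   -.5176578
          -------------+----------------------------------------------------------------
          SMD          |  (base outcome)
          ------------------------------------------------------------------------------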
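          Because -mlogit- reports coefficients on the log-odds scale, exponentiating them makes the link to odds ratios explicit ("Perempuan" and "Laki-Laki" are the Indonesian value labels for female and male). This Python sketch does only that arithmetic, using the coefficients copied from the output above, and compares them with quantities from the 2x2 table.

```python
import math

# Coefficients copied from the mlogit output above
b_perempuan = 0.1065557
b_cons = -0.8950131

or_from_coef = math.exp(b_perempuan)   # odds ratio, female vs male
odds_from_cons = math.exp(b_cons)      # baseline odds for males

# Quantities from the 2x2 table for comparison
or_table = (90 * 93) / (198 * 38)      # hand-calculated OR for SMD
male_smd_odds = 38 / 93                # odds of SMD among males

print(round(or_from_coef, 5), round(or_table, 5))
print(round(odds_from_cons, 6), round(male_smd_odds, 6))
```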
          and then I ran my code again:

          Code:
          logistic KAT_SMD ib1.Male, base
          
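          Logistic regression                             Number of obs     =        419
                                                          LR chi2(1)        =       0.21
                                                          Prob > chi2       =     0.6432
          Log likelihood =  -257.7637                     Pseudo R2         =     0.0004
          
          ------------------------------------------------------------------------------
               KAT_SMD | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                  Male |
             Perempuan |    1.11244   .2566582     0.46   0.644     .7077671    1.748489
             Laki-Laki |          1  (base)
                       |
                 _cons |   .4086022   .0786689    -4.65   0.000     .2801673    .5959147
          ------------------------------------------------------------------------------
          Note: _cons estimates baseline odds.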



          • #6
            Edit: I closed Stata and ran the code again, and the output went back to the previous result. I feel like I am being deceived.



            • #7
              Sorry, everyone, for causing confusion. After consulting a fellow researcher, I realized I had been careless and made a mistake in coding Sex, which explains why my hand calculation did not match my Stata output. I learned this the hard way. Thank you, everyone, for helping (@Paul, @Ken).

