  • Logistic regression and interaction OR

    Hi there,

    I am trying to replicate an analysis which was conducted by a colleague in SPSS. I don't seem to be able to reproduce it in Stata and I am hoping you may be able to help.

    The model is a logistic regression with the outcome regressed onto an interaction of 2 main predictors (each variable indicates whether the participant received an intervention; 0=no vs. 1=yes), adjusted for 3 binary covariates.

    I set the code to be
    Code:
    logistic y i.x1#i.x2 i.cv1 i.cv2 i.cv3
    SPSS produces an OR with 95% confidence intervals, while Stata only shows the OR for the following, which I am assuming are contrasts vs. the reference 0 0:
    0 1
    1 0
    1 1

    I have a number of questions:

    How can I obtain an OR with 95%CI?

    How can I estimate the prevalence of the outcome for the groups 0 0 vs. 1 1 (so comparing those who did not receive any intervention vs. those who received both)?

    Lastly, I am not sure why the two main effects were not included - is this appropriate for what I want to do?

    Thanks in advance!

  • #2
    The syntax you very likely want is

    Code:
    logistic y i.x1##i.x2 i.cv1 i.cv2 i.cv3
    Note the two # operators. This tells Stata to expand the interaction into its main effects (i.x1 and i.x2) as well as the interaction. A single # specifies only the interaction term.
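
    A minimal sketch of the difference between the two operators, assuming the lbw dataset that ships with Stata stands in for the real data (low, smoke and ui are only placeholders for y, x1 and x2, and the covariates are left out for brevity):

    ```stata
    * Placeholder data: low, smoke and ui stand in for y, x1 and x2
    webuse lbw, clear

    * Factorial shorthand: both main effects plus the interaction
    logistic low i.smoke##i.ui
    estimates store full_short

    * The same model with every term written out explicitly
    logistic low i.smoke i.ui i.smoke#i.ui
    estimates store full_long

    * Interaction only: with a constant in the model, Stata reports the
    * three non-reference cells (0 1, 1 0, 1 1) against the 0 0 cell
    logistic low i.smoke#i.ui
    estimates store cells

    * Coefficients (log-odds scale) and log likelihoods side by side
    estimates table full_short full_long cells, stats(ll)
    ```

    The first two specifications are the same model by construction. The third is a reparameterization of it, so the log likelihood should match even though the coefficients are labelled and interpreted differently.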


    • #3
      Thanks Leonardo Guizzetti; I appreciate that using the ## operator solves the problem and includes the two main effects in the model. However, in the SPSS script written by my ex-colleague only the interaction was entered, and I would like to reproduce this in Stata. Can you advise on how, if only the interaction term is entered, I can estimate the relevant OR and 95% CI?


      • #4
        I am not an SPSS user, so unless you can show output, I cannot interpret what it does model. If it's just an issue of seeing beta coefficients (log-odds scale) and not odds ratios, you can ask Stata to show you the ORs by running -logit, or- after estimating the model.

        The problem with using just the interaction term is that you end up estimating so-called cell means, that is, the mean odds for each combination of levels involved in the interaction. When modelled this way, you no longer get an interaction term in your list of coefficients. To model the interaction, you need the main effects included in the model, and then your coefficient of interaction, for example, will be the one labelled 1.x1#1.x2. You can derive the interaction from the cell means model by considering differences in differences on the log-odds scale, that is, the result of (0.x1#0.x2 - 0.x1#1.x2) - (1.x1#0.x2 - 1.x1#1.x2).

        You can see a worked example here. If you still need more help, then I suggest you post the output from the SPSS model and your own Stata models run both ways.
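
        The difference-in-differences idea above can be sketched as follows. This is an illustration only: it assumes the lbw dataset shipped with Stata, with low, smoke and ui as stand-ins for y, x1 and x2, and it omits the covariates.

        ```stata
        * Placeholder data: low, smoke and ui stand in for y, x1 and x2
        webuse lbw, clear

        * Cell-means parameterization: no base level (ibn.) and no constant,
        * so each coefficient is the log-odds of the outcome in one cell
        logit low ibn.smoke#ibn.ui, noconstant

        * Difference in differences on the log-odds scale; the or option
        * reports it exponentiated, with a 95% confidence interval
        lincom (1.smoke#1.ui - 1.smoke#0.ui) - (0.smoke#1.ui - 0.smoke#0.ui), or

        * Keep the same quantity, computed directly from the coefficients
        scalar dd = (_b[1.smoke#1.ui] - _b[1.smoke#0.ui]) ///
                  - (_b[0.smoke#1.ui] - _b[0.smoke#0.ui])

        * Factorial model: the interaction coefficient is that same quantity
        logit low i.smoke##i.ui
        display "difference in differences: " dd
        display "interaction coefficient:   " _b[1.smoke#1.ui]
        ```

        The two displayed values should agree up to convergence tolerance, because both models fit the same four cells.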


        • #5
          Thanks again Leonardo Guizzetti . I had another look at my Stata code and I found that I estimated the same OR using your suggested approach which includes i.x1##i.x2

          Code:
          logit y i.x1##i.x2 i.cv1 i.cv2 i.cv3
          as well as using

          Code:
           
          logistic y ibn.x1#ibn.x2 i.cv1 i.cv2 i.cv3, nocons
          lincom (1.x1#1.x2-1.x1#0.x2) - (0.x1#1.x2-0.x1#0.x2), or
          However, the OR I estimate in Stata is different from what I get from SPSS, which is why I assume something is wrong in my Stata code.

          This is the SPSS syntax

          Code:
          LOGISTIC REGRESSION VARIABLES y
          /METHOD=ENTER x1*x2 sex status pass
          /CONTRAST (x1)=indicator(1)
          /CONTRAST (x2)=indicator(1)
          /CONTRAST (sex)=indicator(1)
          /CONTRAST (status)=indicator(1)
          /CONTRAST (pass)=indicator(1)
          /PRINT=GOODFIT CI(95)
          /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).
          Any suggestion much appreciated!


          • #6
            Like I said, I'm not a user of SPSS, so posting syntax isn't helpful. There are other users of the forum that also use SPSS that may be able to help.

            Can you post the model results from SPSS and the OR you seem to want to re-create? Then can you also include the logit results from Stata? If not, I cannot help any further.


            • #7
              Hello Jen Ward. When including a product term in a model, it is conventional to include all of the lower-order components of that product term. From that point of view, your colleague should have included the first-order terms for x1 and x2 in the model:

              Code:
              LOGISTIC REGRESSION VARIABLES y
              /METHOD=ENTER x1 x2 x1*x2 sex status pass
              /CONTRAST (x1)=indicator(1)
              /CONTRAST (x2)=indicator(1)
              /CONTRAST (sex)=indicator(1)
              /CONTRAST (status)=indicator(1)
              /CONTRAST (pass)=indicator(1)
              /PRINT=GOODFIT CI(95)
              /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

              Having said that, you may find the examples here instructive: HTH.
              --
              Bruce Weaver
              Email: [email protected]
              Version: Stata/MP 18.5 (Windows)


              • #8
                Thanks for all the replies. I am not sure I can post the original output...

                Bruce - following your comment, I decided to 'question' the SPSS syntax and when I include the main effects, the estimated OR is in line with the one I estimate from Stata using the two approaches above.

                This suggests that SPSS is doing something different when no main effects are included although I am not sure what.


                • #9
                  If you are trying to mimic your colleague's SPSS results, I believe you need to do this:

                  Code:
                  logit y x1#x2 i.sex i.status i.pass
                  logit, or // replay the model and display the odds ratios
                  If sex, status and pass all have 0/1 coding, you could omit the i. prefix if you like.

                  But as I said earlier, it is conventional to include the lower order terms (x1 and x2) in the model.
                  --
                  Bruce Weaver


                  • #10
                    Thanks again Bruce Weaver , the SPSS output estimates an OR of 1.65 for their model but when I use my approach Stata estimates an OR = 1.93; this is why I am unsure what is going on.

                    I used your approach above but again, Stata estimates ORs for 0 1, 1 0, and 1 1 and using lincom afterwards, still estimates OR = 1.93; which is also the OR I obtain from SPSS when I specify the full model.

                    Unfortunately I cannot share the data or the code so I appreciate it is hard to know what's happening.


                    • #11
                      Can you use one of the datasets that comes with Stata to generate an example that is structurally the same as your model? For example, is the model I estimate below structurally the same as your model?

                      Code:
                      clear *
                      webuse lbw
                      
                      * Generate a couple dichotomous variables to make the
                      * data a better match for Jen's problem
                      tabulate race
                      generate byte nonwhite = race > 1 if ~missing(race)
                      tabulate race nonwhite
                      
                      tabulate ftv
                      generate byte anyftv = ftv > 0 if ~missing(ftv)
                      tabulate ftv anyftv
                      
                      * Q. Is the following model similar in structure to
                      *    the model your colleague estimated using SPSS?
                      
                      logit low nonwhite#anyftv smoke ht ui
                      logit, or
                      --
                      Bruce Weaver


                      • #12
                        Jen, here are some more examples you can play around with. As they show, you can parameterize the model in various ways but still get the same overall model Chi2 and the same fitted values.

                        Cheers,
                        Bruce

                        Code:
                        clear *
                        webuse lbw
                        
                        * Generate a couple dichotomous variables to make the
                        * data a better match for Jen's problem
                        tabulate race
                        generate byte nonwhite = race > 1 if ~missing(race)
                        tabulate race nonwhite
                        
                        tabulate ftv
                        generate byte anyftv = ftv > 0 if ~missing(ftv)
                        tabulate ftv anyftv
                        
                        * Model 1: Include the lower order terms for
                        * variables involved in the interaction
                        
                        logit low nonwhite##anyftv smoke ht ui
                        estimates store m1
                        * Save the fitted values (log-odds) as xb1
                        predict double xb1, xb
                        label variable xb1 "Log-odds for Model 1"
                        
                        * Model 2: EXCLUDE the lower order terms for
                        * variables involved in the interaction--this
                        * is what Jen's colleague did using SPSS  
                        
                        logit low nonwhite#anyftv smoke ht ui
                        estimates store m2
                        * Save the fitted values (log-odds) as xb2
                        predict double xb2, xb
                        label variable xb2 "Log-odds for Model 2"
                        
                        * Model 3: Combine the interacting variables into
                        * a single variable with 4 categories.
                        generate byte onevar = 2*nonwhite+anyftv
                        * Check that it worked
                        tabulate nonwhite anyftv
                        * Note: tab3way is not a built-in command and must be
                        * installed before this line will run
                        tab3way onevar nonwhite anyftv
                        
                        logit low i.onevar smoke ht ui
                        estimates store m3
                        * Save the fitted values (log-odds) as xb3
                        predict double xb3, xb
                        label variable xb3 "Log-odds for Model 3"
                        
                        generate double diff12 = xb1 - xb2
                        generate double diff13 = xb1 - xb3
                        generate double diff23 = xb2 - xb3
                        
                        summarize xb* diff*, sep(3)
                        
                        * The model Chi-square tests for all models are the same,
                        * and the fitted values from the 3 models are the same.
                        * So, you can generate the same fitted value comparisons
                        * using any of these models, and therefore, the same ORs.
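
                        One way to get the quantities asked about at the start of the thread (an OR with 95% CI for the 1 1 group versus the 0 0 group, and the estimated probability of the outcome in each group) is sketched below. It reuses the lbw stand-in variables from the examples above and is an illustration under those assumptions, not a reproduction of the SPSS analysis.

                        ```stata
                        * Rebuild the stand-in variables so the sketch runs on its own
                        webuse lbw, clear
                        generate byte nonwhite = race > 1 if !missing(race)
                        generate byte anyftv   = ftv > 0  if !missing(ftv)

                        logit low i.nonwhite##i.anyftv i.smoke i.ht i.ui

                        * OR (with 95% CI) for the 1 1 cell versus the 0 0 cell: the sum
                        * of both main effects and the interaction on the log-odds scale
                        lincom 1.nonwhite + 1.anyftv + 1.nonwhite#1.anyftv, or

                        * Covariate-adjusted predicted probability of the outcome per cell
                        margins nonwhite#anyftv

                        * Pairwise differences between those predicted probabilities,
                        * including 1 1 versus 0 0
                        margins nonwhite#anyftv, pwcompare
                        ```

                        In the interaction-only parameterization (nonwhite#anyftv with a constant), the coefficient labelled 1.nonwhite#1.anyftv is already the 1 1 versus 0 0 contrast, so it should match the lincom result above.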
                        --
                        Bruce Weaver
