Logistic regression and interaction OR

Jen Ward

Join Date: Apr 2021

Posts: 68
#1


12 Oct 2023, 10:53

Hi there,

I am trying to replicate an analysis which was conducted by a colleague in SPSS. I don't seem to be able to reproduce it in Stata and I am hoping you may be able to help.

The model is a logistic regression with the outcome regressed on an interaction of two main predictors (each indicates whether the participant received an intervention; 0 = no, 1 = yes), adjusted for three binary covariates.

My code is:

Code:

logistic y i.x1#i.x2 i.cv1 i.cv2 i.cv3

SPSS reports an OR with a 95% confidence interval, whereas Stata shows ORs for the following x1 x2 combinations, which I assume are contrasts against the reference cell 0 0:
0 1
1 0
1 1

I have a number of questions:

How can I obtain an OR with its 95% CI?

How can I estimate the prevalence of the outcome for the groups 0 0 vs. 1 1 (i.e., comparing those who did not receive any intervention with those who received both)?

Lastly, I am not sure why the two main effects were not included. Is this appropriate for what I want to do?

Thanks in advance!
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#2

12 Oct 2023, 10:55

The syntax you very likely want is

Code:

logistic y i.x1##i.x2 i.cv1 i.cv2 i.cv3

Note the two # operators. This tells Stata to expand the interaction into its main effects (i.x1 and i.x2) as well as the interaction. A single # includes only the interaction term.
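
In equation form, here is a sketch of what the ## specification fits, assuming 0/1 coding of x1 and x2 and writing z for the covariates (β0 to β3 and δ are notation introduced here, not Stata output labels). The OR for x1 differs by level of x2, and exp(β3), the coefficient Stata names 1.x1#1.x2, is the ratio of those two ORs:

$$
\begin{aligned}
\operatorname{logit}\Pr(y=1) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \delta^\top z \\
\mathrm{OR}_{x_1 \mid x_2=0} &= e^{\beta_1}, \qquad \mathrm{OR}_{x_1 \mid x_2=1} = e^{\beta_1+\beta_3} \\
\frac{\mathrm{OR}_{x_1 \mid x_2=1}}{\mathrm{OR}_{x_1 \mid x_2=0}} &= e^{\beta_3}
\end{aligned}
$$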
Jen Ward

Join Date: Apr 2021

Posts: 68
#3

13 Oct 2023, 03:38

Thanks, Leonardo Guizzetti. I appreciate that using the ## operator solves the problem and includes the two main effects in the model; however, in the SPSS script written by my ex-colleague only the interaction was entered, and I would like to reproduce this in Stata. Can you advise how I can estimate the relevant OR and 95% CI if only the interaction term is entered?
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#4

13 Oct 2023, 07:23

I am not an SPSS user, so unless you can show output, I cannot interpret what it models. If it's just an issue of seeing beta coefficients (log-odds scale) rather than odds ratios, you can ask Stata to show you the ORs by running -logit, or- after estimating the model.

The problem with using just the interaction term is that you end up estimating so-called cell means: a separate coefficient (on the log-odds scale, adjusted for the covariates) for each combination of levels involved in the interaction. When modelled this way, you no longer get an interaction term in your list of coefficients. To model the interaction directly, you need the main effects in the model; the interaction coefficient will then be the one labelled 1.x1#1.x2. You can derive the interaction from the cell-means model by taking a difference in differences, that is, (0.x1#0.x2 - 0.x1#1.x2) - (1.x1#0.x2 - 1.x1#1.x2).
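
The difference in differences can be checked algebraically, as a sketch assuming 0/1 coding. Write the ## model as logit Pr(y=1) = β0 + β1·x1 + β2·x2 + β3·x1·x2 + δ'z, and let a_jk be the cell coefficient for x1 = j, x2 = k in the cell-means model (ibn. with nocons). The symbols a_jk, β and δ are notation introduced here. The contrast then collapses to β3, so exponentiating either form gives the same interaction OR:

$$
\begin{aligned}
a_{jk} &= \beta_0 + \beta_1 j + \beta_2 k + \beta_3 jk, \qquad j,k \in \{0,1\} \\
(a_{00} - a_{01}) - (a_{10} - a_{11}) &= (-\beta_2) - (-\beta_2 - \beta_3) = \beta_3 \\
\mathrm{OR}_{\text{interaction}} &= e^{\beta_3}
\end{aligned}
$$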

You can see a worked example here. If you still need more help, then I suggest you post the output from the SPSS model and your own Stata models run both ways.
Jen Ward

Join Date: Apr 2021

Posts: 68
#5

13 Oct 2023, 07:43

Thanks again, Leonardo Guizzetti. I had another look at my Stata code and found that I get the same OR using your suggested approach, which includes i.x1##i.x2,

Code:

logit y i.x1##i.x2 i.cv1 i.cv2 i.cv3

as well as using

Code:

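logistic y ibn.x1#ibn.x2 i.cv1 i.cv2 i.cv3, nocons  // one coefficient per x1-x2 cell, no constant
lincom (1.x1#1.x2 - 1.x1#0.x2) - (0.x1#1.x2 - 0.x1#0.x2), or  // difference in differences = interaction, shown as an OR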

However, the OR I estimate in Stata is different from what I get from SPSS, which is why I assume something is wrong in my Stata code.

This is the SPSS syntax

Code:

LOGISTIC REGRESSION VARIABLES y
  /METHOD=ENTER x1*x2 sex status pass
  /CONTRAST (x1)=indicator(1)
  /CONTRAST (x2)=indicator(1)
  /CONTRAST (sex)=indicator(1)
  /CONTRAST (status)=indicator(1)
  /CONTRAST (pass)=indicator(1)
  /PRINT=GOODFIT CI(95)
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

Any suggestions would be much appreciated!
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#6

13 Oct 2023, 07:59

Like I said, I'm not a user of SPSS, so posting syntax isn't helpful. There are other users of the forum that also use SPSS that may be able to help.

Can you post the model results from SPSS and the OR you seem to want to re-create? Then can you also include the logit results from Stata? If not, I cannot help any further.
Bruce Weaver

Join Date: May 2014

Posts: 1132
#7

13 Oct 2023, 08:02

Hello Jen Ward. When including a product term in a model, it is conventional to include all of the lower-order components of that product term. From that point of view, your colleague should have included the first-order terms for x1 and x2 in the model:

Code:

LOGISTIC REGRESSION VARIABLES y
  /METHOD=ENTER x1 x2 x1*x2 sex status pass
  /CONTRAST (x1)=indicator(1)
  /CONTRAST (x2)=indicator(1)
  /CONTRAST (sex)=indicator(1)
  /CONTRAST (status)=indicator(1)
  /CONTRAST (pass)=indicator(1)
  /PRINT=GOODFIT CI(95)
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

Having said that, you may find the examples here instructive:
https://stats.oarc.ucla.edu/stata/fa...n-interaction/

HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
Jen Ward

Join Date: Apr 2021

Posts: 68
#8

13 Oct 2023, 08:32

Thanks for all the replies. I am not sure I can post the original output...

Bruce - following your comment, I decided to 'question' the SPSS syntax and when I include the main effects, the estimated OR is in line with the one I estimate from Stata using the two approaches above.

This suggests that SPSS is doing something different when no main effects are included although I am not sure what.
Bruce Weaver

Join Date: May 2014

Posts: 1132
#9

13 Oct 2023, 10:17

If you are trying to mimic your colleague's SPSS results, I believe you need to do this:

Code:

logit y x1#x2 i.sex i.status i.pass
logit, or // replay the model and display the odds ratios

If sex, status and pass all have 0/1 coding, you could omit the i. prefix if you like.
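
For reference, here is a sketch of what this single-# model estimates, assuming x1 and x2 are coded 0/1 and writing z for sex, status and pass (β0, γ, δ and p̂ are notation introduced here, not Stata output labels). Because Stata omits 0#0 as the base cell when the constant is kept, each reported OR compares one cell with 0 0, so the 1 1 row is already the adjusted OR, with its 95% CI, for receiving both interventions vs. neither. The adjusted prevalence in a cell can be computed as a predictive margin, i.e. the average predicted probability with everyone assigned to that cell; -margins x1#x2- reports these after -logit-.

$$
\begin{aligned}
\operatorname{logit}\Pr(y=1) &= \beta_0 + \gamma_{01}\,\mathbf{1}[x_1=0,\,x_2=1] + \gamma_{10}\,\mathbf{1}[x_1=1,\,x_2=0] + \gamma_{11}\,\mathbf{1}[x_1=1,\,x_2=1] + \delta^\top z \\
\mathrm{OR}_{11\ \text{vs.}\ 00} &= e^{\gamma_{11}} \\
\hat p_{jk} &= \frac{1}{n}\sum_{i=1}^{n} \operatorname{invlogit}\!\left(\hat\beta_0 + \hat\gamma_{jk} + \hat\delta^\top z_i\right), \qquad \hat\gamma_{00} = 0
\end{aligned}
$$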

But as I said earlier, it is conventional to include the lower order terms (x1 and x2) in the model.

Jen Ward

Join Date: Apr 2021

Posts: 68
#10

13 Oct 2023, 10:42

Thanks again, Bruce Weaver. The SPSS output estimates an OR of 1.65 for their model, but with my approach Stata estimates OR = 1.93; this is why I am unsure what is going on.

I used your approach above, but again Stata estimates ORs for 0 1, 1 0 and 1 1, and using lincom afterwards still gives OR = 1.93, which is also the OR I obtain from SPSS when I specify the full model.

Unfortunately I cannot share the data or the code so I appreciate it is hard to know what's happening.

Bruce Weaver

Join Date: May 2014
Posts: 1132

#11

13 Oct 2023, 13:45

Can you use one of the datasets that comes with Stata to generate an example that is structurally the same as your model? For example, is the model I estimate below structurally the same as your model?

Code:

clear *
webuse lbw

* Generate a couple dichotomous variables to make the
* data a better match for Jen's problem
tabulate race
generate byte nonwhite = race > 1 if ~missing(race)
tabulate race nonwhite

tabulate ftv
generate byte anyftv = ftv > 0 if ~missing(ftv)
tabulate ftv anyftv

* Q. Is the following model similar in structure to
*    the model your colleague estimated using SPSS?

logit low nonwhite#anyftv smoke ht ui
logit, or


Bruce Weaver

Join Date: May 2014
Posts: 1132

#12

13 Oct 2023, 16:25

Jen, here are some more examples you can play around with. As they show, you can parameterize the model in various ways but still get the same overall model Chi2 and the same fitted values.

Cheers,
Bruce

Code:

clear *
webuse lbw

* Generate a couple dichotomous variables to make the
* data a better match for Jen's problem
tabulate race
generate byte nonwhite = race > 1 if ~missing(race)
tabulate race nonwhite

tabulate ftv
generate byte anyftv = ftv > 0 if ~missing(ftv)
tabulate ftv anyftv

* Model 1: Include the lower order terms for
* variables involved in the interaction

logit low nonwhite##anyftv smoke ht ui
estimates store m1
* Save the fitted values (log-odds) as xb1
predict double xb1, xb
label variable xb1 "Log-odds for Model 1"

* Model 2: EXCLUDE the lower order terms for
* variables involved in the interaction--this
* is what Jen's colleague did using SPSS  

logit low nonwhite#anyftv smoke ht ui
estimates store m2
* Save the fitted values (log-odds) as xb2
predict double xb2, xb
label variable xb2 "Log-odds for Model 2"

* Model 3: Combine the interacting variables into
* a single variable with 4 categories.
generate byte onevar = 2*nonwhite+anyftv
* Check that it worked
tabulate nonwhite anyftv
* tab3way is community-contributed, not part of official Stata;
* -search tab3way- will locate it
tab3way onevar nonwhite anyftv

logit low i.onevar smoke ht ui
estimates store m3
* Save the fitted values (log-odds) as xb3
predict double xb3, xb
label variable xb3 "Log-odds for Model 3"

generate double diff12 = xb1 - xb2
generate double diff13 = xb1 - xb3
generate double diff23 = xb2 - xb3

summarize xb* diff*, sep(3)

* The model Chi-square tests for all models are the same,
* and the fitted values from the 3 models are the same.
* So, you can generate the same fitted value comparisons
* using any of these models, and therefore, the same ORs.
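
The equivalence shown above can also be sketched algebraically. Write Model 1 as logit Pr(low=1) = β0 + β1·nonwhite + β2·anyftv + β3·nonwhite·anyftv + δ'z, Model 2's cell coefficients as γ (base cell 0 0), and Model 3's i.onevar coefficients as θ; all of these symbols are notation introduced here. Since onevar = 2*nonwhite + anyftv maps the four cells to 0, 1, 2 and 3, each model is a reparameterization of the same four cell log-odds, so maximum likelihood reaches the same fit: xb1, xb2 and xb3 agree up to numerical precision, and the model chi-squares match.

$$
\begin{array}{cc|c|c}
\text{nonwhite} & \text{anyftv} & \text{onevar} & \text{log-odds shift vs. cell } (0,0) \\ \hline
0 & 0 & 0 & 0 \\
0 & 1 & 1 & \beta_2 = \gamma_{01} = \theta_1 \\
1 & 0 & 2 & \beta_1 = \gamma_{10} = \theta_2 \\
1 & 1 & 3 & \beta_1 + \beta_2 + \beta_3 = \gamma_{11} = \theta_3
\end{array}
$$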
