  • interaction in logistic regression - differences between over() and at() using margins

    Dear Statalist-members!

    My aim: I want to predict Y (a behaviour with 4 categories) from X1 (gender, 2 categories), X2 (hours of exposure to intervention material, continuous) and X3 (a control variable). I expect that the hours of exposure will have a differential effect for the two gender groups.

    What I did: First, I ran a multinomial logistic regression without an interaction term. Second, I ran the same regression including an interaction term.

    My code:
    qui mlogit Y i.X1 X2 X3, rrr base(1)            // Base model
    margins, dydx(*) post                           // AMEs from the base model

    qui mlogit Y i.X1 X2 i.X1#c.X2 X3, rrr base(1)  // Interaction model
    margins, dydx(*)                                // AMEs; -post- omitted so the mlogit results stay active for the next two calls
    margins, dydx(X2) over(X1)                      // effect of X2 within each X1 subsample
    margins, dydx(X2) at(X1=(0 1))                  // effect of X2 with X1 fixed at 0 and then at 1

    My questions:
    1. Regarding the size of the interaction effect: it is beyond my understanding why the "over" and the "at" specifications result in slightly different estimates (though the general effect structure remains the same).
    2. I suppose that I can interpret the margins effects from the "over" or "at" commands as the average increase in the likelihood of showing a behaviour, with one hour more of exposure, for either men or women, separately for each of the four behaviour types. What I found interesting, and very different from the interactions I have dealt with in OLS, is that the main effects (i.e. X1 and X2) differed only minimally between the base model and the interaction model (though the effect of X2 is also very small).

    I would be thankful for any small piece of advice!

  • #2
    Regarding the difference between -over()- and -at()-:

    When -margins- calculates results using -over(X1)-, it loops over the values of X1. For each value of X1 it then calculates the specified margin using only those observations in the data for which X1 takes on the current value in the loop. The rest of the data is ignored.

    When -margins- calculates results using -at(X1 = (0 1))-, it again loops over the specified values of X1, namely 0 and 1. However, here, instead of restricting to the data where X1 takes the corresponding value, it creates a new virtual data set in which every observation has X1 at that value, and calculates the margin in this virtual data set.

    If, for example, X1 is sex, with male = 0, female = 1, the -over(sex)- option calculates the male margin using only those observations with sex == male, and the female margin using only those observations with sex == female. With the -at(sex = (0 1))- option, the male margin is calculated by setting sex to male in every observation of the entire data set and then calculating the margin, and correspondingly for female.

    So, the output from the -at()- option is a set of margins that are adjusted to the distributions of the covariates in the entire sample. With the -over()- option, the margins are calculated by restricting to the values of the -over()- variable and there is no adjustment for any differences in the distributions of other variables.
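
    To make this concrete, here is a minimal sketch using the commands from #1 (assuming X1 is coded 0/1). The two -if-restricted calls should replicate the point estimates of the corresponding -over(X1)- rows, while -at()- operates on the counterfactual full sample:

    // -over(X1)- amounts to calculating the margin separately in each subsample
    margins if X1 == 0, dydx(X2)
    margins if X1 == 1, dydx(X2)

    // -at()- instead fixes X1 at each value in every observation of the full sample
    margins, dydx(X2) at(X1=(0 1))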

    At least in epidemiology, -over()- has very limited use and -at()- is nearly always what is wanted.

    By the way, a note about -margins- syntax. Since X1 is a discrete variable, instead of -margins, dydx(X2) at(X1=(0 1))- you could more simply have written -margins X1, dydx(X2)-.

    "the main effects (i.e. X1 and X2) differed only minimally between the base model and the interaction model (though the effect of X2 is also very small)."
    Well, this is true only because the interaction effect was very small, and it follows directly from the algebra of interaction models. More importantly, although it is standard usage to refer to the X1 and X2 terms in the interaction model as the "main effects" of X1 and X2, it is very misleading terminology, and sooner or later you are likely to make important mistakes in interpreting models if you think of them as main effects. In an interaction model, the coefficient of X1 is not the main effect of X1. In fact, in an interaction model there is no such thing as "the main effect" of X1. Rather, X1 has many different effects, depending on the value of X2. The one that is shown in the regression output associated with X1 is the effect of X1 conditional on X2 = 0. And vice versa for X2. It is really important not to allow yourself to think that anything in the output of an interaction model represents "the main effect" of anything.
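
    One way to see this concretely, sticking with the interaction model from #1 (the X2 values shown are arbitrary placeholders):

    qui mlogit Y i.X1 X2 i.X1#c.X2 X3, rrr base(1)
    // The effect of X1 differs at every value of X2; there is no single "main effect"
    margins, dydx(X1) at(X2=(0 5 10))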



    • #3
      Dear Clyde,

      many thanks for your precise and accessible answer. I believe it will be helpful for others who face the same uncertainty.

      Please allow me to reiterate it in my own words, to make sure that I haven't misunderstood anything.

      Concerning point 1: In other words, over() does a quasi-split of the sample into subsamples along X1, while at() follows the traditional logic of holding the variable constant (successively at each value of X1).

      Concerning point 2: I'm afraid my formulation here was sloppy. From OLS, I've learned that the effects of X1 and X2 can only be interpreted as "main effects" as long as no interaction between X1 and X2 is present. If my regression were an OLS regression, the coefficient of X1 would represent the change associated with being female (1) vs. male (0) for someone who has 0 hours of exposure; correspondingly, the coefficient of X2 would represent the average increase in the outcome with one additional hour of exposure for someone who is male. I guess the same applies to the logistic regression.

      My confusion came from the small differences between the effects of X1 in the base model and in the interaction model. However, it makes sense, because the average effect of X2 is small, so a male person with 0 hours of exposure will not differ strongly from a male person with 3 hours of exposure. (Similarly for females.) I now have to decide for myself how to report the findings: either in two almost identical tables, or potentially with a note that explains the situation and gives values for those margins that changed minimally (<0.01, with changes in effect sizes becoming relevant only around 0.08-0.1).

      (I guess part of my confusion also came from the computation that showed effects for both gender groups at the same time, which I had not used for an OLS regression.)

      Many thanks again!






      • #4
        Regarding Point 1: your interpretation is correct.

        Regarding Point 2: what you learned regarding interpreting the regression coefficient as the change associated with being female vs male for someone who has 0 hours of exposure is correct for an OLS linear regression. (Well, to be strictly correct I would change the word "change" to "difference" to avoid any implication of causality that might not be justified.)

        In logistic regression it's a bit more complicated. If the "change" (really, "difference") you refer to is in the log-odds of the outcome, then the statement remains true. But if you're thinking about difference in probability of outcome, then it's not true because the invlogit() function intervenes. And because invlogit() is non-linear one can't even speak of a single marginal effect on the outcome probability: the marginal effect would depend on the baseline probability you're starting from even in a model with no interaction terms! So there are multiple layers here.
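
        A quick numeric illustration of that last point (the numbers are arbitrary): the same 0.5 difference on the log-odds scale translates into quite different probability differences depending on the baseline you start from:

        display invlogit(-2 + 0.5) - invlogit(-2)   // about .063 at a low baseline probability
        display invlogit( 0 + 0.5) - invlogit( 0)   // about .122 at a 50% baseline probability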



        • #5
          Many thanks for taking the time - it's really highly appreciated!

          So far, I thought the average marginal effect (AME) gives the difference in the probability of a positive outcome, computed for each case and then averaged over all cases. I.e., averaged over all cases in my sample, the probability of showing behaviour 1 (vs. not showing it) increases by 9 percentage points (AME = 0.09) if the person is female (rather than male). My interpretation of the interaction from the computation above would have been that on average (and only on average) the probability of showing behaviour 1 would increase by 2 percentage points with a one-unit decrease in exposure for women, and by 3 percentage points with a one-unit increase in exposure for men.
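
          (For concreteness, this is how I picture the AME being computed - a hand-rolled sketch based on the interaction model from #1, where -outcome(2)- is just a placeholder for the behaviour of interest:)

          qui mlogit Y i.X1 X2 i.X1#c.X2 X3, rrr base(1)
          preserve
          replace X1 = 0
          predict p0 if e(sample), outcome(2)   // Pr(behaviour) with everyone set to male
          replace X1 = 1
          predict p1 if e(sample), outcome(2)   // Pr(behaviour) with everyone set to female
          generate diff = p1 - p0
          summarize diff                        // the mean is the AME of X1 for this outcome
          restore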

          I thought this would be enough to show that a predictor has a positive effect and - if the difference is over a certain threshold (also taking the predictor's SD into account for continuous variables such as X2 or X3) - to state that it seems substantively meaningful. If I understand you correctly, you are making the point that I need to pay attention to the baseline probability that someone shows behaviour 1 in order to assess how substantial the change in probability really is?
          (However, if I take the sample distribution of behaviour 1 as the baseline, it will be influenced by the distribution of the predictors in the sample?!)

          Please excuse me if I have completely misunderstood you! Lacking a formal statistics education, I was quite happy to find the AME for (what I took to be) its easy interpretability, and I would have accepted that I lose some information along the way (as far as I have understood, e.g., from http://www.stata.com/statalist/archi.../msg00293.html ).

          Many thanks again!



          • #6
            I'm sorry if my discussion about marginal effects in logistic regression being conditional on a baseline probability, even in the absence of interaction terms, confused you.

            Without seeing your outputs, I can't really comment on your interpretation of them. Referring to the code you posted in #1, the output of -margins X1, dydx(X2)- would indeed give you the average marginal effect of X2 in each level of X1, with appropriate adjustments for everything else going on. (Your code -margins, dydx(X2) at(X1=(0 1))- does the same thing, assuming that 0 and 1 are the only values X1 takes on.) Those may well be the results you are most interested in. Given that X2 is a continuous variable, it is also possible to evaluate -margins X1, dydx(X2) at(X2 = (whatever))-, where whatever gets replaced by particular values of X2 that are of interest. When I do this kind of modeling, I am typically interested in certain specific values of X2, and I often use this approach rather than average marginal effects. With a third continuous variable X3, one could also add -at()- options for interesting values of X3 if desired. But all of these approaches are perfectly legitimate, and all that matters is that you understand which one you are doing and what the results mean.
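
            For instance, if 0, 10, and 20 hours were the exposure values of interest (placeholder numbers, of course), that would look like:

            margins X1, dydx(X2) at(X2=(0 10 20))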

            It's really a matter of what your research goals are: you use the approach that gives the statistic most germane to those goals.



            • #7
              "At least in epidemiology, -over()- has very limited use and -at()- is nearly always what is wanted."
              As a sidelight, I have a very strong preference for -at()- as well. But if anybody has any rousing defenses of -over()-, I'd be interested in hearing them.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              Stata Version: 17.0 MP (2 processor)

              EMAIL: [email protected]
              WWW: https://www3.nd.edu/~rwilliam



              • #8
                Dear Clyde, many thanks for the clarification!

                In my case, the main interest lies in the effects of X1. The effect of X2 is of secondary importance for this model and only gets two or three sentences in the paper. So in my situation, I'm fine with the general direction of the effect and a very broad idea of the size of the difference it can make (I also don't have reason to assume that any threshold value exists for X2 which could be missed using the AME). Having said this, I will definitely take up your idea to compute the margins for interesting values of X2 - i.e. no exposure, mean exposure, max exposure.
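
                (For anyone reading later: I gather this can be done in a single -margins- call - a sketch, relying on the summary statistics that -at()- accepts:)

                margins X1, dydx(X2) at(X2=0) at((mean) X2) at((max) X2)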

                Thank you again for getting me further in understanding the different possibilities that come with the use of margins!



                • #9
                  Richard Williams: Regarding your comment in favor of -at()- instead of -over()-, I have no rousing defense of -over()-, but there is an example in the Stata Journal where the author uses -over()- instead of -at()- to analyse an interaction in nonlinear models. I confess I would like to know the reason for preferring -over()- in that case, but it is a point in its favor.



                  • #10
                    https://journals.sagepub.com/doi/pdf...867X1001000211



                    • #11
                      In the article you link to, no explanation was given for using -over()- instead of -at()- (or just -margins black#collgrad, expression(exp(xb()))-). What I will point out is that, in this particular situation, where black and collgrad are the only variables in the model (baseline is not really a variable: it is just the constant term masquerading as a variable), they amount to the same thing. Try it and you will see that you get the same results either way.

                      The difference between -over()- and -at()- arises when there are other variables in the model whose effects can be adjusted for (with -at()- or -margins varlist-) or not (with -over()-). When there are no other variables in the model, there is nothing else to adjust for, and -over()- and -at()- will produce the same results.
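
                      Here is a self-contained sketch of that equivalence using the nlsw88 teaching dataset (my construction of the black indicator is for illustration and may differ from the article's exact setup):

                      sysuse nlsw88, clear
                      generate byte black = (race == 2)
                      logit union i.black##i.collgrad
                      margins, over(black collgrad) expression(exp(xb()))   // subsample version
                      margins black#collgrad, expression(exp(xb()))         // counterfactual version; same results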



                      • #12
                        Clyde Schechter Thank you for your comment. I was thinking exactly along the same lines. But what confused me was that at the end of the article the author mentions: "The example here is relatively simple with only binary variables and no controlling variables. However, the basic argument still holds when using continuous variables and when controlling variables are added." This suggests that -over()- could be used even when there are other control variables in the model. Am I understanding this wrongly?



                        • #13
                          Well, I cannot read the author's mind any better than you can. But my interpretation of his closing paragraph is that he felt that this simple example illustrated the general approach to understanding multiplicative effects in interaction models. Early on, he states:
                          "This tip deals with how to interpret these interaction effects when we want to present effects as odds ratios or incidence-rate ratios, which can be an attractive alternative to interpreting interaction effects in terms of marginal effects."
                          I don't think he was endorsing specifically using the -over()- option instead of the -at()- option. After all, he also used the -post- option in his -margins- command, which clearly has nothing to do with the main purpose of the article, and has no effect on the results obtained, nor does he ever make use of the posted -margins- results in the article.

                          I can't emphasize enough that there is no generic right or wrong for choosing between -over()- and -at()-. They are, in the case of a model with additional variables, two different things. They are answers to different questions. So the choice between them depends on being clear about what the research question is, and choosing the one that answers that question.



                          • #14
                            Thank you for your point.
                            Just one last question. For the research question considered in this article, would you analyse the interaction using -margins, at(black=(0 1) collgrad=(0 1)) expression(exp(xb()))- if there were other control variables in the model?
                            What would the research question be for which the -over()- option is the right choice for this kind of problem?



                            • #15
                              Let's say I'm interested in racial differences in attainment of high occupations and some factors affecting it. If I want to describe the disparity, and how it plays out among college graduates and non-graduates, I would use -over()-. That is because I am interested only in description/prediction of the occupational outcomes by race and college graduation. This might be the perspective of a firm thinking about where to open a branch office: if it needed to hire people in high occupations, it would want to know from simple demographic data which places have more of those people available, and a simple predictive approach gives an actionable answer. Whether the effects of race and graduating from college are directly causal, or are due to or mediated by other factors, is irrelevant in this context.

                              If, on the other hand, I want to understand causal factors, then it would be important to take other variables into account. Perhaps, for example, the observed racial disparity is really a collateral effect of blacks being more likely to come from homes with lower income. In that case, I would definitely need the results fully adjusted for this economic factor, and would use -at()-. (Well, I would actually do it as -margins black#collgrad...- rather than -at(...)-, but only because it's easier to type; the outputs will be the same either way.) This might also be the perspective taken by government policy analysts trying to develop programs or laws that would reduce racial differences in job attainment. Just tinkering with one variable, if it's not really causal of the outcomes, won't work.
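
                              In code, with a hypothetical economic control added (highocc and faminc are invented names, purely for illustration), the two perspectives would look like:

                              logit highocc i.black##i.collgrad c.faminc
                              margins, over(black collgrad)   // descriptive: each cell keeps its own covariate distribution
                              margins black#collgrad          // adjusted: every cell evaluated on the full sample
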
                              Last edited by Clyde Schechter; 21 Mar 2022, 13:45.

