Interpretation of interaction term coefficients of an ordinal logistic regression.

Florian Neubauer

Join Date: Jun 2017
Posts: 25

05 Sep 2017, 08:37

Dear Statalist members,

I am not entirely sure of how to interpret the coefficients (especially of the interaction term) from the ordinal logistic regression that I ran.

In my trials, farmers have rated 5 different maize varieties on different characteristics. The varieties were both grown on-farm and on-station. I would like to analyse the ratings and differentiate between on-station and on-farm.
In order to test the significance of on-station vs. on-farm on the evaluation of the 5 maize varieties, I included a fixed main effect (Variable "ON_STATION"), and the cross effects with the varieties.

The Variable VARIETY has the categories 1 - 5 (1 is omitted).
The Variable ON_STATION has the categories 0 and 1 (0 - on-farm, and 1 - on-station).

The syntax for the regression is:

Code:

ologit OVERALL_EVALUATION i.VARIETY##i.ON_STATION , or
testparm i.ON_STATION i.VARIETY#i.ON_STATION

I get the following output:

Code:

------------------------------------------------------------------------------------
Iteration 0:   log likelihood = -14599.444  
Iteration 1:   log likelihood = -14181.192  
Iteration 2:   log likelihood = -14178.639  
Iteration 3:   log likelihood = -14178.638  

Ordered logistic regression                       Number of obs   =      10095
                                                  LR chi2(9)      =     841.61
                                                  Prob > chi2     =     0.0000
Log likelihood = -14178.638                       Pseudo R2       =     0.0288

------------------------------------------------------------------------------------
OVERALL_EVALUATION | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
           VARIETY |
                2  |   .9797576   .0998608    -0.20   0.841     .8023443      1.1964
                3  |   1.404661   .1457255     3.28   0.001      1.14621    1.721388
                4  |   1.601303   .1661474     4.54   0.000     1.306637    1.962421
                5  |   1.145541   .1170296     1.33   0.184     .9376725    1.399492
                   |
      1.ON_STATION |   2.544482    .227814    10.43   0.000     2.134957    3.032562
                   |
VARIETY#ON_STATION |
              2 1  |   1.687201   .2050854     4.30   0.000     1.329536    2.141083
              3 1  |     .91919   .1165427    -0.66   0.506       .71694    1.178495
              4 1  |   .6450919   .0813495    -3.48   0.001      .503826    .8259668
              5 1  |   .3855812   .0483454    -7.60   0.000     .3015709    .4929948
-------------------+----------------------------------------------------------------
             /cut1 |  -2.420312   .0834386                     -2.583849   -2.256776
             /cut2 |  -.9178322    .074329                     -1.063514     -.77215
             /cut3 |   .4441504   .0736179                      .2998619    .5884389
             /cut4 |   2.179148   .0765604                      2.029093    2.329204
------------------------------------------------------------------------------------

. testparm i.ON_STATION i.VARIETY#i.ON_STATION

 ( 1)  [OVERALL_EVALUATION]1.ON_STATION = 0
 ( 2)  [OVERALL_EVALUATION]2.VARIETY#1.ON_STATION = 0
 ( 3)  [OVERALL_EVALUATION]3.VARIETY#1.ON_STATION = 0
 ( 4)  [OVERALL_EVALUATION]4.VARIETY#1.ON_STATION = 0
 ( 5)  [OVERALL_EVALUATION]5.VARIETY#1.ON_STATION = 0

           chi2(  5) =  518.12
         Prob > chi2 =    0.0000

My interpretation of selected coefficients would be:

VARIETY 3 (1.404661)--> Unique effect of Variety 3 only when ON_STATION =0. It means that on-farm, the odds of a high score for Variety 3 are 1.404661 times higher than for Variety 1 (the base).

1.ON_STATION (2.544482) --> Unique effect of ON_STATION. On the whole, all varieties do better on-station than on-farm. The odds of a high score for any of the varieties are 2.544482 times higher on-station than on-farm.

VARIETY#ON_STATION 2 1 (1.687201) --> The odds of a higher score on-station are 1.687201 times higher for variety 2 than variety 1 (the base).

Is that correct? It would be great if someone could help me out.

Kind regards,
Florian


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

05 Sep 2017, 09:46

VARIETY 3 (1.404661)--> Unique effect of Variety 3 only when ON_STATION =0. It means that on-farm, the odds of a high score for Variety 3 are 1.404661 times higher than for Variety 1 (the base).

Correct.

1.ON_STATION (2.544482) --> Unique effect of ON_STATION. On the whole, all varieties do better on-station than on-farm. The odds of a high score for any of the varieties are 2.544482 times higher on-station than on-farm.

Incorrect. This is the unique effect of ON_STATION only when variety = 1 (the base variety). The odds of a high score for variety 1 are 2.544482 times higher on-station than on-farm.

VARIETY#ON_STATION 2 1 (1.687201) --> The odds of a higher score on-station are 1.687201 times higher for variety 2 than variety 1 (the base).

I don't understand what you're saying here so I don't know if it's correct or not. I would interpret this as: whatever the variety 1 odds ratio ON_STATION:ON_FARM for a higher outcome is, the variety 2 ON_STATION:ON_FARM odds ratio is that same odds ratio multiplied by 1.687201.
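
To make that concrete, here is a small sketch that only redoes the arithmetic with the odds ratios copied from the output in #1. The scalar names are made up for illustration, and nothing here touches the data or the fitted model.

```stata
* Odds ratios copied from the ologit output in #1
scalar or_station = 2.544482
scalar or_int_v2  = 1.687201

* ON_STATION:ON_FARM odds ratio for variety 1 (the base variety)
display "Variety 1: " %6.3f or_station

* ON_STATION:ON_FARM odds ratio for variety 2 is the variety 1
* odds ratio multiplied by the interaction term
display "Variety 2: " %6.3f or_station*or_int_v2
```

So 1.687201 is a ratio of two odds ratios (variety 2's on-station effect relative to variety 1's on-station effect), not an odds ratio comparing variety 2 with variety 1.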

All of that said, the interpretation of these models from the regression output is difficult, and explaining it to others is even harder. I recommend that you use the -margins- command to get the predicted probabilities for each variety in both conditions. I think that's a lot easier to understand:

Code:

margins VARIETY#ON_STATION
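
A slightly fuller sketch of the same idea, assuming the -ologit- model from #1 is still the active estimation result and that the fifth category of OVERALL_EVALUATION is the best rating (the thread does not say how the scale is coded, so adjust the outcome number as needed):

```stata
* Predicted probability of the fifth (assumed best) rating
* in each variety-by-site cell
margins VARIETY#ON_STATION, predict(outcome(#5))

* Plot the probabilities from the preceding -margins- call
marginsplot

* On-station minus on-farm difference in that probability,
* computed separately for each variety
margins VARIETY, dydx(ON_STATION) predict(outcome(#5))
```

The last command puts the on-station effect for every variety on the probability scale, which is usually easier to explain than products of odds ratios.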
Richard Williams

Join Date: Apr 2014

Posts: 4992
#3

05 Sep 2017, 13:39

I am with Clyde. I think the margins and spost13 commands can make interpretation much easier. I mean, a statement like the odds ratios are 1.4 times higher isn't much more intuitive to me than just saying an effect is positive. For examples of using margins and spost13 with ordinal models, see

https://www3.nd.edu/~rwilliam/stats3/Margins05.pdf

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Florian Neubauer

Join Date: Jun 2017

Posts: 25
#4

06 Sep 2017, 02:49

Thanks a lot, Clyde and Richard! This is very helpful.
Florian Neubauer

Join Date: Jun 2017

Posts: 25
#5

08 Sep 2017, 03:55

Sorry, I have a question about your interpretation of 1.ON_STATION (2.544482).

Clyde, you say

This is the unique effect of ON_STATION only when variety = 1 (the base variety). The odds of a high score for variety 1 are 2.544482 times higher on-station than on-farm.

But, doesn't the variable ON_STATION apply to all varieties? Why does it only refer to the base?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#6

08 Sep 2017, 04:33

Florian:
other things being equal:
under -VARIETY- you find the coefficients related to the levels 2-5 of -VARIETY- when -ON_STATION-=0;
under -ON_STATION- you find the coefficients related to the level 1 of -VARIETY- when -ON_STATION-=1;
under -VARIETY#ON_STATION- you find the coefficients related to the interactions between the levels 2-5 of -VARIETY- and -ON_STATION-, when -ON_STATION-=1.
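
One way to see how the three blocks combine, as a sketch that uses only the odds ratios printed in #1 (the scalar names are illustrative). Every cell is compared with the reference cell, variety 1 on-farm, and variety 3 serves as the example:

```stata
* Odds ratios copied from the ologit output in #1
scalar or_v3         = 1.404661
scalar or_station    = 2.544482
scalar or_v3_station = .91919

* Variety 3 on-farm vs. variety 1 on-farm: VARIETY block only
display "V3 on-farm:    " %6.3f or_v3

* Variety 1 on-station vs. variety 1 on-farm: ON_STATION only
display "V1 on-station: " %6.3f or_station

* Variety 3 on-station vs. variety 1 on-farm: all three multiply
display "V3 on-station: " %6.3f or_v3*or_station*or_v3_station
```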

Kind regards,
Carlo
(Stata 19.0)
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#7

08 Sep 2017, 09:23

To elaborate on Carlo's response, the coefficients of the "main effects" of an interaction model never represent average or overall effects of those variables. (The term "main effect" is highly misleading, but it is entrenched in common usage and the literature.) These coefficients represent the effect of their variables conditional on the other interacting variable being in its reference case value (or being 0 if the other variable is continuous).

If you want to know the effect of ON_STATION when VARIETY takes on some other value, you have to add the coefficient of ON_STATION to the coefficient of the corresponding interaction term. (That sum is on the log-odds scale; with the odds ratios shown in your output, you multiply the two odds ratios instead.) The -margins- command does that for you, which is why I recommend always running -margins- after estimating an interaction model.
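
As a sketch of doing that combination directly, assuming the -ologit- model from #1 is the most recent estimation: -lincom- adds the coefficients on the log-odds scale, and its -or- option reports the exponentiated result as an odds ratio with a confidence interval.

```stata
* ON_STATION effect for variety 1 is just the 1.ON_STATION coefficient
lincom 1.ON_STATION, or

* For varieties 2-5, add the matching interaction coefficient
foreach v of numlist 2/5 {
    lincom 1.ON_STATION + `v'.VARIETY#1.ON_STATION, or
}
```

Going by the odds ratios in #1, the point estimates should come out at roughly 4.29, 2.34, 1.64, and 0.98 for varieties 2 to 5, that is, 2.544482 multiplied by each interaction odds ratio.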

If you are looking for "the effect" of ON_STATION across all varieties, there is no such thing in an interaction model. By using an interaction model you have stipulated that each level of VARIETY has a different ON_STATION effect. If you don't want that, then don't use an interaction model. (But in your output, these interaction effects look pretty large and important, so using a non-interaction model might be inconsistent with your data.)
Richard Williams

Join Date: Apr 2014

Posts: 4992
#8

08 Sep 2017, 10:10

This handout was written with OLS regression in mind, but the same points apply when interpreting other models with main and interaction effects. As Clyde says, the meaning of main effects changes once you add interactions to a model.

https://www3.nd.edu/~rwilliam/stats2/l53.pdf

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam