Interpreting lincom output / interaction terms

Raj Dasgupta

Join Date: Apr 2024
Posts: 5


07 Apr 2024, 19:10

Hi everyone,

I have a question about using the Stata command lincom, and more generally about interaction terms in logit. I have shared a sample problem here to keep it simple. I've reviewed previous posts, but I couldn't find any simple examples of estimating odds ratios (ORs).

Code:

. codebook AgeGroup
            Tabulation: Freq.   Numeric  Label
                          174         1  Below 50
                          160         2  50-60
                          166         3  Above 60

. codebook Ethnicity
            Tabulation: Freq.   Numeric  Label
                          123         1  White
                          223         2  Black
                          154         3  Other

. logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol


Logistic regression                                     Number of obs =    500
                                                        LR chi2(9)    = 231.40
                                                        Prob > chi2   = 0.0000
Log likelihood = -220.80545                             Pseudo R2     = 0.3438

------------------------------------------------------------------------------------
           Disease | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------------+----------------------------------------------------------------
          AgeGroup |
            50-60  |  -2.816795   1.136811    -2.48   0.013    -5.044903   -.5886857
         Above 60  |  -1.067821   .8854477    -1.21   0.228    -2.803267    .6676246
                   |
         Ethnicity |
            Black  |  -.6493938    .659601    -0.98   0.325    -1.942188    .6434004
            Other  |   2.178724   .5162113     4.22   0.000     1.166968    3.190479
                   |
AgeGroup#Ethnicity |
      50-60#Black  |   3.274167   1.238932     2.64   0.008     .8459049    5.702429
      50-60#Other  |   1.996989   1.176986     1.70   0.090    -.3098599    4.303839
   Above 60#Black  |   1.586898   .8446274     1.88   0.060    -.0685418    3.242337
   Above 60#Other  |   1.023928   .8721728     1.17   0.240    -.6854992    2.733355
                   |
              Chol |   .1186652   .0320337     3.70   0.000     .0558803    .1814501
             _cons |  -19.17964   4.669492    -4.11   0.000    -28.33168    -10.0276
------------------------------------------------------------------------------------

In my analysis the reference categories are Below 50 (ib1.AgeGroup) and White (ib1.Ethnicity).

I am calculating odds ratios as follows:

Code:

//Black, Age 50-60
. lincom 2.AgeGroup + 2.Ethnicity + 2.AgeGroup#2.Ethnicity

 ( 1)  [Disease]2.AgeGroup + [Disease]2.Ethnicity + [Disease]2.AgeGroup#2.Ethnicity = 0

------------------------------------------------------------------------------
     Disease | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |  -.1920214   .5838902    -0.33   0.742    -1.336425    .9523823
------------------------------------------------------------------------------

//Black, Age Above 60
. lincom 3.AgeGroup + 2.Ethnicity + 3.AgeGroup#2.Ethnicity

 ( 1)  [Disease]3.AgeGroup + [Disease]2.Ethnicity + [Disease]3.AgeGroup#2.Ethnicity = 0

------------------------------------------------------------------------------
     Disease | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |  -.1303174   .8125591    -0.16   0.873    -1.722904    1.462269
------------------------------------------------------------------------------

//White, Age Above 60
. lincom 3.AgeGroup

 ( 1)  [Disease]3.AgeGroup = 0

------------------------------------------------------------------------------
     Disease | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |  -1.067821   .8854477    -1.21   0.228    -2.803267    .6676246
------------------------------------------------------------------------------

My first question is as follows:

Q1)
I'd like to interpret the output from lincom, and I have shared a few examples below. I'd like to confirm whether my interpretations are accurate.

OR of Black Age 50-60 = exp(-.1920214) = 0.83, i.e., Black individuals between ages of 50-60 are 17% less likely to have the disease compared to White individuals below 50 ... (i)

OR of Black Age Above 60 = exp(-.1303174) = 0.88, i.e., Black individuals above age 60 are 12% less likely to have the disease compared to White individuals below 50 ... (ii)

OR of White Age Above 60 = exp(-1.067821) = .34, i.e., White individuals above age 60 are 66% less likely to have the disease compared to White individuals below 50 ... (iii)

I am aware that I can also get the ORs by summing the coefficients, e.g., Black Age 50-60 = exp(-2.816795 - .6493938 + 3.274167) = 0.83, as in (i).
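As a quick check of (i) to (iii), the sums can be reproduced directly from the coefficients printed in the -logit- table above. A minimal Stata sketch; the scalar names are made up for illustration, and the inputs are the rounded coefficients as displayed, so a sum can differ from the -lincom- result in the seventh decimal place:

```stata
* Sum the relevant logit coefficients for each cell, then exponentiate.
* Coefficients are typed in from the -logit- table above (illustrative scalar names).
scalar lor_black_5060 = -2.816795 + (-.6493938) + 3.274167   // 2.AgeGroup + 2.Ethnicity + 2.AgeGroup#2.Ethnicity
scalar lor_black_60p  = -1.067821 + (-.6493938) + 1.586898   // 3.AgeGroup + 2.Ethnicity + 3.AgeGroup#2.Ethnicity
scalar lor_white_60p  = -1.067821                            // 3.AgeGroup only (White is the reference)

display "Black, 50-60:    log OR = " %9.6f scalar(lor_black_5060) "   OR = " %5.2f exp(scalar(lor_black_5060))
display "Black, Above 60: log OR = " %9.6f scalar(lor_black_60p)  "   OR = " %5.2f exp(scalar(lor_black_60p))
display "White, Above 60: log OR = " %9.6f scalar(lor_white_60p)  "   OR = " %5.2f exp(scalar(lor_white_60p))
```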

Q2)
My second question is about the p-values. Since none of the p-values were significant, we cannot conclude that the true value is not 0. Is the output from the lincom command the correct way to interpret statistical significance?

Q3)
Is there any other way to get the individual OR values shown here without having to run lincom multiple times?

test.dta link: Link to Dataset

Thanks very much in advance!

- Raj.


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

07 Apr 2024, 20:36

Q1: Yes, your interpretations are correct, with one caveat. The widely used expression "x% less likely" in the context of an odds ratio is wrong. It is the odds of the outcome that is "x% less," not the likeliness (i.e. probability) of the outcome. If we are dealing with probabilities that are small, then the odds ratio and the probability ratio will be almost the same. But once the probabilities get much above 10%, the difference grows rapidly and tends to infinity. For example, if we have an outcome probability of 50% in one group and 25% in another, then the probability ratio is .25/.50 = .5. But the odds ratio is (.25/(1-.25))/(.50/(1-.50)) = 0.33. And if we have an outcome probability of .995 in one group and .99 in the other, although the probability ratio is .995, the odds ratio is .497!!!
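Both numerical examples can be reproduced with a few lines of Stata arithmetic. A minimal sketch, using odds(p) = p/(1 - p); the scalar names are illustrative:

```stata
* Probability ratio vs. odds ratio for the two examples above.
* In each pair the lower-probability group is in the numerator.
scalar pr_a = .25/.50
scalar or_a = (.25/(1 - .25)) / (.50/(1 - .50))
scalar pr_b = .99/.995
scalar or_b = (.99/(1 - .99)) / (.995/(1 - .995))

display "p = .25 vs .50:  probability ratio = " %6.3f scalar(pr_a) "   odds ratio = " %6.3f scalar(or_a)
display "p = .99 vs .995: probability ratio = " %6.3f scalar(pr_b) "   odds ratio = " %6.3f scalar(or_b)
```

The second pair shows why "x% less likely" misleads when probabilities are high: a probability ratio of about .995 sits alongside an odds ratio of about .497.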

Q2: The statistics you show in your post are in the coefficient (log-odds) metric. So if you follow the conventional approach to interpreting p-values, you will fail to reject the null hypothesis that the log odds ratio is 0, or, equivalently, that the odds ratio is 1.
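Because exp() is monotone, exponentiating the -lincom- point estimate and both confidence limits gives the odds ratio and its confidence interval, and a log-odds interval that contains 0 becomes an odds-ratio interval that contains 1. A minimal sketch using the Black, Age 50-60 result from the first post (scalar names are illustrative):

```stata
* -lincom- result for Black, Age 50-60 (log-odds metric), typed in from the first post
scalar lor_hat = -.1920214
scalar lor_lo  = -1.336425
scalar lor_hi  =  .9523823

* exp() is monotone, so the limits map directly onto the odds-ratio scale
display "OR = " %5.3f exp(scalar(lor_hat)) "   95% CI: " %5.3f exp(scalar(lor_lo)) " to " %5.3f exp(scalar(lor_hi))
```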

Q3: No, there is no way to avoid repeated -lincom- commands here. However, you can save yourself some tedious and error-prone typing by doing it in nested loops:

Code:

forvalues a = 1/3 {
    forvalues e = 1/3 {
        lincom `a'.AgeGroup + `e'.Ethnicity + `a'.AgeGroup#`e'.Ethnicity
    }
}

Also, you can save yourself the burden of exponentiating all the results. Specify the -or- option to -lincom- and you will get the results in the odds ratio metric rather than the coefficient metric.
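A sketch of how the nested loop might also collect every cell's odds ratio and 95% confidence interval into one matrix. It runs on simulated data, not the poster's test.dta, and -clear- discards whatever is in memory; the matrix name ORtab is hypothetical. Rather than relying on the -or- display, it exponentiates r(estimate) and the z-based limits built from r(se), and it skips the all-reference cell (Below 50, White), whose OR is 1 by construction:

```stata
* Hypothetical sketch on SIMULATED data: -clear- discards the data in memory
clear
set seed 12345
set obs 500
generate byte AgeGroup  = runiformint(1, 3)   // 1 = Below 50, 2 = 50-60, 3 = Above 60
generate byte Ethnicity = runiformint(1, 3)   // 1 = White, 2 = Black, 3 = Other
generate double Chol    = rnormal(150, 20)
generate byte Disease   = runiform() < invlogit(-6 + 0.04*Chol)
quietly logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol

* One row per AgeGroup x Ethnicity cell: codes, OR vs. Below 50/White, 95% CI
matrix ORtab = J(9, 5, .)
matrix colnames ORtab = AgeGroup Ethnicity OR lb ub
local row = 0
forvalues a = 1/3 {
    forvalues e = 1/3 {
        local ++row
        matrix ORtab[`row', 1] = `a'
        matrix ORtab[`row', 2] = `e'
        if `a' == 1 & `e' == 1 {
            matrix ORtab[`row', 3] = 1      // reference cell: OR = 1 by construction
            continue
        }
        quietly lincom `a'.AgeGroup + `e'.Ethnicity + `a'.AgeGroup#`e'.Ethnicity
        * z-based 95% CI, computed in the log-odds metric and then exponentiated
        matrix ORtab[`row', 3] = exp(r(estimate))
        matrix ORtab[`row', 4] = exp(r(estimate) - invnormal(0.975)*r(se))
        matrix ORtab[`row', 5] = exp(r(estimate) + invnormal(0.975)*r(se))
    }
}
matrix list ORtab, format(%8.3f)
```

To use it on real data, drop the simulation lines and run the loop right after the -logit- command.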

Raj Dasgupta

Join Date: Apr 2024
Posts: 5

08 Apr 2024, 10:01

Clyde Schechter, thank you so much!! This is incredible!

The actual problem I am working on involves imputed survey datasets. If I may, I'd like to ask a couple more questions; please feel free to advise if possible. I am providing more details below to make this more comprehensive, in case anyone else has similar questions in the future. I have created a simulated dataset just to highlight the main questions.

Q1) I noticed that using logistic or logit within mi estimate makes no difference ... it produces the same values. Am I imputing / estimating correctly?

Code:

**Create ColWithMissing and Weights variables to test svy and mi
capture drop ColWithMissing
gen ColWithMissing = Chol
replace ColWithMissing = . if Chol > 150
capture drop Weights
gen Weights = runiformint(1,5)

**Create imputed dataset with survey design
set seed 1001
mi set mlong
mi svyset [pw=Weights]

** imputing values
mi register imputed ColWithMissing
mi register regular Age Ethnicity AgeGroup Disease
mi impute reg ColWithMissing AgeGroup Ethnicity AgeGroup#ib1.Ethnicity, add(5) rseed(2560) dots

**mi estimate - same question as before, but this time using mi and svy
mi estimate, or dots mcerror: svy: logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol

Multiple-imputation estimates                   Imputations       =          5
Survey: Logistic regression                     Number of obs     =      5,390

Number of strata  =         1                   Population size   =     16,248
Number of PSUs    =     5,390
                                                Average RVI       =     0.0000
                                                Largest FMI       =     0.0000
                                                Complete DF       =       5389
DF adjustment:   Small sample                   DF:     min       =   5,387.00
                                                        avg       =   5,387.00
                                                        max       =   5,387.00
Model F test:       Equal FMI                   F(   9, 5387.0)   =      97.65
Within VCE type:   Linearized                   Prob > F          =     0.0000

------------------------------------------------------------------------------------
           Disease | Odds ratio   Std. err.      t    P>|t|     [95% conf. interval]
-------------------+----------------------------------------------------------------
          AgeGroup |
            50-60  |   .0662269   .0351915    -5.11   0.000     .0233681    .1876922
                   |          0          0     0.00   0.000            0           0
                   |
         Above 60  |   .3155816   .1686534    -2.16   0.031     .1106903    .8997334
                   |          0          0     0.00   0.000     6.94e-18           0
                   |
                   |
         Ethnicity |
            Black  |   .7352435   .5473435    -0.41   0.680     .1708526    3.164031
                   |          0          0     0.00   0.000            0           0
                   |
            Other  |   10.97304   6.479265     4.06   0.000      3.44829    34.91805
                   |          0          0     0.00   0.000            0           0
                   |
                   |
AgeGroup#Ethnicity |
      50-60#Black  |    18.7279   14.89315     3.68   0.000     3.939389    89.03266
                   |          0          0     0.00   0.000            0           0
                   |
      50-60#Other  |   5.721085   3.744724     2.66   0.008     1.585627    20.64219
                   |          0          0     0.00   0.000            0           0
                   |
   Above 60#Black  |   3.359254   2.542434     1.60   0.109     .7618518    14.81205
                   |          0          0     0.00   0.000            0    8.88e-16
                   |
   Above 60#Other  |   2.528736   1.568785     1.50   0.135     .7494003     8.53283
                   |          0          0     0.00   0.000            0           0
                   |
                   |
              Chol |   1.142313   .0154422     9.84   0.000     1.112437     1.17299
                   |          0   8.67e-19     0.00   0.000            0           0
                   |
             _cons |   4.69e-10   9.44e-10   -10.68   0.000     9.09e-12    2.42e-08
                   |          0          0     0.00   0.000            0    1.65e-24
------------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

I believe lincom does not work with mi, so instead I estimated the combination within the mi framework as shown below.

Q2) Is the method for estimating the OR within mi shown here, exp(-.0922076), accurate?

Code:

mi estimate (_b[2.AgeGroup] + _b[2.Ethnicity] + _b[2.AgeGroup#2.Ethnicity]) , or dots mcerror: svy: logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol
Transformations                                 Average RVI       =     0.0000
                                                Largest FMI       =     0.0000
                                                Complete DF       =       5389
DF adjustment:   Small sample                   DF:     min       =   5,387.00
                                                        avg       =   5,387.00
Within VCE type:   Linearized                           max       =   5,387.00

        _mi_1: _b[2.AgeGroup] + _b[2.Ethnicity] + _b[2.AgeGroup#2.Ethnicity]

------------------------------------------------------------------------------
     Disease | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _mi_1 |  -.0922076   .4754096    -0.19   0.846    -1.024203    .8397875
             |          0   2.78e-17     0.00   0.000            0           0
------------------------------------------------------------------------------
Note: Values displayed beneath estimates are Monte Carlo error estimates.

This matches the estimate obtained from one of the imputed datasets (just to cross-check):

Code:

mi xeq 2: svy: logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol
mi xeq 2: lincom 2.AgeGroup + 2.Ethnicity + 2.AgeGroup#2.Ethnicity

. mi xeq 2: lincom 2.AgeGroup + 2.Ethnicity + 2.AgeGroup#2.Ethnicity

m=2 data:
-> lincom 2.AgeGroup + 2.Ethnicity + 2.AgeGroup#2.Ethnicity

 ( 1)  [Disease]2.AgeGroup + [Disease]2.Ethnicity + [Disease]2.AgeGroup#2.Ethnicity = 0

------------------------------------------------------------------------------
     Disease | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |  -.0922076   .4754096    -0.19   0.846    -1.024203    .8397875
------------------------------------------------------------------------------

**Find Odds Ratio
. di exp(-.0922076)
.91191582

Q3) I can't seem to run margins commands on mi data. I am also not quite sure how to interpret the output, say for the same example (2.AgeGroup + 2.Ethnicity + 2.AgeGroup#2.Ethnicity), i.e., Black, Age 50-60. I have never used margins before. Since I am already estimating the individual ORs, would margins add any value? (I am only interested in the comparative estimates.)

Code:

. mi estimate, svy: margins AgeGroup##Ethnicity
mi estimate: command not supported
    margins is not officially supported by mi estimate; see mi estimation for a list of Stata
    estimation commands that are supported by mi estimate.  You can use option cmdok to allow
    estimation anyway.
r(198);

. mi xeq 2: svy: logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol
. mi xeq 2: margins AgeGroup##Ethnicity

m=2 data:
-> margins AgeGroup##Ethnicity
e(sample) does not identify the estimation sample
r(322);

** Works fine if I instead use it the usual (not mi dataset) way -

mi unset
logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol

. margins AgeGroup##Ethnicity

Predictive margins                                       Number of obs = 2,130
Model VCE: OIM

Expression: Pr(Disease), predict()

------------------------------------------------------------------------------------
                   |            Delta-method
                   |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------------+----------------------------------------------------------------
          AgeGroup |
         Below 50  |   .5362142   .0615364     8.71   0.000     .4156049    .6568234
            50-60  |   .4590993    .016568    27.71   0.000     .4266265     .491572
         Above 60  |   .5089403   .0198297    25.67   0.000     .4700748    .5478059
                   |
         Ethnicity |
            White  |   .2099397   .0206301    10.18   0.000     .1695053     .250374
            Black  |   .4117932   .0137975    29.85   0.000     .3847506    .4388358
            Other  |   .7583987   .0152519    49.72   0.000     .7285054    .7882919
                   |
AgeGroup#Ethnicity |
   Below 50#White  |   .4795646   .0923817     5.19   0.000     .2984998    .6606294
   Below 50#Black  |   .3552843    .106231     3.34   0.001     .1470753    .5634933
   Below 50#Other  |   .8472187    .039458    21.47   0.000     .7698824    .9245549
      50-60#White  |   .0753552   .0282056     2.67   0.008     .0200734    .1306371
      50-60#Black  |   .4194325   .0260676    16.09   0.000     .3683409     .470524
      50-60#Other  |   .7170444    .021332    33.61   0.000     .6752345    .7588544
   Above 60#White  |    .247023   .0295681     8.35   0.000     .1890705    .3049755
   Above 60#Black  |   .4083886   .0245444    16.64   0.000     .3602824    .4564947
   Above 60#Other  |    .813742   .0336551    24.18   0.000     .7477792    .8797048
------------------------------------------------------------------------------------

So, to summarise, my questions are:

Q1) I noticed that using logistic or logit within mi estimate makes no difference ... it produces the same values.

Q2) Is the method for estimating the OR shown here, exp(-.0922076), accurate?

Q3) I can't seem to run margins commands on mi data. I am also not quite sure how to interpret the output, say for the same example (2.AgeGroup + 2.Ethnicity + 2.AgeGroup#2.Ethnicity), i.e., Black, Age 50-60. If there are any references, they would be really helpful!

Thanks all so much again ! This is really immensely helpful.


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

08 Apr 2024, 11:41

Q1. That is correct. Even outside the -mi- context, the only difference between -logit- and -logistic- is the display of the results. -logit- shows you the regression coefficients, and -logistic- shows you the odds ratios. In the -mi estimate- context, Stata does not, by default, show the odds ratios. My speculation is that this is because the process of combining the results from the imputed data sets is performed on the coefficients, not the odds ratios (and would produce seriously incorrect results if it were performed on the odds ratios). But, if you prefer to see your results in the odds ratio metric, using -mi estimate, or: logistic ...- will do that for you.
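A small check of the point that only the display differs, on simulated data (-clear- discards the data in memory; variable and scalar names are illustrative). -logistic- still stores the log-odds coefficients in e(b), so _b[] is identical after both commands; only the displayed table changes:

```stata
* Simulated data only; -clear- discards the data in memory
clear
set seed 2024
set obs 500
generate byte x = runiformint(1, 3)
generate byte y = runiform() < invlogit(-0.5 + 0.4*x)

quietly logit y i.x
scalar b_logit = _b[2.x]
quietly logistic y i.x
scalar b_logistic = _b[2.x]     // still a log-odds coefficient, not an odds ratio

display "logit coef = " %9.6f scalar(b_logit) "   logistic coef = " %9.6f scalar(b_logistic) "   OR = " %6.4f exp(scalar(b_logistic))
```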

Q2. Yes, that is correct. But, again, if you add the -or- option to your -mi estimate- command, you will, I believe, get your results directly in the odds ratio metric.

Q3. No, you can't run -margins- after -mi estimate-. However, there is a -mimrgns- command, written by Dan Klein, that will give you most of what you would otherwise be able to get from -margins-. It is available from SSC. Do read the help file before using it.
Raj Dasgupta

Join Date: Apr 2024

Posts: 5
#5

08 Apr 2024, 15:27

Thanks, Clyde Schechter!! I tried out -mimrgns- by daniel klein, and it worked very well.

I am going through all the posts on -margins- on this forum, and it starts getting a bit confusing, especially since I have an interaction term. With the dataset I have, I am only interested in the interaction effects relative to one group: White individuals below age 50. The odds ratio interpretations are fine. But then comes -margins-, which gives us the probability of Disease (= 1) and is arguably worth including when I present the results. That said, there were posts by Dan Klein stating that margins for interaction terms and -marginsplot- output should be interpreted carefully with -mimrgns-, unless I missed something.

Could you please confirm that my interpretations are correct? Is there anything else worth adding?

1. Interpretation of probs -- The probability of Disease for a White Individual, Age Below 50 is 19%

2. How do we interpret/present the CIs which can take values < 0 or > 1 (negative Lower CI for Black#Below 50 -.0518495)

3. The y-axis is the Pr(Disease) in -marginsplot- after running the previous -mimrgns- command

4. Are results of both logistic reg and margins presented in papers in practice. Most papers discuss mainly odds ratios, so when we use margins, the conversation shifts to probs. Are there any papers/examples of how these have been presented in journals/anywhere else ?

Code:

** Test with mimrgns
mi estimate, or dots mcerror: svy: logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol

** Margins with mimrgns, use cmdmargins to plot ...
** Using invlogit to convert to probs
mimrgns i.AgeGroup, over(i.Ethnicity) expression(invlogit(predict(xb))) cmdmargins

Expression : invlogit(predict(xb))
over       : Ethnicity

------------------------------------------------------------------------------------
                   |     Margin   Std. err.      t    P>|t|     [95% conf. interval]
-------------------+----------------------------------------------------------------
Ethnicity#AgeGroup |
    White#Below 50 |   .1935772   .0761949     2.54   0.011     .0438733    .3432811
       White#50-60 |   .0276272   .0270549     1.02   0.308    -.0255289    .0807834
    White#Above 60 |   .1631219   .0741983     2.20   0.028      .017341    .3089029
    Black#Below 50 |   .1342476    .094718     1.42   0.157    -.0518495    .3203447
       Black#50-60 |   .3803456   .0587013     6.48   0.000     .2650123    .4956789
    Black#Above 60 |   .3996706   .0861985     4.64   0.000     .2303122    .5690291
    Other#Below 50 |   .7155326   .0784504     9.12   0.000     .5613973     .869668
       Other#50-60 |   .6426625   .0645628     9.95   0.000     .5158128    .7695121
    Other#Above 60 |   .8170674    .099628     8.20   0.000     .6213233    1.012811
------------------------------------------------------------------------------------

marginsplot


Last edited by Raj Dasgupta; 08 Apr 2024, 15:30.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

08 Apr 2024, 15:51

1. Interpretation of probs -- The probability of Disease for a White Individual, Age Below 50 is 19%

Correct.

2. How do we interpret/present the CIs which can take values < 0 or > 1 (negative Lower CI for Black#Below 50 -.0518495)

The -margins- command, and -mimrgns- as well, calculates confidence intervals using the delta method, which can produce confidence bounds outside the 0-1 range. Since the only important aspect of a confidence interval is its coverage probability, note that the probability that an estimated probability will fall between -.0518495 and .3203447 is exactly the same as the probability that the estimated statistic will fall between 0 and .3203447, because the estimated probability itself can never be negative, or even exactly zero: the image of the invlogit() function is the open interval (0,1). Consequently, it would be perfectly honest to report the confidence interval as running from 0 to .3203447: the coverage probability will be correct, and that is all that matters. Similarly, if you are confronted with a confidence interval whose upper limit is greater than 1, you can replace that limit by 1 and the coverage probability remains the same.
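Applied to the bounds in the -mimrgns- output from #5, the truncation amounts to taking max(0, lower) and min(1, upper). A minimal sketch, with the bounds typed in by hand and illustrative scalar names:

```stata
* Delta-method limits typed in from the -mimrgns- table in #5
scalar lo_black_u50 = -.0518495   // Black#Below 50, lower limit
scalar hi_black_u50 =  .3203447   // Black#Below 50, upper limit
scalar lo_other_a60 =  .6213233   // Other#Above 60, lower limit
scalar hi_other_a60 = 1.012811    // Other#Above 60, upper limit

* A probability lies in (0, 1), so clip each limit to [0, 1]
display "Black#Below 50: " %6.4f max(0, scalar(lo_black_u50)) " to " %6.4f min(1, scalar(hi_black_u50))
display "Other#Above 60: " %6.4f max(0, scalar(lo_other_a60)) " to " %6.4f min(1, scalar(hi_other_a60))
```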

3. The y-axis is the Pr(Disease) in -marginsplot- after running the previous -mimrgns- command

This is a sentence fragment. I can't discern what the question is.

4. Are results of both logistic reg and margins presented in papers in practice. Most papers discuss mainly odds ratios, so when we use margins, the conversation shifts to probs. Are there any papers/examples of how these have been presented in journals/anywhere else ?

I think your observation that margins are not often reported in the medical literature is correct. I would consider that a weakness of the medical literature. You will find them more often in the health policy literature, and I think they are pretty common in the econometrics literature. Be that as it may, I firmly believe that the purpose of writing articles is to share information with an audience and explain it to them. If your target audience would be unfamiliar with predictive margins, and if you do not have space in your article to explain what they are, then it would probably be best to omit them. If your target audience understands predictive margins, then I would include them, as I think they contain important supplementary information.
Raj Dasgupta

Join Date: Apr 2024

Posts: 5
#7

08 Apr 2024, 18:35

> This is a sentence fragment. I can't discern what the question is.

Sorry about that : ). I meant the y-axis on the marginsplot. You had already confirmed that the -mimrgns- output is interpreted in terms of probability, which confirms that it is indeed Pr(Disease = 1).

Thanks again for all the help. The support from people on this forum on Stata questions has been remarkable.
daniel klein

Join Date: Mar 2014

Posts: 3850
#8

10 Apr 2024, 04:13

Only minor additions:

1. Instead of writing

Code:

mimrgns ... , expression(invlogit(predict(xb))) ...

you can simply write

Code:

mimrgns ... , predict(pr) ...

2. Given that you have used svy for your model, you might want the vce(unconditional) option with mimrgns; read more on that in the help for margins.
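Point 1 works because, after -logit-, the predicted probability is just the inverse logit of the linear prediction. A minimal check on simulated data, not the poster's survey data (-clear- discards the data in memory; variable names are illustrative):

```stata
* Simulated data only; -clear- discards the data in memory
clear
set seed 42
set obs 300
generate double z = rnormal()
generate byte y = runiform() < invlogit(0.5*z)
quietly logit y z

predict double p_pr, pr              // predicted Pr(y = 1)
predict double p_xb, xb              // linear prediction (log odds)
generate double p_inv = invlogit(p_xb)
summarize p_pr p_inv                 // the two columns should agree
```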

Edit:

As you brought up CIs in marginsplot, you might want to read this post for a detailed example illustrating the problem. Sorry for messing up the formatting in that post.

Last edited by daniel klein; 10 Apr 2024, 04:16.
