
  • Interpreting lincom output / interaction terms

    Hi everyone,

    I have a question about using the Stata command -lincom- and, more generally, about interaction terms in logit. I have shared a simple sample problem here. I've reviewed previous posts but couldn't find a simple example of estimating odds ratios (ORs) for interaction terms.

    Code:
    . codebook AgeGroup
                Tabulation: Freq.   Numeric  Label
                              174         1  Below 50
                              160         2  50-60
                              166         3  Above 60
    
    . codebook Ethnicity
                Tabulation: Freq.   Numeric  Label
                              123         1  White
                              223         2  Black
                              154         3  Other
    
    . logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol
    
    
    Logistic regression                                     Number of obs =    500
                                                            LR chi2(9)    = 231.40
                                                            Prob > chi2   = 0.0000
    Log likelihood = -220.80545                             Pseudo R2     = 0.3438
    
    ------------------------------------------------------------------------------------
               Disease | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------------+----------------------------------------------------------------
              AgeGroup |
                50-60  |  -2.816795   1.136811    -2.48   0.013    -5.044903   -.5886857
             Above 60  |  -1.067821   .8854477    -1.21   0.228    -2.803267    .6676246
                       |
             Ethnicity |
                Black  |  -.6493938    .659601    -0.98   0.325    -1.942188    .6434004
                Other  |   2.178724   .5162113     4.22   0.000     1.166968    3.190479
                       |
    AgeGroup#Ethnicity |
          50-60#Black  |   3.274167   1.238932     2.64   0.008     .8459049    5.702429
          50-60#Other  |   1.996989   1.176986     1.70   0.090    -.3098599    4.303839
       Above 60#Black  |   1.586898   .8446274     1.88   0.060    -.0685418    3.242337
       Above 60#Other  |   1.023928   .8721728     1.17   0.240    -.6854992    2.733355
                       |
                  Chol |   .1186652   .0320337     3.70   0.000     .0558803    .1814501
                 _cons |  -19.17964   4.669492    -4.11   0.000    -28.33168    -10.0276
    ------------------------------------------------------------------------------------
    In my analysis the reference categories are White (ib1.Ethnicity) and Below 50 (ib1.AgeGroup).

    I am calculating odds ratios as follows:

    Code:
    //Black, Age 50-60
    . lincom 2.AgeGroup + 2.Ethnicity + 2.AgeGroup#2.Ethnicity
    
     ( 1)  [Disease]2.AgeGroup + [Disease]2.Ethnicity + [Disease]2.AgeGroup#2.Ethnicity = 0
    
    ------------------------------------------------------------------------------
         Disease | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             (1) |  -.1920214   .5838902    -0.33   0.742    -1.336425    .9523823
    ------------------------------------------------------------------------------
    
    //Black, Age Above 60
    . lincom 3.AgeGroup + 2.Ethnicity + 3.AgeGroup#2.Ethnicity
    
     ( 1)  [Disease]3.AgeGroup + [Disease]2.Ethnicity + [Disease]3.AgeGroup#2.Ethnicity = 0
    
    ------------------------------------------------------------------------------
         Disease | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             (1) |  -.1303174   .8125591    -0.16   0.873    -1.722904    1.462269
    ------------------------------------------------------------------------------
    
    //White, Age Above 60
    . lincom 3.AgeGroup
    
     ( 1)  [Disease]3.AgeGroup = 0
    
    ------------------------------------------------------------------------------
         Disease | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             (1) |  -1.067821   .8854477    -1.21   0.228    -2.803267    .6676246
    ------------------------------------------------------------------------------

    My first question is as follows:

    Q1)
    I'd like to interpret the output from -lincom-; I have shared a few examples below. Could you confirm whether my interpretations are accurate?


    OR of Black Age 50-60 = exp(-.1920214) = 0.83, i.e., Black individuals between ages of 50-60 are 17% less likely to have the disease compared to White individuals below 50 ... (i)

    OR of Black Age Above 60 = exp(-.1303174) = 0.88, i.e., Black individuals above age 60 are 12% less likely to have the disease compared to White individuals below 50 ... (ii)

    OR of White Age Above 60 = exp(-1.067821) = .34, i.e., White individuals above age 60 are 66% less likely to have the disease compared to White individuals below 50 ... (iii)


    I am aware that I can also get the ORs by summing the coefficients, e.g., Black Age 50-60 = exp(-2.816795 - .6493938 + 3.274167) = 0.83, as in (i).
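As a quick check of (i) to (iii): the point estimate that -lincom- reports is just the sum of the relevant coefficients, so exponentiating that sum gives the OR. The minimal sketch below does this in Python with the coefficients copied from the -logit- table above; the dictionary keys are illustrative labels, not Stata coefficient names. Only the point estimate can be rebuilt this way; the standard error also needs the covariances between coefficients, which is why -lincom- is still needed for inference.

```python
import math

# Coefficients (log-odds) copied from the -logit- output above.
# Keys are illustrative labels, not Stata coefficient names.
b = {
    "50-60": -2.816795,
    "Above 60": -1.067821,
    "Black": -0.6493938,
    "50-60#Black": 3.274167,
    "Above 60#Black": 1.586898,
}

# Log odds ratio of each cell vs. White, Below 50 = main effects + interaction.
log_or = {
    "(i) Black, 50-60": b["50-60"] + b["Black"] + b["50-60#Black"],
    "(ii) Black, Above 60": b["Above 60"] + b["Black"] + b["Above 60#Black"],
    # White is the Ethnicity base level, so no Ethnicity or interaction terms.
    "(iii) White, Above 60": b["Above 60"],
}

for label, value in log_or.items():
    print(f"{label}: log OR = {value:.6f}, OR = {math.exp(value):.3f}")
```

The sums agree with the -lincom- estimates up to rounding in the displayed coefficients.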

    Q2)
    My second question is about the p-values. Since none of the p-values are significant, we cannot conclude that the true value differs from 0. Is the output from the -lincom- command the correct way to assess statistical significance?

    Q3)
    Is there any other way to get the individual OR values shown here without having to run -lincom- multiple times?


    test.dta link: Link to Dataset


    Thanks very much in advance!

    - Raj.


  • #2
    Q1: Yes, your interpretations are correct, with one caveat. The widely used expression "x% less likely" in the context of an odds ratio is wrong. It is the odds of the outcome that is "x% less," not the likeliness (i.e. probability) of the outcome. If we are dealing with probabilities that are small, then the odds ratio and the probability ratio will be almost the same. But once the probabilities get much above 10%, the difference grows rapidly and tends to infinity. For example, if we have an outcome probability of 50% in one group and 25% in another, then the probability ratio is .25/.50 = .5. But the odds ratio is (.25/(1-.25))/(.50/(1-.50)) = 0.33. And if we have an outcome probability of .995 in one group and .99 in the other, although the probability ratio is .995, the odds ratio is .497!!!
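The arithmetic in the two examples above can be checked with a short Python sketch (the function names are illustrative). The extra pair of small probabilities (0.02 vs. 0.04) is not from the post; it is added only to show the two ratios nearly agreeing when the outcome is rare.

```python
def odds(p):
    """Odds corresponding to a probability p in (0, 1)."""
    return p / (1 - p)


def compare(p_group, p_ref):
    """Return (probability ratio, odds ratio) of p_group relative to p_ref."""
    return p_group / p_ref, odds(p_group) / odds(p_ref)


# The two examples from the text, plus one rare-outcome pair for contrast
for p_group, p_ref in [(0.25, 0.50), (0.99, 0.995), (0.02, 0.04)]:
    pr, or_ = compare(p_group, p_ref)
    print(f"p = {p_group} vs {p_ref}: probability ratio = {pr:.3f}, odds ratio = {or_:.3f}")
```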

    Q2: The statistics you show in your post are in the coefficient (log-odds) metric. So if you follow the conventional approach to interpreting p-values, you will fail to reject the null hypothesis that the log odds ratio is 0, or, equivalently, that the odds ratio is 1.
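To see where the z and P>|z| columns come from, here is a minimal Python sketch of the two-sided Wald test applied to the three -lincom- results in the question (the helper name is illustrative). Because exp(0) = 1, testing "log OR = 0" and testing "OR = 1" are the same hypothesis, so the p-value does not depend on which metric you display.

```python
import math


def wald(coef, se):
    """Two-sided Wald test of H0: coef = 0 using the standard normal."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Coefficient and std. err. from the three -lincom- outputs in the question
results = {
    "Black, 50-60": (-0.1920214, 0.5838902),
    "Black, Above 60": (-0.1303174, 0.8125591),
    "White, Above 60": (-1.067821, 0.8854477),
}

for label, (coef, se) in results.items():
    z, p = wald(coef, se)
    print(f"{label}: z = {z:.2f}, p = {p:.3f}, OR = {math.exp(coef):.3f}")
```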

    Q3: No, there is no way to avoid repeated -lincom- commands here. However, you can save yourself some tedious and error-prone typing by doing it in nested loops:
    Code:
    forvalues a = 1/3 {
        forvalues e = 1/3 {
            lincom `a'.AgeGroup + `e'.Ethnicity + `a'.AgeGroup#`e'.Ethnicity
        }
    }
    Also, you can save yourself the burden of exponentiating all the results. Specify the -or- option to -lincom- and you will get the results in the odds ratio metric rather than the coefficient metric.
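As I understand the eform convention that the -or- option follows, it exponentiates the estimate and both confidence limits and reports a delta-method standard error of exp(b) x se(b), leaving z and the p-value unchanged. The Python sketch below mimics that conversion under this assumption; the helper name is hypothetical. It is applied to the first -lincom- result in the question and to the -mi estimate- transformation later in the thread, whose OR should match exp(-.0922076) = .91191582.

```python
import math


def to_odds_ratio(coef, se, lo, hi):
    """Convert a log-odds estimate and its CI to the odds-ratio metric.

    Assumes the eform convention: OR = exp(coef), delta-method
    SE = exp(coef) * se, and CI = (exp(lo), exp(hi)).
    """
    or_ = math.exp(coef)
    return or_, or_ * se, (math.exp(lo), math.exp(hi))


# First -lincom- result in the question (Black, age 50-60)
or_, se_or, (ci_lo, ci_hi) = to_odds_ratio(-0.1920214, 0.5838902, -1.336425, 0.9523823)
print(f"OR = {or_:.4f}, SE = {se_or:.4f}, 95% CI = [{ci_lo:.4f}, {ci_hi:.4f}]")

# -mi estimate- transformation later in the thread
or_mi, se_mi, ci_mi = to_odds_ratio(-0.0922076, 0.4754096, -1.024203, 0.8397875)
print(f"OR = {or_mi:.8f}")

# The OR interval contains 1 exactly when the log-odds interval contains 0.
```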



    • #3
      Clyde Schechter , thank you so much !! This is incredible!

      The actual problem I am working on involves imputed survey datasets. If I may, I'd like to ask a couple more questions; any advice would be welcome. I am providing more details below to make this more complete, in case anyone else has similar questions in the future. I have created a simulated dataset just to highlight the main questions.


      Q1) I noticed that using -logistic- or -logit- within -mi estimate- makes no difference: it produces the same values. Am I imputing and estimating correctly?

      Code:
      **Create ColWithMissing and Weights variables to test svy and mi
      capture drop ColWithMissing
      gen ColWithMissing = Chol
      replace ColWithMissing = . if Chol > 150
      capture drop Weights
      gen Weights = runiformint(1,5)
      
      **Create imputed dataset with survey design
      set seed 1001
      mi set mlong
      mi svyset [pw=Weights]
      
      ** imputing values
      mi register imputed ColWithMissing
      mi register regular Age Ethnicity AgeGroup Disease
      ** i. prefixes so that AgeGroup and Ethnicity enter the imputation model as categorical, not continuous
      mi impute reg ColWithMissing i.AgeGroup##i.Ethnicity, add(5) rseed(2560) dots
      
      **mi estimate - same question as before, but this time using mi and svy
      mi estimate, or dots mcerror: svy: logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol
      
      Multiple-imputation estimates                   Imputations       =          5
      Survey: Logistic regression                     Number of obs     =      5,390
      
      Number of strata  =         1                   Population size   =     16,248
      Number of PSUs    =     5,390
                                                      Average RVI       =     0.0000
                                                      Largest FMI       =     0.0000
                                                      Complete DF       =       5389
      DF adjustment:   Small sample                   DF:     min       =   5,387.00
                                                              avg       =   5,387.00
                                                              max       =   5,387.00
      Model F test:       Equal FMI                   F(   9, 5387.0)   =      97.65
      Within VCE type:   Linearized                   Prob > F          =     0.0000
      
      ------------------------------------------------------------------------------------
                 Disease | Odds ratio   Std. err.      t    P>|t|     [95% conf. interval]
      -------------------+----------------------------------------------------------------
                AgeGroup |
                  50-60  |   .0662269   .0351915    -5.11   0.000     .0233681    .1876922
                         |          0          0     0.00   0.000            0           0
                         |
               Above 60  |   .3155816   .1686534    -2.16   0.031     .1106903    .8997334
                         |          0          0     0.00   0.000     6.94e-18           0
                         |
                         |
               Ethnicity |
                  Black  |   .7352435   .5473435    -0.41   0.680     .1708526    3.164031
                         |          0          0     0.00   0.000            0           0
                         |
                  Other  |   10.97304   6.479265     4.06   0.000      3.44829    34.91805
                         |          0          0     0.00   0.000            0           0
                         |
                         |
      AgeGroup#Ethnicity |
            50-60#Black  |    18.7279   14.89315     3.68   0.000     3.939389    89.03266
                         |          0          0     0.00   0.000            0           0
                         |
            50-60#Other  |   5.721085   3.744724     2.66   0.008     1.585627    20.64219
                         |          0          0     0.00   0.000            0           0
                         |
         Above 60#Black  |   3.359254   2.542434     1.60   0.109     .7618518    14.81205
                         |          0          0     0.00   0.000            0    8.88e-16
                         |
         Above 60#Other  |   2.528736   1.568785     1.50   0.135     .7494003     8.53283
                         |          0          0     0.00   0.000            0           0
                         |
                         |
                    Chol |   1.142313   .0154422     9.84   0.000     1.112437     1.17299
                         |          0   8.67e-19     0.00   0.000            0           0
                         |
                   _cons |   4.69e-10   9.44e-10   -10.68   0.000     9.09e-12    2.42e-08
                         |          0          0     0.00   0.000            0    1.65e-24
      ------------------------------------------------------------------------------------
      Note: _cons estimates baseline odds.
      I believe -lincom- does not work with -mi-, so instead I estimated the combination within the mi framework, as shown below.

      Q2) Is the method shown here for estimating the OR within mi, i.e., exp(-.0922076), accurate?

      Code:
      mi estimate (_b[2.AgeGroup] + _b[2.Ethnicity] + _b[2.AgeGroup#2.Ethnicity]) , or dots mcerror: svy: logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol
      Transformations                                 Average RVI       =     0.0000
                                                      Largest FMI       =     0.0000
                                                      Complete DF       =       5389
      DF adjustment:   Small sample                   DF:     min       =   5,387.00
                                                              avg       =   5,387.00
      Within VCE type:   Linearized                           max       =   5,387.00
      
              _mi_1: _b[2.AgeGroup] + _b[2.Ethnicity] + _b[2.AgeGroup#2.Ethnicity]
      
      ------------------------------------------------------------------------------
           Disease | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
             _mi_1 |  -.0922076   .4754096    -0.19   0.846    -1.024203    .8397875
                   |          0   2.78e-17     0.00   0.000            0           0
      ------------------------------------------------------------------------------
      Note: Values displayed beneath estimates are Monte Carlo error estimates.
      This matches the estimate from a single imputed dataset (m=2), which I ran as a cross-check:

      Code:
      mi xeq 2: svy: logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol
      mi xeq 2: lincom 2.AgeGroup + 2.Ethnicity + 2.AgeGroup#2.Ethnicity
      
      . mi xeq 2: lincom 2.AgeGroup + 2.Ethnicity + 2.AgeGroup#2.Ethnicity
      
      m=2 data:
      -> lincom 2.AgeGroup + 2.Ethnicity + 2.AgeGroup#2.Ethnicity
      
       ( 1)  [Disease]2.AgeGroup + [Disease]2.Ethnicity + [Disease]2.AgeGroup#2.Ethnicity = 0
      
      ------------------------------------------------------------------------------
           Disease | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               (1) |  -.0922076   .4754096    -0.19   0.846    -1.024203    .8397875
      ------------------------------------------------------------------------------
      
      **Find Odds Ratio
      . di exp(-.0922076)
      .91191582

      Q3) I can't seem to run -margins- on mi data. I am also not sure how to interpret its output, say for the same example (2.AgeGroup + 2.Ethnicity + 2.AgeGroup#2.Ethnicity), i.e., Black, Age 50-60. I have never used -margins- before. Since I am already estimating the individual ORs, would it add any value? (I am only interested in the comparative estimates.)

      Code:
      . mi estimate, svy: margins AgeGroup##Ethnicity
      mi estimate: command not supported
          margins is not officially supported by mi estimate; see mi estimation for a list of Stata
          estimation commands that are supported by mi estimate.  You can use option cmdok to allow
          estimation anyway.
      r(198);
      
      . mi xeq 2: svy: logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol
      . mi xeq 2: margins AgeGroup##Ethnicity
      
      m=2 data:
      -> margins AgeGroup##Ethnicity
      e(sample) does not identify the estimation sample
      r(322);
      
      ** Works fine if I instead use it the usual (not mi dataset) way -
      
      mi unset
      logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol
      
      . margins AgeGroup##Ethnicity
      
      Predictive margins                                       Number of obs = 2,130
      Model VCE: OIM
      
      Expression: Pr(Disease), predict()
      
      ------------------------------------------------------------------------------------
                         |            Delta-method
                         |     Margin   std. err.      z    P>|z|     [95% conf. interval]
      -------------------+----------------------------------------------------------------
                AgeGroup |
               Below 50  |   .5362142   .0615364     8.71   0.000     .4156049    .6568234
                  50-60  |   .4590993    .016568    27.71   0.000     .4266265     .491572
               Above 60  |   .5089403   .0198297    25.67   0.000     .4700748    .5478059
                         |
               Ethnicity |
                  White  |   .2099397   .0206301    10.18   0.000     .1695053     .250374
                  Black  |   .4117932   .0137975    29.85   0.000     .3847506    .4388358
                  Other  |   .7583987   .0152519    49.72   0.000     .7285054    .7882919
                         |
      AgeGroup#Ethnicity |
         Below 50#White  |   .4795646   .0923817     5.19   0.000     .2984998    .6606294
         Below 50#Black  |   .3552843    .106231     3.34   0.001     .1470753    .5634933
         Below 50#Other  |   .8472187    .039458    21.47   0.000     .7698824    .9245549
            50-60#White  |   .0753552   .0282056     2.67   0.008     .0200734    .1306371
            50-60#Black  |   .4194325   .0260676    16.09   0.000     .3683409     .470524
            50-60#Other  |   .7170444    .021332    33.61   0.000     .6752345    .7588544
         Above 60#White  |    .247023   .0295681     8.35   0.000     .1890705    .3049755
         Above 60#Black  |   .4083886   .0245444    16.64   0.000     .3602824    .4564947
         Above 60#Other  |    .813742   .0336551    24.18   0.000     .7477792    .8797048
      ------------------------------------------------------------------------------------

      So, to summarise, my questions are:

      Q1) I noticed that using -logistic- or -logit- within -mi estimate- makes no difference: it produces the same values.

      Q2) Is the method for estimating the OR shown here, exp(-.0922076), accurate?

      Q3) I can't seem to run -margins- on mi data. I am also not sure how to interpret its output, say for the same example (2.AgeGroup + 2.Ethnicity + 2.AgeGroup#2.Ethnicity), i.e., Black, Age 50-60. Any references would be really helpful!


      Thanks all so much again ! This is really immensely helpful.





      • #4
        Q1. That is correct. Even outside the -mi- context, the only difference between -logit- and -logistic- is the display of the results. -logit- shows you the regression coefficients, and -logistic- shows you the odds ratios. In the -mi estimate- context, Stata does not, by default, show the odds ratios. My speculation is that this is because the process of combining the results from the imputed data sets is performed on the coefficients, not the odds ratios (and would produce seriously incorrect results if it were performed on the odds ratios). But, if you prefer to see your results in the odds ratio metric, using -mi estimate, or: logistic ...- will do that for you.
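For reference, Rubin's rules (the standard multiple-imputation combining rules) pool the coefficients and their variances, and exp() is only applied afterwards. A minimal Python sketch follows; the helper name is hypothetical. Fed five identical estimates, it returns a relative variance increase of exactly 0. That appears to be what happened in the posted output (Average RVI and Largest FMI of 0, Monte Carlo errors of 0): the analysis model uses Chol rather than the imputed ColWithMissing, so every imputation yields the same estimates. The last two lines use hypothetical values only, to show that averaging ORs differs from exponentiating averaged log odds.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool M completed-data estimates with Rubin's rules.

    Returns (pooled estimate, total SE, relative increase in variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)                     # pooled point estimate
    u_bar = mean(s ** 2 for s in std_errors)    # within-imputation variance
    b = variance(estimates)                     # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b                 # total variance
    rvi = (1 + 1 / m) * b / u_bar               # relative increase in variance
    return q_bar, math.sqrt(t), rvi


# Five identical log-odds estimates, as in the -mi estimate- transformation above
q, se, rvi = pool_rubin([-0.0922076] * 5, [0.4754096] * 5)
print(q, se, rvi)

# exp() is nonlinear: the mean of the ORs is not exp(mean of log odds)
hypothetical = [-0.5, 0.5]  # hypothetical log-odds estimates, for illustration only
print(math.exp(mean(hypothetical)), mean(math.exp(x) for x in hypothetical))
```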

        Q2. Yes that is correct. But, again, if you add the -or- option to your -mi estimate- command, you will, I believe get your results directly in the odds ratio metric.

        Q3. No, you can't run -margins- after -mi estimate-. However, there is an -mimrgns- command, written by Daniel Klein, that will give you most of what you would otherwise be able to get from -margins-. It is available from SSC. Do read the help file before using it.



        • #5
          Thanks, Clyde Schechter! I tried out -mimrgns- by daniel klein, and it worked very well.

          I have been going through the posts on -margins- in this forum, and it gets a bit confusing, especially since I have an interaction term. With my dataset, I am interested in the interaction effects only relative to one reference group: White individuals below age 50. The odds ratio interpretations are fine. But then there is -margins-, which gives the probability of Disease (= 1) and is arguably worth including when I present the results. That said, if I understood them correctly, there were posts by Dan Klein stating that margins for interaction terms and -marginsplot- output from -mimrgns- should be interpreted carefully.

          Could you please confirm whether my interpretations below are correct? Is there anything else worth adding?

          1. Interpretation of probs -- The probability of Disease for a White Individual, Age Below 50 is 19%

          2. How do we interpret/present the CIs which can take values < 0 or > 1 (negative Lower CI for Black#Below 50 -.0518495)

          3. The y-axis is the Pr(Disease) in -marginsplot- after running the previous -mimrgns- command

          4. Are results of both logistic reg and margins presented in papers in practice. Most papers discuss mainly odds ratios, so when we use margins, the conversation shifts to probs. Are there any papers/examples of how these have been presented in journals/anywhere else ?

          Code:
          ** Test with mimrgns
          mi estimate, or dots mcerror: svy: logit Disease i.AgeGroup i.Ethnicity ib1.AgeGroup#ib1.Ethnicity Chol
          
          ** Margins with mimrgns, use cmdmargins to plot ...
          ** Using invlogit to convert to probs
          mimrgns i.AgeGroup, over(i.Ethnicity)  expression(invlogit(predict(xb))) cmdmargins
          
          Expression   : invlogit(predict(xb))
          over         : Ethnicity
          
          ------------------------------------------------------------------------------------
                             |     Margin   Std. err.      t    P>|t|     [95% conf. interval]
          -------------------+----------------------------------------------------------------
          Ethnicity#AgeGroup |
             White#Below 50  |   .1935772   .0761949     2.54   0.011     .0438733    .3432811
                White#50-60  |   .0276272   .0270549     1.02   0.308    -.0255289    .0807834
             White#Above 60  |   .1631219   .0741983     2.20   0.028      .017341    .3089029
             Black#Below 50  |   .1342476    .094718     1.42   0.157    -.0518495    .3203447
                Black#50-60  |   .3803456   .0587013     6.48   0.000     .2650123    .4956789
             Black#Above 60  |   .3996706   .0861985     4.64   0.000     .2303122    .5690291
             Other#Below 50  |   .7155326   .0784504     9.12   0.000     .5613973     .869668
                Other#50-60  |   .6426625   .0645628     9.95   0.000     .5158128    .7695121
             Other#Above 60  |   .8170674    .099628     8.20   0.000     .6213233    1.012811
          ------------------------------------------------------------------------------------
          
          marginsplot


          Last edited by Raj Dasgupta; 08 Apr 2024, 16:30.



          • #6
            1. Interpretation of probs -- The probability of Disease for a White Individual, Age Below 50 is 19%
            Correct.

            2. How do we interpret/present the CIs which can take values < 0 or > 1 (negative Lower CI for Black#Below 50 -.0518495)
            The -margins- command, and -mimrgns- as well, calculates confidence intervals using the delta method, which can produce confidence bounds outside the 0-1 range. The only important property of a confidence interval is its coverage probability. Because the image of the invlogit() function is the open (0,1) interval, the true probability can never be negative, or even exactly zero. So the interval from -.0518495 to .3203447 covers the true probability in exactly the same cases as the interval from 0 to .3203447 does. Consequently, it would be perfectly honest to report the confidence interval as running from 0 to .3203447: the coverage probability will be correct, and that is all that matters. Similarly, if you are confronted with a confidence interval whose upper limit is greater than 1, you can replace that limit by 1 and the coverage probability remains the same.
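The truncation described above amounts to clipping each bound to [0, 1]. A minimal Python sketch (the function name is illustrative), applied to the two out-of-range intervals in the -mimrgns- output:

```python
def clamp_probability_ci(lower, upper):
    """Truncate a delta-method CI for a probability to the [0, 1] range.

    The true probability always lies in (0, 1), so cutting off the part of
    the interval outside [0, 1] does not change whether it is covered.
    """
    return max(lower, 0.0), min(upper, 1.0)


print(clamp_probability_ci(-0.0518495, 0.3203447))  # Black#Below 50
print(clamp_probability_ci(0.6213233, 1.012811))    # Other#Above 60
```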

            3. The y-axis is the Pr(Disease) in -marginsplot- after running the previous -mimrgns- command
            This is a sentence fragment. I can't discern what the question is.

            4. Are results of both logistic reg and margins presented in papers in practice. Most papers discuss mainly odds ratios, so when we use margins, the conversation shifts to probs. Are there any papers/examples of how these have been presented in journals/anywhere else ?
            I think your observation that margins are not often reported in the medical literature is correct. I would consider that a weakness of the medical literature. You will find them more often in the health policy literature, and I think they are pretty common in the econometrics literature. Be that as it may, I firmly believe that the purpose of writing articles is to share information with an audience and explain it to them. If your target audience would be unfamiliar with predictive margins, and if you do not have space in your article to explain what they are, then it would probably be best to omit them. If your target audience understands predictive margins, then I would include them, as I think they contain important supplementary information.



            • #7
              > This is a sentence fragment. I can't discern what the question is.

              Sorry about that : ) . I meant the y-axis on the marginsplot. You had already confirmed that the -mimrgns- output is interpreted as a probability, which confirms that the y-axis is indeed Pr(Disease = 1).

              Thanks again for all the help. The support from people on this forum on Stata questions has been remarkable.



              • #8
                Only minor additions:

                1. Instead of writing

                Code:
                mimrgns ... , expression(invlogit(predict(xb))) ...
                you can simply write

                Code:
                mimrgns ... , predict(pr) ...
                2. Given that you have used svy for your model, you might want the vce(unconditional) option with mimrgns; read more on that in the help for margins.
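For a -logit- model, Pr(Disease = 1) is the inverse logit of the linear prediction, invlogit(xb) = 1/(1 + exp(-xb)), which is why predict(pr) and expression(invlogit(predict(xb))) request the same quantity. A predictive margin then averages that predicted probability over the observations. The minimal Python sketch below does this for the reference cell (White, Below 50) using the _cons and Chol coefficients from the first -logit- table in this thread. The three Chol values are hypothetical stand-ins for the estimation sample and survey weights are ignored, so the result will not reproduce the -mimrgns- numbers. It also contrasts the average of the predictions with the prediction at the average xb.

```python
import math
from statistics import mean


def invlogit(xb):
    """Inverse logit: maps any linear prediction to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-xb))


# _cons and Chol coefficients from the first -logit- table in this thread
CONS, B_CHOL = -19.17964, 0.1186652

# Hypothetical Chol values standing in for the estimation sample
chol_values = [140, 150, 160]

# Reference cell (White, Below 50): all AgeGroup/Ethnicity terms are zero,
# so xb = _cons + b_Chol * Chol for every observation.
xb = [CONS + B_CHOL * c for c in chol_values]

predictive_margin = mean(invlogit(v) for v in xb)  # average of predicted probabilities
at_mean = invlogit(mean(xb))                        # prediction at the average xb, for contrast
print(f"predictive margin = {predictive_margin:.3f}, prediction at mean xb = {at_mean:.3f}")
```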

                Edit:

                As you brought up CIs in marginsplot, you might want to read this post for a detailed example illustrating the problem. Sorry for messing up the formatting in that post.
                Last edited by daniel klein; 10 Apr 2024, 05:16.

