Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Coefplot for model with quadratic and interaction term

    Statalisters, I am trying to plot ORs from a logistic model that has one continuous predictor variable, the square of that variable, one factor variable (with 3 levels) and the interaction between those. For example:
    logit dx c.var1##c.var1##i.var2, or
    I would like to use the coefplot command to plot the ORs for a given set of x values (i.e. 100(100)1000), similar to how a marginsplot would look (I cannot use margins because this is a case-control study). I have tried manipulating coefplot in many different ways to achieve this, but, it will either not plot it quite right or it will return an error of some sort. The code I am using to store the estimates is something like this:

    forvalues i = 100(100)1000{
    [INDENT=2]local i2 = (`i')^2
    logit dx c.var1##c.var1##i.var2, or [/INDENT][INDENT=3]lincomest `i'*var1 + `i2'*c.var1#c.var1, eform(OR) [/INDENT][INDENT=3]est sto var2_0_`i'[/INDENT][INDENT=2]logit dx c.var1##c.var1##i.var2, or [/INDENT][INDENT=3]lincomest `i'*var1 + `i2'*c.var1#c.var1 + i1.var2 + `i'*i1.var2#c.var1 + `i2'*i1.var2#c.var1#c.var1, eform(OR)
    est sto var2_1_`i'[/INDENT][INDENT=2]logit dx c.var1##c.var1##i.var2, or [/INDENT][INDENT=3]lincomest `i'*var1 + `i2'*c.var1#c.var1+ i2.var2 + `i'*i2.var2#c.var1 + `i2'*i2.var2#c.var1#c.var1, eform(OR)
    est sto var2_2_`i'[/INDENT]
    }
    Please let me know if this is even possible, and if so, how to do it.

    Thanks,
    Amanda


  • #2
    Originally posted by amandacpac View Post
    ...I cannot use margins because this is a case-control study)...
    No, you can use margins. Where did you read that advice?
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

    Comment


    • #3
      Here is one example: https://www.statalist.org/forums/for...e-control-data.

      Comment


      • #4
        Originally posted by amandacpac View Post
        I read through that post, as well as another related one.

        Maarten (the post you linked) is correct that you should present only the odds ratios from a case-control study. Initially, I thought he was wrong to assume that you couldn't use the margins command at all. I recalled that the margins command gives you an option to calculate the margins of any specified expression (correctly specified, anyway!). Margins defaults to predicting the probability, but it can also predict xb, i.e. the linear prediction, aka log odds.

        The problem is, it only predicts log odds, and not odds ratios. However, Richard Williams showed, in post 10, how to get margins to predict the log odds of whatever quantity you want divided by the log odds of the base category - i.e. the odds ratios. You clearly know how to use coefplot already. All you need is to correctly specify the margins command. Note that in Williams' command, he first ran the margins command to get the log odds, then he saved the log odds of the base category to a scalar, then he re-ran the same margins command dividing everything by the log odds of the scalar.

        I hope this example correctly translates his code to your situation, since he used age groups and you are interested in age as continuous.

        Code:
         webuse nhanes2, clear
        logistic highbp i.sex##c.age
        margins, at(age = 30) expression(exp(predict(xb)))
        matrix b = r(b)
        scalar base = b[1,1]
        margins, at(age = (30(5)100))
        expression((exp(predict(xb))/base))  
        marginsplot
        Does that get you closer to what you want?
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

        When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

        Comment


        • #5
          Hmmm....It definitely gets me closer! So, since I have a quadratic term in the model, would the third line of code read something like: margins, at(age=30 age_sq = 900) expression(exp(predict(xb))) ? Also, I'm a little confused about the 'base' term. I see that you created the scalar for it, but, what exactly is that referencing? What would the referent group be in this case, age=0?

          Comment


          • #6
            Originally posted by amandacpac View Post
            ... since I have a quadratic term in the model, would the third line of code read something like: margins, at(age=30 age_sq = 900) expression(exp(predict(xb))) ?
            No, you definitely do not need to specify margins that way. Because of the way you specified your logistic command, margins knows that you want age and age squared in the model, and it understands that when you reference just age in the margins.

            Btw, do use code delimiters in replying, as it makes your code easier to reply; tap the # button in the control panel available when you post reply (it's towards the right side) and it will automatically generate code delimiters, or you can type them manually (type "CODE" and "/CODE", enclosed in square brackets, but no quotes, to start and end a block of code.

            Originally posted by amandacpac View Post
            Also, I'm a little confused about the 'base' term. I see that you created the scalar for it, but, what exactly is that referencing? What would the referent group be in this case, age=0?
            If I got the command right, then the answer is no. By typing

            Code:
             
             margins, at(age = 30) expression(exp(predict(xb)))
            I'm pretty sure I am asking -margins- to calculate the specified expression with age set to 30, and all other covariates as observed in the data. (You must specify at with a continuous covariate.) The -margins- command then saves the predicted quantity in a matrix called r(b). I saved r(b) in a matrix called b.

            b[1,1] refers to row 1, column 1, of matrix b. In this case, there's only one row and one column, since I asked for only one prediction. Anyway, that is the predicted odds of whatever the outcome was at age of 30. Then, you save that as a scalar, which is like an invisible variable in Stata's memory that you can call anytime - including in asking margins to predict some sort of expression.

            I forget what the base odds were, but in the subsequent margins command, you are dividing every predicted odds at 5 year age intervals by the odds at age = 30. That gets you the predicted odds ratios relative to age 30. Again, you're not presenting any base odds, or predicted probability - you're only presenting the odds ratio as you observed it.

            PS, in my prior post, I misspoke. When you say
            Code:
            predict(xb)
            , you ask -margins- to predict the log odds instead of its default (which is probability).
            Code:
            expression(exp(...))
            asks -margins- to predict you e to the power of (...). e to the power of the log odds is odds. Then, you divide the odds of whatever predicted values you desired by the odds at the base value of age. I can't read your code clearly, but it looks like you wanted to predict the odds ratios with var2 = 100 to 1000, in steps of 100. Just change the values in my sample code as appropriate for your data/model.
            Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

            When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

            Comment


            • #7
              So sorry about not using the delimiters. I wasn't sure how to, so, I tried indenting it. Apparently, that didn't work as I intended. Here is what the code should have looked like.

              Code:
              forvalues i = 100(100)1000{
              local i2 = (`i')^2
              
              logit dx c.var1##c.var1##i.var2, or
              lincomest `i'*var1 + `i2'*c.var1#c.var1, eform(OR)
              est sto var2_0_`i'
              
              logit dx c.var1##c.var1##i.var2, or
              lincomest `i'*var1 + `i2'*c.var1#c.var1 + `i'*i1.var2#c.var1 + `i2'*i1.var2#c.var1#c.var1, eform(OR)
              est sto var2_1_`i'
              
              logit dx c.var1##c.var1##i.var2, or
              lincomest `i'*var1 + `i2'*c.var1#c.var1 + `i'*i2.var2#c.var1 + `i2'*i2.var2#c.var1#c.var1, eform(OR)
              est sto var2_2_`i'
              }
              And no, I actually wanted to predict the odds ratios with VAR1 = 100 to 1000, in steps of 100 (not var2), but I wanted to do it separately for the three levels of the factor variables, var2, since I have an interaction with that variable. To do that with the margins command would I then just change the second margins command to include the r.var2, as follows:

              Code:
              logit dx c.var1##c.var1##i.var2, or
              margins, at(var1 = 100) expression(exp(predict(xb)))
              matrix b = r(b)
              scalar base = b[1,1]
              margins r.var2, at(var1 = (100(100)1000))
              expression((exp(predict(xb))/base))
              marginsplot
              Thanks! Amanda

              Comment


              • #8
                I tried the code listed above and adding in the 'r.var2' wasn't quite right because it only gave two lines representing the contrasts of the 2nd and 3rd levels of var2 vs level 1, so I tried including the var in the 'at()' expression as follows:

                Code:
                logit dx c.var1##c.var1##i.var2, or
                margins, at(var1 = 100) expression(exp(predict(xb)))
                matrix b = r(b)
                scalar base = b[1,1]
                margins, at(var1 = (100(100)1000) var2=(0(1)2)) expression((exp(predict(xb))/base))
                marginsplot
                This also doesn't appear to be totally correct because the Y axis on the plot ranges from -1 to 3. If these are ORs they shouldn't go below 0. I should note that when I take out the var2 out of that margins command, the scale of the y axis ranges from 0.5 to 2, though, I'm not sure if this actually fixed it, or if all the points just happen to be above 0. What I would like to plot are three lines (one for each level of Var2) denoting the ORs for a given value of var1 (ranging from 100 to 1000 in steps of 100). In this case, I am assuming that 100 will be the referent category of the OR, is that true?

                Comment


                • #9
                  Originally posted by amandacpac View Post
                  I tried the code listed above and adding in the 'r.var2' wasn't quite right because it only gave two lines representing the contrasts of the 2nd and 3rd levels of var2 vs level 1, so I tried including the var in the 'at()' expression as follows:

                  Code:
                  logit dx c.var1##c.var1##i.var2, or
                  margins, at(var1 = 100) expression(exp(predict(xb)))
                  matrix b = r(b)
                  scalar base = b[1,1]
                  margins, at(var1 = (100(100)1000) var2=(0(1)2)) expression((exp(predict(xb))/base))
                  marginsplot
                  This also doesn't appear to be totally correct because the Y axis on the plot ranges from -1 to 3. If these are ORs they shouldn't go below 0. I should note that when I take out the var2 out of that margins command, the scale of the y axis ranges from 0.5 to 2, though, I'm not sure if this actually fixed it, or if all the points just happen to be above 0. What I would like to plot are three lines (one for each level of Var2) denoting the ORs for a given value of var1 (ranging from 100 to 1000 in steps of 100). In this case, I am assuming that 100 will be the referent category of the OR, is that true?
                  That's interesting that the y axis goes below 0. You're correct that ORs should certainly not go below 0.

                  One other thing to note is that actually, the base case you selected is var1 = 100 and var2 = whatever distribution it had in your sample. I think you need to add to the -at- expression for the base case:

                  Code:
                  margins var2, at(var1 = 100) expression(exp(predict(xb)))
                  matrix b = r(b)
                  scalar base = b[1,1]
                  Let's try with NHANES2, but there's one issue.

                  Code:
                  logistic highbp i.sex##c.age##c.age
                  margins sex, at(age = 30) expression(exp(predict(xb)))
                  matrix b = r(b)
                  margins sex, at(age = (30(5)90)) expression(exp(predict(xb))/b[1,1])
                  marginsplot
                  Do the above and you get the odds ratios using males age 30 as the base category, in 5 year increments of age starting at 30. The problem is, the entire line for females is the odds ratios relative to males age 30. I am not sure if that was what you wanted! b[1,2] would get you the base odds for women age 30.

                  You mentioned you were familiar with -coefplot- (on SSC, written by Ben Jann), and I am pretty sure you can do this in coefplot separate calls to margins, then save the estimates from each. See if you can run that? I don't have the time to re-familiarize myself with the coefplot syntax right this minute.
                  Last edited by Weiwen Ng; 18 Dec 2017, 17:59.
                  Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                  When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                  Comment


                  • #10
                    I got this code to work by running it twice. Using the NHANES example above, I ran both of the margins statements with at(sex=0), and then reran everything, changing the margins statement to at(sex=1). I then saved those results and plotted in coefplot.

                    HOWEVER, when I did this with my data, I noticed that I had a couple of negative lower CI values. This obviously shouldn't be the case, since we are using ORs. I thought this might be an issue with my data (I have some smaller sample sizes), so I went back and messed around with the NHANES data example to double check, and when I run a lincom statement for a given value for age, I get the same OR that is given in the second margins output, however, the CIs are different. Any idea why this might be?

                    Comment


                    • #11
                      Here is the code I used with the NHANES2 dataset.
                      Code:
                      gen height_cent =height - 170 if strata==1
                      
                      *Females
                      logit highbp i.female##c.height_cent##c.height_cent if strata==1, or
                      margins , at(height_cent = 0 female=1) expression(exp(predict(xb)))
                      matrix b = r(b)
                      margins, at(height_cent = (-30(5)30) female=1) expression(exp(predict(xb))/b[1,1])
                      
                      *Males
                      logit highbp i.female##c.height_cent##c.height_cent if strata==1, or
                      margins , at(height_cent = 0 female=0) expression(exp(predict(xb)))
                      matrix b = r(b)
                      margins, at(height_cent = (-30(5)30) female=0) expression(exp(predict(xb))/b[1,1])
                      Last edited by amandacpac; 09 Mar 2018, 14:53.

                      Comment


                      • #12
                        In principle, I think you could switch to predict(xb), to predict log odds, then use coefplot and tell it to exponentiate the result to coefficients. I haven't tested your code for typos or logic, but something like this should work:

                        Code:
                        webuse nhanes2, clear
                        gen height_cent =height - 170 if strata==1
                        logit highbp i.female##c.height_cent##c.height_cent if strata==1, or
                        margins , at(height_cent = 0 female=1) expression(predict(xb))
                        matrix b = r(b)
                        margins, at(height = (-30(5)30) female=1) expression((predict(xb))-b[1,1]) post
                        coefplot, at eform
                        Since you're working in log-odds, the odds ratio should be the difference in log odds. It looks like you're calculating the odds ratios versus mean height. Seems correct. You now effectively have logit-transformed confidence intervals - mind you, not everyone understands what those are, but I think this is a correct approach. If I missed something, which I may well have, I hope to be corrected.
                        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                        When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                        Comment


                        • #13
                          Ok, so, that is getting me close, but, the CLs still do not seem to be correct. I'm going to use female=0 as an example, since it is a little easier to verify, so, the code is:
                          Code:
                          webuse nhanes2, clear
                          gen height_cent =height - 170 if strata==1
                          logit highbp i.female##c.height_cent##c.height_cent if strata==1, or
                          margins , at(height_cent = 0 female=0) expression(predict(xb))
                          matrix b = r(b)
                          margins, at(height_cent = (-30(5)30) female=0) expression((predict(xb))-b[1,1]) post
                          coefplot, at eform
                          As an example, let's look at height_cent = 5. The Beta (95% CI) from the 2nd margins command is -.0926283 (-.4382235, .252967). However, when I run the following lincom command:
                          Code:
                          quietly logit highbp i.female##c.height_cent##c.height_cent if strata==1, or
                          lincom 5*c.height_cent + 25*c.height_cent#c.height_cent
                          the Beta (95% CI) is -.0926283 (-.275966, .0907095). So, the Beta is the same, but, the CLs are not. Any thoughts on why this might be??

                          Comment


                          • #14
                            Originally posted by amandacpac View Post
                            Ok, so, that is getting me close, but, the CLs still do not seem to be correct. I'm going to use female=0 as an example, since it is a little easier to verify, so, the code is:
                            Code:
                            webuse nhanes2, clear
                            gen height_cent =height - 170 if strata==1
                            logit highbp i.female##c.height_cent##c.height_cent if strata==1, or
                            margins , at(height_cent = 0 female=0) expression(predict(xb))
                            matrix b = r(b)
                            margins, at(height_cent = (-30(5)30) female=0) expression((predict(xb))-b[1,1]) post
                            coefplot, at eform
                            As an example, let's look at height_cent = 5. The Beta (95% CI) from the 2nd margins command is -.0926283 (-.4382235, .252967). However, when I run the following lincom command:
                            Code:
                            quietly logit highbp i.female##c.height_cent##c.height_cent if strata==1, or
                            lincom 5*c.height_cent + 25*c.height_cent#c.height_cent
                            the Beta (95% CI) is -.0926283 (-.275966, .0907095). So, the Beta is the same, but, the CLs are not. Any thoughts on why this might be??
                            I would welcome others' insights into why the SEs are different. I bet it's that a different method is used to compute the SE in each command. But I do not know enough to comment why this would make a difference.
                            Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                            When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                            Comment


                            • #15
                              I am not sure how to explain the discrepancy. Margins is giving you the adjusted predictions given the other X values. lincom is giving you the standard errors for a linear combo of coefficients. Different means are being used to calculate the standard errors. This seems like an unusual way to use lincom, i.e. plugging in specific values for X. Personally I would not worry much about it! Just pick one or the other approach and be consistent. My guess is that the CIs from both are correct but are testing different things or using different methods to get the SEs.
                              -------------------------------------------
                              Richard Williams, Notre Dame Dept of Sociology
                              StataNow Version: 19.5 MP (2 processor)

                              EMAIL: [email protected]
                              WWW: https://www3.nd.edu/~rwilliam

                              Comment

                              Working...
                              X