Coefplot for model with quadratic and interaction term

amandacpac

Join Date: Sep 2014

Posts: 52
#1

Coefplot for model with quadratic and interaction term

17 Dec 2017, 13:17

Statalisters, I am trying to plot ORs from a logistic model that has one continuous predictor variable, the square of that variable, one factor variable (with 3 levels) and the interaction between those. For example:
logit dx c.var1##c.var1##i.var2, or
I would like to use the coefplot command to plot the ORs for a given set of x values (i.e. 100(100)1000), similar to how a marginsplot would look (I cannot use margins because this is a case-control study). I have tried manipulating coefplot in many different ways to achieve this, but, it will either not plot it quite right or it will return an error of some sort. The code I am using to store the estimates is something like this:

forvalues i = 100(100)1000{
[INDENT=2]local i2 = (`i')^2
logit dx c.var1##c.var1##i.var2, or [/INDENT][INDENT=3]lincomest `i'*var1 + `i2'*c.var1#c.var1, eform(OR) [/INDENT][INDENT=3]est sto var2_0_`i'[/INDENT][INDENT=2]logit dx c.var1##c.var1##i.var2, or [/INDENT][INDENT=3]lincomest `i'*var1 + `i2'*c.var1#c.var1 + i1.var2 + `i'*i1.var2#c.var1 + `i2'*i1.var2#c.var1#c.var1, eform(OR)
est sto var2_1_`i'[/INDENT][INDENT=2]logit dx c.var1##c.var1##i.var2, or [/INDENT][INDENT=3]lincomest `i'*var1 + `i2'*c.var1#c.var1+ i2.var2 + `i'*i2.var2#c.var1 + `i2'*i2.var2#c.var1#c.var1, eform(OR)
est sto var2_2_`i'[/INDENT]
}
Please let me know if this is even possible, and if so, how to do it.

Thanks,
Amanda
Tags: None
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

17 Dec 2017, 14:06

Originally posted by amandacpac View Post

...I cannot use margins because this is a case-control study)...

No, you can use margins. Where did you read that advice?

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
amandacpac

Join Date: Sep 2014

Posts: 52
#3

17 Dec 2017, 15:33

Here is one example: https://www.statalist.org/forums/for...e-control-data.
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

18 Dec 2017, 07:37

Originally posted by amandacpac View Post

Here is one example: https://www.statalist.org/forums/for...e-control-data.

I read through that post, as well as another related one.

Maarten (the post you linked) is correct that you should present only the odds ratios from a case-control study. Initially, I thought he was wrong to assume that you couldn't use the margins command at all. I recalled that the margins command gives you an option to calculate the margins of any specified expression (correctly specified, anyway!). Margins defaults to predicting the probability, but it can also predict xb, i.e. the linear prediction, aka log odds.

The problem is, it only predicts log odds, and not odds ratios. However, Richard Williams showed, in post 10, how to get margins to predict the log odds of whatever quantity you want divided by the log odds of the base category - i.e. the odds ratios. You clearly know how to use coefplot already. All you need is to correctly specify the margins command. Note that in Williams' command, he first ran the margins command to get the log odds, then he saved the log odds of the base category to a scalar, then he re-ran the same margins command dividing everything by the log odds of the scalar.

I hope this example correctly translates his code to your situation, since he used age groups and you are interested in age as continuous.

Code:

webuse nhanes2, clear logistic highbp i.sex##c.age margins, at(age = 30) expression(exp(predict(xb))) matrix b = r(b) scalar base = b[1,1] margins, at(age = (30(5)100)) expression((exp(predict(xb))/base)) marginsplot

Does that get you closer to what you want?

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
1 like
Comment
amandacpac

Join Date: Sep 2014

Posts: 52
#5

18 Dec 2017, 14:37

Hmmm....It definitely gets me closer! So, since I have a quadratic term in the model, would the third line of code read something like: margins, at(age=30 age_sq = 900) expression(exp(predict(xb))) ? Also, I'm a little confused about the 'base' term. I see that you created the scalar for it, but, what exactly is that referencing? What would the referent group be in this case, age=0?
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#6

18 Dec 2017, 15:15

Originally posted by amandacpac View Post

... since I have a quadratic term in the model, would the third line of code read something like: margins, at(age=30 age_sq = 900) expression(exp(predict(xb))) ?

No, you definitely do not need to specify margins that way. Because of the way you specified your logistic command, margins knows that you want age and age squared in the model, and it understands that when you reference just age in the margins.

Btw, do use code delimiters in replying, as it makes your code easier to reply; tap the # button in the control panel available when you post reply (it's towards the right side) and it will automatically generate code delimiters, or you can type them manually (type "CODE" and "/CODE", enclosed in square brackets, but no quotes, to start and end a block of code.

Originally posted by amandacpac View Post

Also, I'm a little confused about the 'base' term. I see that you created the scalar for it, but, what exactly is that referencing? What would the referent group be in this case, age=0?

If I got the command right, then the answer is no. By typing

Code:

margins, at(age = 30) expression(exp(predict(xb)))

I'm pretty sure I am asking -margins- to calculate the specified expression with age set to 30, and all other covariates as observed in the data. (You must specify at with a continuous covariate.) The -margins- command then saves the predicted quantity in a matrix called r(b). I saved r(b) in a matrix called b.

b[1,1] refers to row 1, column 1, of matrix b. In this case, there's only one row and one column, since I asked for only one prediction. Anyway, that is the predicted odds of whatever the outcome was at age of 30. Then, you save that as a scalar, which is like an invisible variable in Stata's memory that you can call anytime - including in asking margins to predict some sort of expression.

I forget what the base odds were, but in the subsequent margins command, you are dividing every predicted odds at 5 year age intervals by the odds at age = 30. That gets you the predicted odds ratios relative to age 30. Again, you're not presenting any base odds, or predicted probability - you're only presenting the odds ratio as you observed it.

PS, in my prior post, I misspoke. When you say

Code:

predict(xb)

, you ask -margins- to predict the log odds instead of its default (which is probability).

Code:

expression(exp(...))

asks -margins- to predict you e to the power of (...). e to the power of the log odds is odds. Then, you divide the odds of whatever predicted values you desired by the odds at the base value of age. I can't read your code clearly, but it looks like you wanted to predict the odds ratios with var2 = 100 to 1000, in steps of 100. Just change the values in my sample code as appropriate for your data/model.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment

amandacpac

Join Date: Sep 2014
Posts: 52

18 Dec 2017, 16:12

So sorry about not using the delimiters. I wasn't sure how to, so, I tried indenting it. Apparently, that didn't work as I intended. Here is what the code should have looked like.

Code:

forvalues i = 100(100)1000{
local i2 = (`i')^2

logit dx c.var1##c.var1##i.var2, or
lincomest `i'*var1 + `i2'*c.var1#c.var1, eform(OR)
est sto var2_0_`i'

logit dx c.var1##c.var1##i.var2, or
lincomest `i'*var1 + `i2'*c.var1#c.var1 + `i'*i1.var2#c.var1 + `i2'*i1.var2#c.var1#c.var1, eform(OR)
est sto var2_1_`i'

logit dx c.var1##c.var1##i.var2, or
lincomest `i'*var1 + `i2'*c.var1#c.var1 + `i'*i2.var2#c.var1 + `i2'*i2.var2#c.var1#c.var1, eform(OR)
est sto var2_2_`i'
}

And no, I actually wanted to predict the odds ratios with VAR1 = 100 to 1000, in steps of 100 (not var2), but I wanted to do it separately for the three levels of the factor variables, var2, since I have an interaction with that variable. To do that with the margins command would I then just change the second margins command to include the r.var2, as follows:

Code:

logit dx c.var1##c.var1##i.var2, or
margins, at(var1 = 100) expression(exp(predict(xb)))
matrix b = r(b)
scalar base = b[1,1]
margins r.var2, at(var1 = (100(100)1000))
expression((exp(predict(xb))/base))
marginsplot

Thanks! Amanda

Comment

amandacpac

Join Date: Sep 2014

Posts: 52
#8

18 Dec 2017, 17:10

I tried the code listed above and adding in the 'r.var2' wasn't quite right because it only gave two lines representing the contrasts of the 2nd and 3rd levels of var2 vs level 1, so I tried including the var in the 'at()' expression as follows:

Code:

logit dx c.var1##c.var1##i.var2, or margins, at(var1 = 100) expression(exp(predict(xb))) matrix b = r(b) scalar base = b[1,1] margins, at(var1 = (100(100)1000) var2=(0(1)2)) expression((exp(predict(xb))/base)) marginsplot

This also doesn't appear to be totally correct because the Y axis on the plot ranges from -1 to 3. If these are ORs they shouldn't go below 0. I should note that when I take out the var2 out of that margins command, the scale of the y axis ranges from 0.5 to 2, though, I'm not sure if this actually fixed it, or if all the points just happen to be above 0. What I would like to plot are three lines (one for each level of Var2) denoting the ORs for a given value of var1 (ranging from 100 to 1000 in steps of 100). In this case, I am assuming that 100 will be the referent category of the OR, is that true?
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#9

18 Dec 2017, 17:50

Originally posted by amandacpac View Post

I tried the code listed above and adding in the 'r.var2' wasn't quite right because it only gave two lines representing the contrasts of the 2nd and 3rd levels of var2 vs level 1, so I tried including the var in the 'at()' expression as follows:

Code:

logit dx c.var1##c.var1##i.var2, or margins, at(var1 = 100) expression(exp(predict(xb))) matrix b = r(b) scalar base = b[1,1] margins, at(var1 = (100(100)1000) var2=(0(1)2)) expression((exp(predict(xb))/base)) marginsplot

This also doesn't appear to be totally correct because the Y axis on the plot ranges from -1 to 3. If these are ORs they shouldn't go below 0. I should note that when I take out the var2 out of that margins command, the scale of the y axis ranges from 0.5 to 2, though, I'm not sure if this actually fixed it, or if all the points just happen to be above 0. What I would like to plot are three lines (one for each level of Var2) denoting the ORs for a given value of var1 (ranging from 100 to 1000 in steps of 100). In this case, I am assuming that 100 will be the referent category of the OR, is that true?

That's interesting that the y axis goes below 0. You're correct that ORs should certainly not go below 0.

One other thing to note is that actually, the base case you selected is var1 = 100 and var2 = whatever distribution it had in your sample. I think you need to add to the -at- expression for the base case:

Code:

margins var2, at(var1 = 100) expression(exp(predict(xb))) matrix b = r(b) scalar base = b[1,1]

Let's try with NHANES2, but there's one issue.

Code:

logistic highbp i.sex##c.age##c.age margins sex, at(age = 30) expression(exp(predict(xb))) matrix b = r(b) margins sex, at(age = (30(5)90)) expression(exp(predict(xb))/b[1,1]) marginsplot

Do the above and you get the odds ratios using males age 30 as the base category, in 5 year increments of age starting at 30. The problem is, the entire line for females is the odds ratios relative to males age 30. I am not sure if that was what you wanted! b[1,2] would get you the base odds for women age 30.

You mentioned you were familiar with -coefplot- (on SSC, written by Ben Jann), and I am pretty sure you can do this in coefplot separate calls to margins, then save the estimates from each. See if you can run that? I don't have the time to re-familiarize myself with the coefplot syntax right this minute.

Last edited by Weiwen Ng; 18 Dec 2017, 17:59.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
amandacpac

Join Date: Sep 2014

Posts: 52
#10

27 Feb 2018, 20:16

I got this code to work by running it twice. Using the NHANES example above, I ran both of the margins statements with at(sex=0), and then reran everything, changing the margins statement to at(sex=1). I then saved those results and plotted in coefplot.

HOWEVER, when I did this with my data, I noticed that I had a couple of negative lower CI values. This obviously shouldn't be the case, since we are using ORs. I thought this might be an issue with my data (I have some smaller sample sizes), so I went back and messed around with the NHANES data example to double check, and when I run a lincom statement for a given value for age, I get the same OR that is given in the second margins output, however, the CIs are different. Any idea why this might be?
Comment

amandacpac

Join Date: Sep 2014
Posts: 52

#11

09 Mar 2018, 14:41

Here is the code I used with the NHANES2 dataset.

Code:

gen height_cent =height - 170 if strata==1

*Females
logit highbp i.female##c.height_cent##c.height_cent if strata==1, or
margins , at(height_cent = 0 female=1) expression(exp(predict(xb)))
matrix b = r(b)
margins, at(height_cent = (-30(5)30) female=1) expression(exp(predict(xb))/b[1,1])

*Males
logit highbp i.female##c.height_cent##c.height_cent if strata==1, or
margins , at(height_cent = 0 female=0) expression(exp(predict(xb)))
matrix b = r(b)
margins, at(height_cent = (-30(5)30) female=0) expression(exp(predict(xb))/b[1,1])

Last edited by amandacpac; 09 Mar 2018, 14:53.

Comment

Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#12

09 Mar 2018, 15:10

In principle, I think you could switch to predict(xb), to predict log odds, then use coefplot and tell it to exponentiate the result to coefficients. I haven't tested your code for typos or logic, but something like this should work:

Code:

webuse nhanes2, clear gen height_cent =height - 170 if strata==1 logit highbp i.female##c.height_cent##c.height_cent if strata==1, or margins , at(height_cent = 0 female=1) expression(predict(xb)) matrix b = r(b) margins, at(height = (-30(5)30) female=1) expression((predict(xb))-b[1,1]) post coefplot, at eform

Since you're working in log-odds, the odds ratio should be the difference in log odds. It looks like you're calculating the odds ratios versus mean height. Seems correct. You now effectively have logit-transformed confidence intervals - mind you, not everyone understands what those are, but I think this is a correct approach. If I missed something, which I may well have, I hope to be corrected.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
amandacpac

Join Date: Sep 2014

Posts: 52
#13

10 Mar 2018, 09:18

Ok, so, that is getting me close, but, the CLs still do not seem to be correct. I'm going to use female=0 as an example, since it is a little easier to verify, so, the code is:

Code:

webuse nhanes2, clear gen height_cent =height - 170 if strata==1 logit highbp i.female##c.height_cent##c.height_cent if strata==1, or margins , at(height_cent = 0 female=0) expression(predict(xb)) matrix b = r(b) margins, at(height_cent = (-30(5)30) female=0) expression((predict(xb))-b[1,1]) post coefplot, at eform

As an example, let's look at height_cent = 5. The Beta (95% CI) from the 2nd margins command is -.0926283 (-.4382235, .252967). However, when I run the following lincom command:

Code:

quietly logit highbp i.female##c.height_cent##c.height_cent if strata==1, or lincom 5*c.height_cent + 25*c.height_cent#c.height_cent

the Beta (95% CI) is -.0926283 (-.275966, .0907095). So, the Beta is the same, but, the CLs are not. Any thoughts on why this might be??
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#14

12 Mar 2018, 12:39

Originally posted by amandacpac View Post

Ok, so, that is getting me close, but, the CLs still do not seem to be correct. I'm going to use female=0 as an example, since it is a little easier to verify, so, the code is:

Code:

webuse nhanes2, clear gen height_cent =height - 170 if strata==1 logit highbp i.female##c.height_cent##c.height_cent if strata==1, or margins , at(height_cent = 0 female=0) expression(predict(xb)) matrix b = r(b) margins, at(height_cent = (-30(5)30) female=0) expression((predict(xb))-b[1,1]) post coefplot, at eform

As an example, let's look at height_cent = 5. The Beta (95% CI) from the 2nd margins command is -.0926283 (-.4382235, .252967). However, when I run the following lincom command:

Code:

quietly logit highbp i.female##c.height_cent##c.height_cent if strata==1, or lincom 5*c.height_cent + 25*c.height_cent#c.height_cent

the Beta (95% CI) is -.0926283 (-.275966, .0907095). So, the Beta is the same, but, the CLs are not. Any thoughts on why this might be??

I would welcome others' insights into why the SEs are different. I bet it's that a different method is used to compute the SE in each command. But I do not know enough to comment why this would make a difference.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4955
#15

14 Mar 2018, 07:46

I am not sure how to explain the discrepancy. Margins is giving you the adjusted predictions given the other X values. lincom is giving you the standard errors for a linear combo of coefficients. Different means are being used to calculate the standard errors. This seems like an unusual way to use lincom, i.e. plugging in specific values for X. Personally I would not worry much about it! Just pick one or the other approach and be consistent. My guess is that the CIs from both are correct but are testing different things or using different methods to get the SEs.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Comment

Announcement