
  • Adjusting for confounders when finding Odds ratio

    Hello, I'm trying to make an odds ratio table. My dependent variable is binary (1 = hypertensive, 0 = not), and I have the following independent categorical variables: age (3 categories), sex, education level (4 levels), and BMI (3 categories). I also have the following independent binary variables: INT (has internet access or not), SMOK (smokes or not), EXER (exercises or not), and ALCOL (drinks alcohol or not).

    I wrote the following code

    logit HYP i.AGE i.SEX i.EDU i.BMI INT SMOK EXER ALCOL

    logit , or

    However, I know that there may be potential confounders affecting my results. How do I know which ones to account for, and how do I account for them to get more accurate odds ratios? Does finding adjusted odds ratios already take confounders into account, or will I have to find and adjust for each confounder separately?


    Thank you in advance

  • #2
    Well, a confounder is, by definition, a variable that is associated with both the outcome of interest (HYP) and the predictor of interest. As you don't specify which predictor(s) here are of interest, and which are being included in your model solely to adjust for their effects as confounders, it is not possible to say what other confounders might be lurking. I will say this: if your study is being carried out in a multi-racial population, then an indicator for black race is probably an important confounder. It is definitely associated with hypertension (greater in blacks than non-blacks) and is definitely also associated with age, education, bmi, smoking, and probably internet access as well.

    All of that said, no observational study can ever be guaranteed free of confounders. The best you can do is include as many as you reasonably can given the size of your data set, or use a study design that is somewhat robust to confounding, such as a difference-in-differences analysis (not applicable to your situation, I think). Apart from perhaps having overlooked black race, I don't see any other obvious, important omissions.

    Frankly, the bigger worry for validity of your study, in my opinion, is that you are working with data of degraded quality. The use of categorical variables for age and BMI is really inexcusable, unless the actual age and BMI values were unavailable. Moreover, while the concept of hypertension is necessary for clinical decision making purposes--some people will be treated, and others will not--it is a poor quality variable as well, and to gain an understanding of blood pressure it is far better to use the continuous blood pressure measurement itself as the outcome variable. Even the education, smoking, exercise, and alcohol variables would be better as actual quantities if the information is available. It is seldom a good idea to use a categorical variable in a statistical analysis if the underlying construct is really continuous. Turning continuous variables into categories is only helpful if something truly discrete and qualitative happens across the boundaries of the categories--which in health and medicine is almost never.
    Last edited by Clyde Schechter; 05 Jan 2021, 17:24.



    • #3
      Thank you so much for the thorough and quick response. Yes, I do agree that turning continuous variables into categorical ones will definitely affect the quality of my data. I just wanted to be able to determine whether being in a certain age group or BMI group affected the odds of HTN. I do have the continuous values of BMI and AGE, as well as systolic and diastolic BP, and just ran

      logit SYS AGE BMI i.SEX i.EDU INT SMOK EXER ALCOL

      logit DIA AGE BMI i.SEX i.EDU INT SMOK EXER ALCOL

      logit, or

      and I got statistically significant results for BMI and SMOK, which I hadn't before (it was only AGE before), which is great.

      I have one more question, however. You mentioned that aside from race, there isn't much else to adjust for. I was wondering, if that's the case, in what situation one would make an adjusted odds ratio table alongside a crude one like mine. I don't have a particular independent variable of interest; I am trying to see the odds ratio for each independent variable and find which ones are statistically significant, so that it may perhaps suggest something.

      Thank you again



      • #4
        logit SYS AGE BMI i.SEX i.EDU INT SMOK EXER ALCOL

        logit DIA AGE BMI i.SEX i.EDU INT SMOK EXER ALCOL
        This is wrong. You can't use systolic and diastolic blood pressures (or any other continuous variables) as outcomes in a logistic regression. The results you get will be nonsense. You have to do this in a linear regression, -regress-, not -logit-. And the results will not be odds ratios.

        Returning to your initial goal of creating tables of odds ratios, the odds ratios you get from the logistic regression are adjusted odds ratios. Each of those odds ratios is adjusted for the effects of all the other variables in the -logit- command. Crude odds ratios would be obtained from the cross-product ratio in the output of a cross-tabulation, or, more simply, by doing logistic regressions of HTN on each of those predictor variables separately.
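
        To make the distinction concrete, here is a minimal Stata sketch using the variable names already posted in this thread; treat it as illustrative only, not as a checked analysis of your data.

        // continuous outcomes: linear regression; the coefficients are mean differences, not odds ratios
        regress SYS AGE BMI i.SEX i.EDU INT SMOK EXER ALCOL
        regress DIA AGE BMI i.SEX i.EDU INT SMOK EXER ALCOL

        // binary outcome: adjusted odds ratios, each adjusted for all the other covariates in the model
        logit HYP AGE BMI i.SEX i.EDU INT SMOK EXER ALCOL, or

        // crude (unadjusted) odds ratio for a single predictor, e.g. smoking
        logit HYP SMOK, or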

        ...and find which ones are statistically significant so that it may perhaps suggest something.
        Statistical significance is a really poor way to decide what is being suggested. Look at the odds ratios themselves. Are they large enough to be meaningful? If your sample is very large, even tiny, clearly unimportant odds ratios can be statistically significant. If your sample is small, the opposite can happen: meaningful associations will fail to be statistically significant. So look at the odds ratios themselves and ignore the p-values. Pick a cutoff odds ratio that you would consider large enough to be meaningful (and, on the flip side, also look for odds ratios smaller than the reciprocal of that cutoff).



        • #5
          OK, will do. Thank you for the guidance.



          • #6
            Hi Clyde, this is quite a useful conversation for me.
            I saw your answer that if the outcome is a continuous variable, one should use a linear regression model, -regress-, instead of -logit-, and that the result will not be an odds ratio. So is the result of a linear regression model a coefficient?

            Could you tell me if my understanding is correct or not:
            I am going to use a linear regression model to check for confounders: 1) I run the linear regression model without the potential confounder and get a coefficient of A for group; 2) I add the confounder into the model, run the regression again, and get a coefficient of B for group; 3) I calculate (B-A)/A, and if it is > 10% I treat the variable as a confounder; 4) if the p-value in the model of step is < 0.05, I treat it as an interaction.

            Thank you in advance.



            • #7
              Agree with steps 1 and 2. Agree with calculating (B-A)/A. As for the 10% cutoff, that depends on whether a difference up to 10% is acceptable, which, in turn, depends on details of your context.

              As for step 4), you don't say which step's p-value you are referring to. But, regardless, I do not endorse the use of p-values for determining whether a variable is a confounder: confounding is a sample-level phenomenon, whereas p-values make claims about the population, not the sample. So p-values, to the extent they are of any use at all, are of no use in deciding whether you have a confounder. And nothing in steps 1 to 3 has anything to do with interactions.
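
              For what it's worth, here is a minimal sketch of how steps 1-3 might be coded in Stata; the names outcome, group, and confounder are placeholders, group is assumed to be coded 0/1, and the 10% threshold is the one you proposed, not a recommendation.

              // step 1: model without the potential confounder; store the group coefficient (A)
              regress outcome i.group
              scalar A = _b[1.group]

              // step 2: model with the potential confounder; store the group coefficient again (B)
              regress outcome i.group confounder
              scalar B = _b[1.group]

              // step 3: relative change in the group coefficient
              display "relative change = " 100*abs((B - A)/A) "%"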



              • #8
                Thank you Clyde for your quick response.
                To clarify my question:
                The p-value I used to check for an effect modifier (interaction) here is the p-value of the coefficient of the new variable in the model in step 2).
                I didn't use the p-value to verify confounders; I treat variables with (B-A)/A > 10% as confounders.

                Thank you again for your help!



                • #9
                  Thank you for the clarification.

                  Step 4 is not correct. You have not introduced any interaction term into the model. You just added the possible confounder into the model. So there is nothing at all that can be said about interactions. As for what the p-value of the coefficient of the confounder means, I would say that it should just be ignored. In other contexts, for people who believe in p-values, it could have some meaning. But in the current context where the variable is introduced only because it is a possible confounder, it means nothing at all.



                  • #10
                    Thank you very much for your help!

                    I started learning Stata in the past several months and have more questions regarding this topic.

                    For example, my study was conducted in malnourished children with three different phenotypes (A, B, C). I want to see how the leptin level changes (and to compare the changes) from admission to 2 weeks into nutritional rehabilitation (therapeutic food as treatment) across the three groups. Group is my independent variable and leptin level is the dependent variable. Besides leptin level, I have many other continuous and binary dependent variables. I used a quantile regression model for the continuous variables (not normally distributed) and a logit model for the categorical variables.
                    In this study, gge, country, and sex are not equal across the three groups, so they are potential confounders or effect modifiers.
                    Question 1: I used steps 1-3 to identify confounders and later added them into the new model. But after adding them, if the p-values of those newly added variables are > 0.05, can I drop them from the new model (because they do not have an effect on the outcome)?
                    Question 2: I was thinking of using step 4 to identify the effect modifiers. Do you have an idea of how I can identify them in my case?

                    Thank you Clyde.



                    • #11
                      There's a lot to unpack here.
                      I used a quantile regression model for the continuous variables (not normally distributed) and a logit model for the categorical variables.
                      There is no requirement of any statistical procedure that an outcome variable have a normal distribution. This is a widespread myth. It probably arises from misunderstanding the fact that in some circumstances, we rely on a normal distribution of the residuals to support statistical inference in linear regression. Even there, however, the normality assumption is of very little importance. If the sample is large, it does not matter at all. In small samples, it may matter, but it is also impossible to really identify normality or non-normality in small samples. (Moreover, if your sample is small, you will not be in a position to adjust for many confounders.) I'm not saying a quantile regression is necessarily wrong, but if your only reason for doing it is concern about normality, that is not a good reason.

                      In this study, gge, country, and sex are not equal across the three groups, so they are potential confounders or effect modifiers.
                      I don't know what gge stands for. Is it a typo for age? They are, indeed, potential confounders given their unequal distributions across the three groups, so it is important to see whether they are also associated with the outcome variables, and, if so, you will want to adjust for them. But equality of distribution across the three groups has nothing to do with effect modification.

                      Effect modification is an entirely separate phenomenon and can occur whether the variable is equally distributed in the three groups or not. Effect modification by a variable V means that the differences in outcome among the groups depends on the value of V. It means that there is no unique "effect of group" on the outcome, rather there are many effects, which are functions of the value of V. Effect modification is dealt with by including a V#group interaction in the model (which is best done, in Stata, using V##i.group--and remember to prefix V with c. or i. according to whether it is continuous or discrete). The problem is that as you start throwing interactions in the model, they eat up degrees of freedom and the number of explanatory variables becomes too large for the sample size. In that situation you begin to overfit model noise and end up with results that will not replicate. So it is important to be conservative about adding effect modification into your model. Do it only where prior information and theory give you a firm basis for believing it exists and is large enough to matter for practical purposes.
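
                      Purely as an illustrative sketch of that syntax, with leptin as the outcome, group as the three-level phenotype, and sex and age assumed here as candidate effect modifiers (substitute your actual variable names and estimation command):

                      // discrete modifier: include a sex##group interaction
                      regress leptin i.sex##i.group
                      // effect of group within each sex, to see how much the group effect varies
                      margins sex, dydx(group)

                      // continuous modifier (e.g. age): remember the c. prefix
                      regress leptin c.age##i.group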

                      Question 1: I used steps 1-3 to identify confounders and later added them into the new model. But after adding them, if the p-values of those newly added variables are > 0.05, can I drop them from the new model (because they do not have an effect on the outcome)?
                      No, you cannot drop them. On the basis of steps 1-3 you have already identified that their inclusion in the model changes your estimate of the effect of group on outcome by an amount you consider large enough to be meaningful. Therefore they must remain in the model. The magnitude of the coefficient of the confounder, and its p-value, have nothing to do with this. All that matters is the change in the group coefficient(s) that you ascertained in steps 1-3.

                      Also, it is simply not true that a p value > 0.05 means there is no effect on the outcome! That is another myth that is propagated by the widespread bad teaching of statistical inference. Again, even for people who take p-values seriously, properly understood, a p-value > 0.05 means only that you cannot exclude the possibility that there is zero effect, it does not mean that you have established the absence of any effect. But again, whether you can establish strong evidence of an effect on the outcome in a population is irrelevant to the issue of confounding: all that matters is the effect in the sample you are working with, and that you have already dealt with definitively in steps 1-3.

                      Question 2: I was thinking of using step 4 to identify the effect modifiers. Do you have an idea of how I can identify them in my case?
                      I have already said a bit about this earlier in this post. The only way to identify effect modifiers is to include interactions in the model and see if the interaction coefficients are large enough to matter for practical purposes. But, as I pointed out, as you add more and more variables to the model (and interactions tend to grow explosively!) you degrade the ability of the model to separate signal from noise, so this needs to be done cautiously. My best advice is to step away from the computer and go back to pencil and paper. Make a list of the variables you think might be effect modifiers and then go back to the literature to see whether there is prior evidence supporting this, or a strong theoretical reason for believing that there really would be effect modification. For those that seem likely to be important, try including them in the model and see whether you get large interaction coefficients.



                      • #12
                        Originally posted by Clyde Schechter View Post
                        There's a lot to unpack here. ......For those that seem likely to be important, try including them in the model and see whether you get large interaction coefficients.
                        Thank you very much for the explanation and guidance. That is quite useful for me.

