Stratified analysis or statistical measures of co-linearity

Joe Tuckles

Join Date: Jul 2018

Posts: 180
#1


06 Nov 2019, 03:37

Hello,

I have a sample of 1239 participants. My exposure variable is binary (1 = yes, 0 = no): experience of psychosis (168 said yes). I have a number of outcomes, and I plan to fit a separate logistic regression model for each outcome within a generalised estimating equations (GEE) framework to get odds ratios (examples of such outcomes are smoking, alcohol, BMI, etc.). Most outcomes are binary.
I have two covariates, both binary (depression score and anxiety score), that I do not think it is appropriate simply to adjust for. I am wondering how best to proceed. Should I do stratified analyses, looking at whether my exposure is associated with any of my outcomes within the depressed and/or anxious groups? How do I find out whether I have enough power for this? Or should I do a test of collinearity?
Thanks

Last edited by Joe Tuckles; 06 Nov 2019, 03:40.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

06 Nov 2019, 04:08

If I understood right, maybe - gsem - can do the trick.

Best regards,

Marcos
Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#3

06 Nov 2019, 12:23

Joe,

I don't think this question has a simple answer. It requires more information and a fair amount of calculation. Here are some general thoughts that may be helpful.

First, though it's not what you asked about, I don't see how you're going to get an odds ratio for a BMI outcome, as BMI is a continuous variable. Perhaps you plan to dichotomize it into obese/not obese or overweight and obese/normal and underweight. I don't recommend that. The BMI cutoffs that define these categories are simply convenient round numbers. As far as I am aware, every health consequence associated with body mass index varies with BMI continuously and nothing discrete happens when you cross one of those cutoff boundaries. Putting cutoffs on continuous variables simply discards information, and sometimes also introduces bias. So I avoid it nearly all the time.
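To illustrate keeping BMI continuous, here is a minimal sketch on simulated data. The variable names (`bmi`, `psychosis`, `age`, `gender`, `ses`) and every number in it are hypothetical placeholders, not the poster's data. With -regress-, the psychosis coefficient is the adjusted difference in mean BMI, so no information is thrown away by cutoffs.

```stata
* Hypothetical simulated data (NOT the real study data)
clear
set seed 12345
set obs 1239
generate byte psychosis = runiform() < 168/1239
generate age = 18 + floor(30*runiform())
generate byte gender = runiform() < 0.5
generate byte ses = 1 + floor(3*runiform())
generate bmi = 25 + 1.5*psychosis + 0.05*age + rnormal(0, 4)

* Keep BMI continuous: the psychosis coefficient is the adjusted
* difference in mean BMI (kg/m^2) between exposed and unexposed
regress bmi i.psychosis c.age i.gender i.ses
display "Adjusted mean BMI difference: " _b[1.psychosis]
```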

As for the question of stratification vs adjustment, the advantage of stratification is that you get results that are specific to each stratum in all respects. The drawback is that the sample size for at least one of the strata will be at most half as large as your total sample. And as you are starting out with only 168 exposures to psychosis, that means one of the strata will have 84 or fewer such exposures--and perhaps far worse than that. For that reason, I'd probably be inclined not to stratify. But you really need to do power calculations based on the actual breakdown of the numbers in the different strata: it may be that you don't have a problem depending on what size effects you need to detect and how the psychosis exposures distribute themselves among the depressed/non-depressed and anxious/non-anxious.
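A rough sketch of the kind of stratum-specific power check described above, on simulated data. The stratum prevalences, the 0.30 vs 0.45 smoking proportions, and the variable names are all assumptions for illustration; in practice you would plug in the real cell counts and the smallest effect you need to detect.

```stata
* Hypothetical simulated data (NOT the real study data)
clear
set seed 12345
set obs 1239
generate byte psychosis = runiform() < 168/1239
generate byte depressed = runiform() < 0.3

* How do the psychosis exposures split across the depression strata?
tabulate psychosis depressed

* Exposed and unexposed counts within the depressed stratum
count if depressed == 1 & psychosis == 1
local n_exp = r(N)
count if depressed == 1 & psychosis == 0
local n_unexp = r(N)

* Power for an ASSUMED outcome prevalence of 0.30 (unexposed) vs 0.45
* (exposed); replace these with the smallest effect worth detecting
power twoproportions 0.30 0.45, n1(`n_unexp') n2(`n_exp')
display "Power within the depressed stratum: " r(power)
```

The same check would be repeated for each stratum (non-depressed, anxious, non-anxious) and for each outcome of interest.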

Assuming that the stratified analyses will be underpowered, however, a reasonably good alternative is to do an adjusted analysis using interaction terms to get separate effect estimates. While some subsets may still have only a very small number of psychosis exposures, this type of approach "borrows strength" across groups and gives you somewhat better precision in the estimates. So you could end up with a model that looks something like this:

Code:

logit outcome_variable i.psychosis##i.depressed##i.anxious // AND PERHAPS OTHER COVARIATES TO ADJUST FOR
margins depressed#anxious, dydx(psychosis)

which would give you estimates of the psychosis effect (not as an odds ratio but as a risk difference, which is, in my view, better) in all four combinations of depressed or not with anxious or not.
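If odds ratios are still wanted alongside the risk differences, the same interaction model can supply one per cell with -lincom-. This is a sketch on simulated data; `smoked` and the other variable names are hypothetical. Each line adds up the psychosis coefficient and the relevant interaction coefficients for one depressed x anxious combination, and the `or` option exponentiates the sum.

```stata
* Hypothetical simulated data (NOT the real study data)
clear
set seed 12345
set obs 1239
generate byte psychosis = runiform() < 168/1239
generate byte depressed = runiform() < 0.5
generate byte anxious   = runiform() < 0.5
generate byte smoked = runiform() < invlogit(-1 + 0.5*psychosis + 0.3*depressed)

logit smoked i.psychosis##i.depressed##i.anxious

* Not depressed, not anxious
lincom 1.psychosis, or
* Depressed, not anxious
lincom 1.psychosis + 1.psychosis#1.depressed, or
* Anxious, not depressed
lincom 1.psychosis + 1.psychosis#1.anxious, or
* Both depressed and anxious
lincom 1.psychosis + 1.psychosis#1.depressed + 1.psychosis#1.anxious + 1.psychosis#1.depressed#1.anxious, or
```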

Now one of the limitations of this approach, if you do adjust for other covariates, is that it constrains the coefficients of all the other covariates to be the same in all four groups--which may or may not be a realistic constraint to impose. One can get around that by including the covariates in the interactions as well.
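One way to judge whether the common-slope constraint matters, sketched here for -logit- only, is a likelihood-ratio test of the constrained model against one where the covariate is also interacted. The data and the single covariate `age` are hypothetical. Because GEE is not likelihood-based, with -xtgee- a Wald test (for example with -testparm-) would be the analogue rather than -lrtest-.

```stata
* Hypothetical simulated data (NOT the real study data)
clear
set seed 12345
set obs 1239
generate byte psychosis = runiform() < 168/1239
generate byte depressed = runiform() < 0.5
generate byte anxious   = runiform() < 0.5
generate age = 18 + floor(30*runiform())
generate byte smoked = runiform() < invlogit(-1 + 0.5*psychosis + 0.3*depressed)

* Age slope constrained to be the same in all psychosis x depressed x anxious cells
quietly logit smoked i.psychosis##i.depressed##i.anxious c.age
estimates store common

* Age slope allowed to differ in every cell
quietly logit smoked i.psychosis##i.depressed##i.anxious##c.age
estimates store separate

* LR test of the 7 extra age-interaction terms
lrtest common separate
```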

If you are not familiar with the ## notation, read about it in -help fvvarlist-. If you are not familiar with interactions and the -margins- command, the simplest introduction I know of is Richard Williams' excellent https://www3.nd.edu/~rwilliam/stats/Margins01.pdf.
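A quick way to see what ## does: it is shorthand for the main effects plus their interaction, so the two fits below should give identical coefficient vectors. The simulated data and variable names are hypothetical.

```stata
* Hypothetical simulated data (NOT the real study data)
clear
set seed 12345
set obs 1239
generate byte psychosis = runiform() < 168/1239
generate byte depressed = runiform() < 0.5
generate byte smoked = runiform() < invlogit(-1 + 0.5*psychosis + 0.3*depressed)

* Factorial shorthand
quietly logit smoked i.psychosis##i.depressed
matrix b_short = e(b)

* The same model written out: main effects plus the # interaction
quietly logit smoked i.psychosis i.depressed i.psychosis#i.depressed
matrix b_long = e(b)

display "Max relative difference: " mreldif(b_short, b_long)
```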
Joe Tuckles

Join Date: Jul 2018

Posts: 180
#4

06 Nov 2019, 13:17

Dear Clyde,

Thank you so much, this is exactly what I needed!

I like your suggestion of using interaction terms. I should clarify that I will be controlling for other covariates: age, gender and socioeconomic status for sure. How would I go about including these in the interactions as well?
I was planning to categorise BMI as obese/overweight/normal etc., but I can use it as a continuous variable instead.
Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#5

06 Nov 2019, 13:44

Like this:

Code:

logit outcome_variable i.psychosis##i.depressed##i.anxious##(c.age i.gender i.ses)
margins depressed#anxious, dydx(psychosis)

Now, there are some limits to how far you can push this. Each additional covariate expands the number of regressors in the model, and eventually you can end up with too many regressors for the model to give meaningful estimates. But if you are starting from 1239 and you don't lose too many of those to missing data, you should still be ok. The output of -margins- will, in this case, give you the four groups' outcome risk differences associated with psychosis exposure, adjusted for the differences in age, gender, and ses.
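A sketch of how one might check, after fitting the fully interacted model, how many observations survive missing data and how many parameters are being estimated. The data are simulated (so -misstable- will report no missing values here) and all names are hypothetical. The events-per-parameter figure is only a crude diagnostic of how thinly the data are spread; no particular threshold is implied.

```stata
* Hypothetical simulated data (NOT the real study data)
clear
set seed 12345
set obs 1239
generate byte psychosis = runiform() < 168/1239
generate byte depressed = runiform() < 0.5
generate byte anxious   = runiform() < 0.5
generate age = 18 + floor(30*runiform())
generate byte gender = runiform() < 0.5
generate byte ses = 1 + floor(3*runiform())
generate byte smoked = runiform() < invlogit(-1 + 0.5*psychosis + 0.3*depressed)

* How many observations would be lost to missing data?
misstable summarize smoked psychosis depressed anxious age gender ses

* Fully interacted model, as in the command above
quietly logit smoked i.psychosis##i.depressed##i.anxious##(c.age i.gender i.ses)
display "Observations used:    " e(N)
display "Parameters estimated: " e(rank)

* Crude check of how thinly the outcome events are spread
quietly count if smoked == 1 & e(sample)
display "Outcome events per parameter: " r(N)/e(rank)
```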
Joe Tuckles

Join Date: Jul 2018

Posts: 180
#6

06 Nov 2019, 15:33

Thank you that is very helpful. Would I be able to use

Code:

xtgee

instead of logit or not in this case?
Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#7

06 Nov 2019, 15:56

Oh, yes. The same approach is viable with any linear model estimator.
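A minimal -xtgee- version of the same model, assuming the GEE is needed because participants are clustered. The clustering variable `cluster` is invented here purely for illustration (the thread does not say what the clusters are), as are the data. The exchangeable working correlation and robust standard errors are common choices, not the only ones; `predict(mu)` asks -margins- for the predicted probability so that `dydx()` returns risk differences.

```stata
* Hypothetical simulated data (NOT the real study data)
clear
set seed 12345
set obs 1239
generate byte psychosis = runiform() < 168/1239
generate byte depressed = runiform() < 0.5
generate byte anxious   = runiform() < 0.5
generate age = 18 + floor(30*runiform())
generate byte gender = runiform() < 0.5
generate byte ses = 1 + floor(3*runiform())
generate byte smoked = runiform() < invlogit(-1 + 0.5*psychosis + 0.3*depressed)
generate int cluster = ceil(_n/5)   // hypothetical clustering variable

xtset cluster
xtgee smoked i.psychosis##i.depressed##i.anxious c.age i.gender i.ses, family(binomial) link(logit) corr(exchangeable) vce(robust)

* Risk difference for psychosis in each depressed x anxious cell
margins depressed#anxious, dydx(psychosis) predict(mu)
```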
Joe Tuckles

Join Date: Jul 2018

Posts: 180
#8

12 Nov 2019, 06:12

Dear Clyde,

Thank you for your help with this. Can I clarify - when presenting the findings in a table will I be presenting hazard ratios and confidence intervals to give estimates of the psychosis effect as a risk difference?

Is there a way to have just one table, for example:

Last edited by Joe Tuckles; 12 Nov 2019, 07:04.

Joe Tuckles

Join Date: Jul 2018
Posts: 180

12 Nov 2019, 07:05

	Risk of psychosis	Risk of psychosis	Risk of psychosis	Risk of psychosis
	No depressive symptoms	Presence of depressive symptoms	No anxiety symptoms	Presence of anxiety symptoms
	Adjusted HR^a (95% CI)	Adjusted HR^a (95% CI)	Adjusted HR^a (95% CI)	Adjusted HR^a (95% CI)
BMI	ref	ref	ref	ref
Ever tried cannabis	ref	ref	ref	ref
Ever smoked	ref	ref	ref	ref

^aAdjusted for SES, gender, age

I am not sure if this is right, because my outcomes are BMI, cannabis, smoking, etc., and my exposure is psychosis. But ideally I want just one table showing all the different outcomes, not dozens of tables, one per outcome.


Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#10

13 Nov 2019, 07:59

The layout of the table looks pretty good. I would merge all the cells in the first row into a single cell, since they all say the same thing. Similarly, I don't think you need a row of cells all saying Adjusted HR (95% CI). That can be said just once, perhaps even incorporated into the title of the table. Then the rectangular array, with one outcome per row and one exposure level per column, makes good sense.

That said, I don't understand how you are getting hazard ratios for these outcome variables. While I suppose it is possible to apply survival analysis techniques to these outcomes, as they are all non-negative, it would be very unusual and people will probably struggle with figuring out what it means. For something like BMI I would expect to see risk differences, and for ever tried cannabis and ever smoked, I would expect to see either risk ratios or odds ratios.
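For a binary outcome, a single -logit- fit can supply either effect measure: the `or` option reports the odds ratio, and -margins- with `dydx()` gives the average risk difference on the probability scale. A sketch on simulated data, with `cannabis` and the covariate names as hypothetical placeholders. (For a continuous outcome such as BMI, the analogous -regress- estimate is a difference in mean BMI.)

```stata
* Hypothetical simulated data (NOT the real study data)
clear
set seed 12345
set obs 1239
generate byte psychosis = runiform() < 168/1239
generate age = 18 + floor(30*runiform())
generate byte gender = runiform() < 0.5
generate byte ses = 1 + floor(3*runiform())
generate byte cannabis = runiform() < invlogit(-1.5 + 0.7*psychosis)

* Adjusted odds ratio for psychosis (exponentiated coefficient)
logit cannabis i.psychosis c.age i.gender i.ses, or

* Adjusted risk difference for psychosis from the same fit
margins, dydx(psychosis)
```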

Joe Tuckles

Join Date: Jul 2018
Posts: 180

#11

13 Nov 2019, 08:08

Thanks so much! Yes, I mucked up with the hazard ratios; it should say risk differences!

I'm wondering whether two tables, such as these, would be clearer:

	Psychosis without depression		Psychosis with depression
	Crude risk difference (95% CI)	Adjusted risk difference (95% CI)	Crude risk difference (95% CI)	Adjusted risk difference (95% CI)
BMI
Ever tried cannabis
Ever smoked

.
.

	Psychosis without anxiety		Psychosis with anxiety
	Crude risk difference (95% CI)	Adjusted risk difference (95% CI)	Crude risk difference (95% CI)	Adjusted risk difference (95% CI)
BMI
Ever tried cannabis
Ever smoked

However I am not sure what to do given I need to show both risk differences and either odds ratios/risk ratios for the binary outcomes!


Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#12

13 Nov 2019, 08:22

Yes, I think with this much information to be displayed, two tables like that would be better.

The problem of some results being risk differences and others being odds ratios is resolved by changing the column headers to read "Crude risk difference/odds ratio (95% CI)", with an analogous change for the Adjusted columns. Then in the row stubs you can indicate which it is in parentheses, e.g. "BMI (risk difference)" and "Ever tried cannabis (odds ratio)", etc.
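A sketch of a loop that produces the crude and adjusted estimates needed to fill the depression and anxiety tables, one outcome at a time. Data and names are hypothetical. Here "crude" is taken to mean the interaction model without the covariates; `margins depressed` (or `margins anxious`) averages over the other stratifying variable and the covariates. Odds ratios for the binary outcomes could be added with -lincom ..., or- as needed.

```stata
* Hypothetical simulated data (NOT the real study data)
clear
set seed 12345
set obs 1239
generate byte psychosis = runiform() < 168/1239
generate byte depressed = runiform() < 0.5
generate byte anxious   = runiform() < 0.5
generate age = 18 + floor(30*runiform())
generate byte gender = runiform() < 0.5
generate byte ses = 1 + floor(3*runiform())
generate byte smoked   = runiform() < invlogit(-1 + 0.5*psychosis + 0.3*depressed)
generate byte cannabis = runiform() < invlogit(-1.5 + 0.7*psychosis)

foreach y in smoked cannabis {
    display _newline "Outcome: `y'"
    * Crude: psychosis and the stratifying variables only
    quietly logit `y' i.psychosis##i.depressed##i.anxious
    margins depressed, dydx(psychosis)
    margins anxious, dydx(psychosis)
    * Adjusted: add age, gender and ses
    quietly logit `y' i.psychosis##i.depressed##i.anxious c.age i.gender i.ses
    margins depressed, dydx(psychosis)
    margins anxious, dydx(psychosis)
}
```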
Joe Tuckles

Join Date: Jul 2018

Posts: 180
#13

13 Nov 2019, 08:26

Fantastic, your help is so valuable. I really appreciate it. Thank you!