  • Stratified analysis or statistical measures of co-linearity

    Hello,

    I have a sample of 1239 participants. My exposure variable is binary (1 = yes, 0 = no): experience of psychosis (168 said yes). I have a number of outcomes, and I plan to fit a separate logistic regression model for each outcome using generalised estimating equations (GEEs) to obtain odds ratios (examples of such outcomes are smoking, alcohol, BMI etc.). Most outcomes are binary.
    I have two binary covariates (depression score and anxiety score) that I do not think are appropriate to simply adjust for. I am wondering how best to proceed. Do I do stratified analyses, looking at whether my exposure is associated with any of my outcomes within the depressed and/or anxious groups? How do I find out whether I have power for this? Or do I do a test of co-linearity?
    Thanks
    Last edited by Joe Tuckles; 06 Nov 2019, 04:40.

  • #2
    If I understood right, maybe - gsem - can do the trick.
    Best regards,

    Marcos


    • #3
      Joe,

      I don't think this question has a simple answer. It requires more information and a fair amount of calculation. Here are some general thoughts that may be helpful.

      First, though it's not what you asked about, I don't see how you're going to get an odds ratio for a BMI outcome, as BMI is a continuous variable. Perhaps you plan to dichotomize it into obese/not obese or overweight and obese/normal and underweight. I don't recommend that. The BMI cutoffs that define these categories are simply convenient round numbers. As far as I am aware, every health consequence associated with body mass index varies with BMI continuously and nothing discrete happens when you cross one of those cutoff boundaries. Putting cutoffs on continuous variables simply discards information, and sometimes also introduces bias. So I avoid it nearly all the time.

      As for the question of stratification vs adjustment, the advantage of stratification is that you get results that are specific to each stratum in all respects. The drawback is that the sample size for at least one of the strata will be at most half as large as your total sample. And as you are starting out with only 168 exposures to psychosis, that means one of the strata will have 84 or fewer such exposures--and perhaps far worse than that. For that reason, I'd probably be inclined not to stratify. But you really need to do power calculations based on the actual breakdown of the numbers in the different strata: it may be that you don't have a problem depending on what size effects you need to detect and how the psychosis exposures distribute themselves among the depressed/non-depressed and anxious/non-anxious.
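
      A rough way to start on that power question is sketched below. The variable names and every number in the -power- call are hypothetical placeholders, not values from this thread, and a simple two-proportion comparison ignores both the covariates and any GEE clustering, so treat the result as a ballpark only:

      ```stata
      // see how the 168 psychosis exposures split across the strata
      // (psychosis, depressed and anxious are assumed variable names)
      tabulate psychosis depressed
      tabulate psychosis anxious

      // power to detect a hypothetical 30% vs 45% outcome risk within one stratum,
      // with hypothetical stratum counts of 300 unexposed and 60 exposed
      power twoproportions 0.30 0.45, n1(300) n2(60)
      ```

      Replacing n1() and n2() with the counts from the tabulations, and the two proportions with the smallest difference that would matter, indicates whether a given stratum is workable.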

      Assuming that the stratified analyses will be underpowered, however, a reasonably good alternative is to do an adjusted analysis using interaction terms to help you get separate effect estimates. While you still have the reality that some subsets may have only a very small number of psychosis exposures, this type of approach "borrows strength" across the groups and gives you somewhat better precision on the estimates. So you could end up with a model that looks something like this:

      Code:
      logit outcome_variable i.psychosis##i.depressed##i.anxious // AND PERHAPS OTHER COVARIATES TO ADJUST FOR
      margins depressed#anxious, dydx(psychosis)
      which would give you estimates of the psychosis effect (not as an odds ratio but as a risk difference, which is, in my view, better) in all four combinations of depressed or not with anxious or not.

      Now one of the limitations of this approach, if you do adjust for other covariates, is that it constrains the coefficients of all the other covariates to be the same in all four groups--which may or may not be a realistic constraint to impose. One can get around that by including the covariates in the interactions as well.

      If you are not familiar with the ## notation, read about it in -help fvvarlist-. If you are not familiar with interactions and the -margins- command, the simplest introduction I know of is the excellent Richard Williams' https://www3.nd.edu/~rwilliam/stats/Margins01.pdf.



      • #4
        Dear Clyde,

        Thank you so much, this is exactly what I needed!

        I like your suggestion of using interaction terms. I wanted to clarify that I will be controlling for other covariates: age, gender and socioeconomic status for sure. Can I ask how I go about including these in the interactions as well?
        I was planning to categorise BMI as obese/ overweight/normal etc. But I am able to use it as a continuous variable as well.


        • #5
          Like this:

          Code:
          logit outcome_variable i.psychosis##i.depressed##i.anxious##(c.age i.gender i.ses)
          margins depressed#anxious, dydx(psychosis)
          Now, there are some limits to how far you can push this. Each additional covariate expands the number of regressors in the model, and eventually you can end up with too many regressors for the model to give meaningful estimates. But if you are starting from 1239 and you don't lose too many of those to missing data, you should still be ok. The output of -margins- will, in this case, give you the four groups' outcome risk differences associated with psychosis exposure, adjusted for the differences in age, gender, and ses.


          • #6
            Thank you, that is very helpful. Would I be able to use -xtgee- instead of -logit- in this case?


            • #7
              Oh, yes. The same approach is viable with any regression estimator that supports factor-variable notation and -margins-, and -xtgee- does.
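
              A minimal sketch of the -xtgee- version, assuming a binary outcome and a hypothetical cluster identifier called cluster_id (the thread does not say what the clustering variable is), with the exchangeable correlation structure chosen purely for illustration:

              ```stata
              // declare the clustering (panel) variable; cluster_id is a hypothetical name
              xtset cluster_id
              // GEE logistic model with the same interaction structure as the -logit- example
              xtgee outcome_variable i.psychosis##i.depressed##i.anxious##(c.age i.gender i.ses), family(binomial) link(logit) corr(exchangeable) vce(robust)
              // risk differences for psychosis in each depressed-by-anxious group
              margins depressed#anxious, dydx(psychosis)
              ```

              The -margins- call is unchanged from the -logit- example; only the estimation command differs.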


              • #8
                Dear Clyde,

                Thank you for your help with this. Can I clarify - when presenting the findings in a table will I be presenting hazard ratios and confidence intervals to give estimates of the psychosis effect as a risk difference?

                Is there a way to have just one table for example:

                Last edited by Joe Tuckles; 12 Nov 2019, 08:04.


                • #9
                  | | Risk of psychosis | Risk of psychosis | Risk of psychosis | Risk of psychosis |
                  | | No depressive symptoms | Presence of depressive symptoms | No anxiety symptoms | Presence of anxiety symptoms |
                  | | Adjusted HR[a] (95% CI) | Adjusted HR[a] (95% CI) | Adjusted HR[a] (95% CI) | Adjusted HR[a] (95% CI) |
                  | BMI | ref | ref | ref | ref |
                  | Ever tried cannabis | ref | ref | ref | ref |
                  | Ever smoked | ref | ref | ref | ref |
                  [a] Adjusted for SES, gender, age


                  I am not sure if this is right because my outcomes are BMI, cannabis, smoking etc., and my exposure is psychosis. But ideally I just want one table showing all the different outcomes, not a separate table for each outcome.


                  • #10
                    The layout of the table looks pretty good. I would combine all the cells in the first row into a single cell, since they all say the same thing. Similarly, I don't think you need a row of cells all saying Adjusted HR (95% CI). That can be said just once, perhaps even incorporated into the title of the table. Then the rectangular array of one outcome per row and one exposure level per column makes good sense.

                    That said, I don't understand how you are getting hazard ratios for these outcome variables. While I suppose it is possible to apply survival analysis techniques to these outcomes, as they are all non-negative, it would be very unusual and people will probably struggle with figuring out what it means. For something like BMI I would expect to see risk differences, and for ever tried cannabis and ever smoked, I would expect to see either risk ratios or odds ratios.
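
                    A sketch of how the two kinds of estimate might be obtained, assuming BMI is kept continuous and that bmi and ever_cannabis are hypothetical variable names standing in for the real ones:

                    ```stata
                    // continuous outcome: -margins- reports adjusted differences in mean BMI by group
                    regress bmi i.psychosis##i.depressed##i.anxious##(c.age i.gender i.ses)
                    margins depressed#anxious, dydx(psychosis)

                    // binary outcome: psychosis effect on the log-odds scale by group
                    logit ever_cannabis i.psychosis##i.depressed##i.anxious##(c.age i.gender i.ses)
                    margins depressed#anxious, dydx(psychosis) predict(xb)
                    ```

                    Exponentiating the estimates and confidence limits from the second -margins- call would give covariate-averaged odds ratios for each group; omitting predict(xb) gives risk differences instead.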


                    • #11
                      Thanks so much! Yes I mucked up with the hazard ratios and it should say risk differences!

                      I'm wondering if doing two tables makes more sense/makes it clearer such as this:
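                      | | Psychosis without depression | Psychosis without depression | Psychosis with depression | Psychosis with depression |
                      | | Crude risk difference (95% CI) | Adjusted risk difference (95% CI) | Crude risk difference (95% CI) | Adjusted risk difference (95% CI) |
                      | BMI | | | | |
                      | Ever tried cannabis | | | | |
                      | Ever smoked | | | | |
                      | ... | | | | |

                      | | Psychosis without anxiety | Psychosis without anxiety | Psychosis with anxiety | Psychosis with anxiety |
                      | | Crude risk difference (95% CI) | Adjusted risk difference (95% CI) | Crude risk difference (95% CI) | Adjusted risk difference (95% CI) |
                      | BMI | | | | |
                      | Ever tried cannabis | | | | |
                      | Ever smoked | | | | |
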
                      However I am not sure what to do given I need to show both risk differences and either odds ratios/risk ratios for the binary outcomes!


                      • #12
                        Yes, I think with this much information to be displayed, two tables like that would be better.

                        The problem of some results being risk differences and others being odds ratios is resolved by changing the column headers to read "Crude risk difference/odds ratio (95% CI)", with an analogous change for the Adjusted columns. Then in the row stubs, you can indicate which it is in parentheses, e.g. "BMI (risk difference)" and "Ever tried cannabis (odds ratio)" etc.


                        • #13
                          Fantastic. Your help is so valuable, I really appreciate it. Thank you!
