  • Significant association in entire cohort, but not upon subgroup analyses? Help with interpretation

Hello! I am evaluating the association of "treatment" with "outcome". My cohort consists of two types of patients, "typeA" and "typeB".

    I originally ran a multivariable logistic regression adjusting for relevant covariates, with the output below:

    Code:
---------------------------------------------------------------------------
  outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------+----------------------------------------------------------------
    typeA |   .5341908   .0165037   -20.29   0.000      .502804    .5675368
treatment |   1.301675   .0708633     4.84   0.000     1.169938    1.448245
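For reference, the command I ran was along these lines (age and sex here are placeholders for my actual covariate list):
Code:
* full-cohort model; age and i.sex stand in for my real adjustment set
logistic outcome i.typeA i.treatment age i.sex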
    I interpreted this to mean that treatment is associated with reduced odds of outcome, while typeA is linked with greater odds of outcome.

However, I also ran this regression as subgroup analyses restricted to typeA or typeB patients. Among typeA patients, my results are comparable to what I see in the entire cohort:
    Code:
---------------------------------------------------------------------------
  outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------+----------------------------------------------------------------
    typeA |          1  (omitted)
treatment |   1.354458    .083596     4.92   0.000     1.200135    1.528625

    However, among patients with typeB, I see the following output:
    Code:
---------------------------------------------------------------------------
  outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------+----------------------------------------------------------------
    typeB |          1  (omitted)
treatment |   1.146505   .1349509     1.16   0.245     .9102989    1.444002
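(I fit the subgroup models with an if qualifier along these lines, again with placeholder covariates; the type indicator is constant within each subgroup and gets dropped, hence the "omitted" rows:)
Code:
* subgroup models; the type indicator is collinear within each subgroup
logistic outcome i.typeA i.treatment age i.sex if typeA == 1
logistic outcome i.typeA i.treatment age i.sex if typeA == 0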
I am confused about how to interpret this. My subgroup analysis among typeB patients appears to suggest there is no statistically significant treatment effect. However, when I look at the entire cohort and independently adjust for type and treatment, both remain linked with reduced odds of outcome. Further, the interaction term between type and treatment is not significant.

If anyone has guidance on how best to interpret and approach this problem, I would greatly appreciate it. Thank you!

  • #2
    I interpreted this to mean that treatment is associated with reduced odds of outcome, while typeA is linked with greater odds of outcome.
    No, this is exactly backwards. An OR > 1 means increased odds of outcome, and an OR < 1 means decreased odds of outcome.

However, I also ran this regression as subgroup analyses restricted to typeA or typeB patients. Among typeA patients, my results are comparable to what I see in the entire cohort:
    This subgroup analysis is irrelevant to the previous question.

    You are confusing two things. One of those things is whether type A is itself associated with a reduced odds of outcome. That question is answered in your original regression. The other thing is whether the effect of treatment differs depending on type A vs type B. That is what your subgroup analyses examine. It's an entirely different question, and the answer to one of these questions has nothing to do with the answer to the other. All four combinations of yes and no answers to these two questions are possible.
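The second question is tested directly by an interaction model rather than by eyeballing separate subgroup fits. A sketch, with placeholder covariates standing in for whatever you actually adjusted for:
Code:
* does the treatment effect differ by type? the typeA#treatment term answers this
logistic outcome i.typeA##i.treatment age i.sex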

    My subgroup analysis among typeB patients appears to suggest there is no statistically significant treatment effect.
Yes, but so what? You don't show your complete output, so I can only get a rough order-of-magnitude intuition about your sample size. But bear in mind that the sample sizes of the subgroup analyses are smaller than the sample size of the full analysis, and at least one of the two subgroups is at most half the size of the original. That reduction in sample size plays havoc with your statistical power. In general terms, a rule of thumb is that if your original sample size is just adequately powered for your purposes, your smaller subsample analysis will be severely underpowered.

    Further, the interaction term between type and treatment is not significant.
    Well, you don't show that result, so it's hard to comment on. But I would not necessarily disregard a "not significant" interaction if the magnitude (Ratio of Odds Ratios) is appreciably large and the confidence interval only slightly overlaps 1. The statistical power issue comes out in full force with an interaction analysis. The rule of thumb is that the sample size needed to adequately power an interaction test is between four and sixteen times as large as the sample size needed for the main effects tests. So, again, unless your sample size is extremely large, it is likely that the lack of statistical significance here tells you nothing useful.



    • #3
Hi Dr. Schechter, thank you so much for your help. And apologies for the misinterpretation. I changed the names of my variables to dummy names for this post but accidentally mis-renamed one of them, hence the error.

      Here is the output of the interaction term:
      Code:
---------------------------------------------------------------------------------
        outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
          typeA |
              0 |          1  (base)
              1 |   .5387753   .0170946   -19.49   0.000     .5062909    .5733438
                |
      treatment |
              0 |          1  (base)
              1 |   1.363016   .0833109     5.07   0.000     1.209131    1.536485
                |
typeA#treatment |
            1 1 |   .8304493   .1057699    -1.46   0.145     .6469938    1.065924
      The CI is quite wide.

      As you correctly predicted, one of my groups is much larger than the other:
      Code:
                 |      treatment
            typeA |         0          1 |     Total
      -----------+----------------------+----------
               0 |    53,094      2,836 |    55,930
                 |     66.05      63.93 |     65.94
      -----------+----------------------+----------
               1 |    27,294      1,600 |    28,894
                 |     33.95      36.07 |     34.06
Given this imbalance, how would you suggest moving forward? If the subgroup analysis of typeB patients is underpowered, can I really conclude that there is no statistically significant treatment effect?

      Thank you again.



      • #4
Well, you can conclude that the effect in type B patients is not "statistically significant." But what does that mean in an underpowered analysis? It does not mean that there is no treatment effect in type B patients. It means that the study is inconclusive with regard to type B patients. There might or might not be an effect, and it might be in either direction, but the data are too scanty to reach any specific conclusions, except perhaps to say confidently that the treatment effect is not very large.

As for interpreting the interaction term, this is just what I was alluding to in #2. The ratio of odds ratios is 0.83, which is a 17% reduction in the treatment odds ratio in type B compared to type A. That's an appreciable effect modification; if it were statistically significant, you would even call it impressive. Now look at the confidence interval: it ranges from 0.65 to 1.07. So that's largely in the < 1 territory, with only a small amount of coverage in the > 1 range. So this is pretty suggestive of an important amount of effect modification, but, again, in light of the reduced power of the test for the interaction term, it is just inconclusive. You can't really say it is, and you can't really say it isn't.
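Incidentally, if you want the type-specific treatment odds ratios directly from the interaction model rather than from separate subgroup regressions, lincom will recover them (this assumes the factor-variable specification shown in #3):
Code:
* treatment OR in the base type (typeA == 0)
lincom 1.treatment, or
* treatment OR in the other type (typeA == 1)
lincom 1.treatment + 1.typeA#1.treatment, or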

        Inconclusive results. That's what underpowered analyses look like.



        • #5
          Thank you so much. Is there a particular statistical test to evaluate power that you rely on?
Similarly, would a technique such as IPTW or ebalance (which would balance the groups) help in this situation?



          • #6
Statistical power is best dealt with during the study design phase, before any data are collected: calculate the sample size needed to detect the smallest effect that you would want to be statistically significant if present, and then collect an adequate sample to start with. Once the analysis has been done, it is best to turn the power calculation on its head: calculate the minimum effect size that you have adequate power (conventionally 80% or 90%) to detect with the sample size in hand.

Stata's power command calculates power from sample size, or the other way around. It covers the most basic analyses (not including logistic regression, I'm afraid), but not the more advanced ones. There are various commercially available software packages, typically expensive, that go into greater depth than Stata does. There is also freeware available, but most of it covers more or less the same range of analyses as Stata.
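Although power does not handle logistic regression, a two-proportions approximation to the unadjusted treatment comparison can still give a rough answer. For the type B subgroup sizes shown in #3, one could ask what treated-group outcome rate is detectable with 80% power; the 10% control-group rate here is purely made up for the example:
Code:
* minimum detectable effect given the type B subgroup sizes from #3
* (the 0.10 baseline outcome rate is illustrative, not from your data)
power twoproportions 0.10, n1(53094) n2(2836) power(0.8)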

            Improving group balance is a good thing to do, but it doesn't enhance statistical power. It reduces bias. To the extent that you have bias in the direction of underestimating true effects, reducing that bias will increase your estimated effect size, which in turn increases your chances of getting a "statistically significant" result. But that isn't really increasing the power of your analysis--that's just the fact that correcting the error attributable to group imbalances was in a favorable direction. Over the long haul, it is just as likely that eliminating bias will reduce your effect size estimate, thereby making it even less "statistically significant." So, by all means, do things to remove bias to get more accurate estimates--but that's orthogonal to power issues.
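If you do pursue weighting for bias reduction, something along these lines is one way to do it in Stata (placeholder covariates again; this is a sketch, not a recommendation tailored to your data):
Code:
* IPTW estimate of the average treatment effect; age and i.sex are placeholders
teffects ipw (outcome) (treatment age i.sex, logit)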

There are basically two ways to increase statistical power. The first is to have a larger sample. The second is to reduce extraneous sources of outcome variation. The latter might be done by using more precise ways to measure the outcome in the first place, or by including covariates in the analysis that themselves account for substantial outcome variation independent of the effect(s) you are trying to study. All of these approaches have their practical limitations. Getting larger samples, depending on the nature of the data, can be expensive, perhaps prohibitively so. More precise outcome measurements may or may not exist, or, if they do, may also prove expensive. And finding covariates that account for enough outcome variation to materially improve power can be difficult; there is no guarantee that any such covariates even exist, and when they do exist, obtaining data on them may be difficult or expensive.

            I don't say this to be nihilistic--just to emphasize that real world data analysis is subject to constraints. The best defense is good advance planning: carefully choosing sharp study questions, identifying sources of high quality measurements of the relevant variables, and calculating the needed sample size to assure that a large enough sample is obtainable with available resources. This phase of research isn't always fun, but it sure beats the frustration and disappointment of trying, and perhaps failing, to salvage analysis of a data set that is simply not up to the task required of it.

