  • Choosing a simpler, but flawed analysis - justifiable?

    Hello,

    I have a question about a choice of analysis for a pre-post intervention survey. I posted about this in an earlier thread, but had planned to fit regression models to the data to ascertain the effects of an intervention on various test scores (i am not really interested in the effects of variables like sex, age, etc that i'm controlling for - mostly wanted to isolate the effects of the intervention)

    Link to earlier thread: https://www.statalist.org/forums/for...ns-is-violated

    I came across an almost identical study that took a much simpler approach to data analysis (a rough Stata sketch of both tests is below):
    - Pre-post percentage of correct answer choices: compared with a McNemar test with a continuity correction
    - Pre-post knowledge and confidence (both ordinal variables): compared with paired t-tests
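
    For concreteness, here is roughly what I understand those two comparisons to be in Stata terms. The variable names are made up, with one row per respondent, and I have not checked whether mcc's McNemar statistic uses the same continuity correction they did:

    Code:
        * hypothetical names: correct_pre/correct_post are 0/1 indicators of a
        * correct answer; know_pre/know_post are the ordinal knowledge scores
        mcc correct_post correct_pre        // paired 2x2 table with McNemar's test
        ttest know_pre == know_post         // paired t-test on the ordinal scores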

    My study, and this study, both had significant attrition.

    My question is: do you think this much simpler approach is reasonable? There are obvious flaws, such as using t-tests for ordinal data, but how important are they if you end up reaching similar conclusions with more complex statistical methods like regression? I want to make things as simple as possible to maximize interpretability for readers, without creating a significant risk of bias.

    The other article is attached, in case anyone wants to see it. As always, thank you in advance for any advice!

  • #2
    There is no clear-cut answer to your question.

    Box's famous quote that all models are wrong but some are useful remains as true today as when he first said it. Even a model with a large number of covariates is unlikely to be a complete and accurate reflection of the real-world data generating process it attempts to model. There may be non-linearities in the relationships among continuous variables that go uncaptured. There may be interactions that are not represented. The real world is complicated enough that the full data generating model may well involve so many things in so many ways that it would be infeasible to ever get a large enough data set to apply such a model to! So I think it is fair to say that even our most complicated models are to some extent simplifications.

    That said, you can go very wrong by simplifying too much. In your particular instance, the unadjusted analyses lead you to conclusions similar to those of the analysis that adjusts for several other factors. But it is not always so. Indeed, here on Statalist, we regularly see people posting questions asking why the sign of the coefficient of their key variable changed when they added (removed) some covariate to (from) the model. This phenomenon is common enough and important enough that it has a name: Simpson's paradox. Everyone who works with data should become familiar with it; the Wikipedia page on it is an excellent place to start. So you have to proceed with some caution when you simplify models. And when an adjusted and an unadjusted analysis give conflicting answers, you sometimes have to do some difficult mental wrestling to decide which one is the correct answer to your specific research question. It is not always the most adjusted model that answers the research question: sometimes the unadjusted model is. In fact, even when, as in your case, the results are not in conflict, you should probably spend at least a little time scrutinizing which analysis is actually more appropriate to your specific research question.
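
    If you have never seen Simpson's paradox in action, a tiny simulated example makes the sign flip concrete. This is purely illustrative, with made-up data that has nothing to do with your study:

    Code:
        * y falls with x within each group, but group raises both x and y,
        * so the unadjusted slope on x comes out with the opposite sign
        clear
        set seed 2468
        set obs 1000
        generate byte group = runiform() < 0.5
        generate x = 2*group + rnormal()
        generate y = 5*group - x + rnormal()
        regress y x          // unadjusted: coefficient on x is positive
        regress y x group    // adjusted: coefficient on x is about -1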

    I would also note that simplicity is in the eye of the beholder. From the name of the file you attached, I'm guessing that this study is aimed at a medical audience, or perhaps psychologists. (I don't download attachments from people I don't know.) My own experience with people in those fields is that they are more likely to feel comfortable with a multivariable regression than they are with a McNemar test, with or without continuity correction. In fact, I suspect most people in those fields have never heard of the McNemar test.

    And I have to point out that in any actual presentation of your results, even assuming you stick to a regression model that adjusts for several covariates, you still need to present the unadjusted results as well. That is just standard practice. So, in a sense, you will be doing both (minus some p-values of dubious importance anyway).
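
    In Stata terms, that usually just amounts to fitting both models and tabulating them side by side, something like the following (again with made-up variable names; substitute whatever model you actually settle on):

    Code:
        regress post_score i.intervention                       // unadjusted
        estimates store unadj
        regress post_score i.intervention pre_score age i.sex   // adjusted
        estimates store adj
        estimates table unadj adj, b(%9.3f) se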

    As for applying statistics intended for continuous variables to ordinal variables, this is not uncommon. While some people find it objectionable, if you can make a half-way plausible case that the ordinal scores in question are actually "equally spaced psychologically", then it's fine. And many people find it acceptable to use statistics for continuous variables with ordinal ones in all or nearly all situations.
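
    One cheap check, if a reviewer objects, is to run the rank-based counterpart of the paired t-test alongside it and confirm that the two lead to the same conclusion (variable names made up):

    Code:
        ttest know_pre == know_post      // treats the ordinal score as interval
        signrank know_pre = know_post    // Wilcoxon signed-rank, uses only the ordering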



    • #3
      I tend to agree with Clyde Schechter.


      I would like to offer academic grades as an example of measurements that fail to fit over-simplified schemes such as nominal -- ordinal -- interval -- ratio. In the system I work in, most assessments are on a percent scale. Some are based on a detailed marking scheme (e.g. 5% for question 1, 10% for question 2, and so forth), which is objective except for some fuzziness in the details: what is to be done for incomplete or partly correct answers? Others are based on a judgment mark with some conventions and boil down to "the student deserves a mark of so many % if and only if different academics agree that that is so". I couldn't pretend that there is any sense in which work that got 80% is twice as good as work that got 40%, or that the difference between 80% and 70% is exactly the same as that between 50% and 40%: rather, all calculations downstream assume that that is so. Nor could I claim that there is any independent sense in which the work I grade as 80% is exactly equal in merit to work graded by someone else on a different course. Short of everything being based on exactly the same quiz, no one has a better system that I know of.

      This is no different from, say, website rankings, where my 4 is just my 4 and is going to be lumped in with other people's grades with no checks whatsoever on equivalence.

      To a strict measurement maven, such scores aren't even ordinal, but people still calculate means and SDs regardless, and there are other checks and balances too.



      • #4
        Thank you both so much for your thoughtful responses! I love "all models are wrong, but some are useful": so true, and it kind of puts things in perspective. I'm going with regression, and I think interpretability won't be an issue since I'm really not doing anything that complicated. Thanks again!

