
  • Interpreting an Unexpected Negative Beta in HMR

    Hello!
    I'm hoping someone can help me interpret an unexpected negative value for a beta that I obtained in a hierarchical multiple linear regression. I do not believe it is a suppressor variable because a) the sum of the squared semi-partials is not greater than the r-squared, b) the r-squared is less than .5, and c) there doesn't seem to be anything too wonky in the correlations (technical term!).
    A little background about my study... I'm trying to see if one of three executive function assessments better predicts academic achievement. The 3 IVs are a performance-based measure of inhibition, a teacher's rating of inhibition, and a teacher's rating of attention. The DV is a score on an academic test (i.e., reading, math, or science--each one run separately). The teacher's rating of inhibition has negative beta values in about half of the models (I'm looking at 3 different subjects across 2 different grades).
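
    For concreteness, this is roughly what I'm running for each subject/grade combination (sketched in Stata with placeholder variable names; the block order here is illustrative):

        * block 1: performance-based inhibition; block 2: + teacher-rated inhibition;
        * block 3: + teacher-rated attention; DV = one academic test, run separately
        nestreg: regress read_score (perf_inhib) (teach_inhib) (teach_attn)
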
    I'm not sure how to interpret it or what to do about it, if it is uninterpretable.
    I would greatly appreciate any help.
    Thank you!!
    --Emily

  • #2
    It is very difficult to look at a correlation matrix and decide whether it contains something "wonky." These things are not intuitive at all. I'll assume you have already checked that, individually, each of these three predictors correlates positively with your outcome measure. Most likely, the combination of the two other predictor variables slightly overpredicts the outcome, and a negative contribution from the third corrects for it.

    Another possibility to consider is that you have confounding by some variable that is not included in your model. Adding a covariate can change the signs of other predictors: this is Simpson's paradox (often called Lord's paradox in the context of regression analysis).
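
    If you want to see how easily a sign can flip when a covariate is added, here is a toy Stata simulation (invented data, purely for illustration): x correlates positively with y on its own, but its coefficient turns negative once the confounder z enters the model.

        clear
        set obs 5000
        set seed 12345
        gen z = rnormal()
        gen x = z + 0.3*rnormal()   // x is largely driven by z
        gen y = 2*z - x + rnormal() // x's direct effect is negative
        regress y x                 // coefficient on x comes out positive
        regress y x z               // coefficient on x is now negative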



    • #3
      Thank you Clyde! (And, yes, all of the variables, individually, correlate positively with the outcome.) So, another question--what is the best way to deal with this in the results/discussion? (Every textbook that I can find just says it is hard to interpret.) This variable is the 2nd variable added; it's positive in the second model. It's negative in 4 out of the 6 third models (Grades 4 and 5 for math, just Grade 5 for reading and science--which is really confusing to me because the variables held pretty steady from year to year). The correlations are not all that different across the models, so I'm really baffled. Is this something to mention with a caveat to 'interpret with caution'? Are the models that have the negative betas invalid, or are they all invalid?
      I would appreciate any advice.
      Thank you!!



      • #4
        When in #1 you referred to hierarchical multiple regression, I thought you meant a multi-level model. I imagined students nested in schools or classes, or something like that. But when you refer to a variable as being "added second," I gather you are using stepwise regression instead. Stepwise regression is an invalid statistical method for testing hypotheses or estimating effects. At best, it is only usable as a data exploration tool, and frankly, it's not very good for that either. See https://www.stata.com/support/faqs/s...sion-problems/ for a fuller litany of the difficulties it presents.

        Another thing to consider before issuing your interpretation is the confidence intervals around these two coefficients. Are they very wide? When you have highly correlated predictors, it may be impossible to truly separate their effects, and you can end up with both coefficients being estimated with very large standard errors and, correspondingly, very wide confidence intervals. Is that the case here? If so, then you really are not in a position to say much of anything about either variable's separate effect. If this is the situation in your data, then there is no easy way out of it. You would either need to get a much larger data set (a sufficiently large sample will overcome a high correlation, though the number required is often infeasible), or scrap your sample and collect a new one that is specially sampled in such a way that the two variables end up with a much lower correlation. The latter approach might be feasible, although the non-random sampling then entails a different analytic approach that respects the sampling design.
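
        Sketched in Stata (placeholder variable names, just to show the kind of check I mean): fit the full model, read the confidence intervals off the regression table, and look at how strongly the two ratings are correlated with each other.

            regress read_score perf_inhib teach_inhib teach_attn
            estat vif                          // variance inflation factors
            pwcorr teach_inhib teach_attn, sig // correlation between the two ratings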

        Assuming that the confidence intervals are not so wide that the situation in the preceding paragraph applies--that is, even in a model whose predictors were chosen in some better way you would still end up with this result--the interpretation is that you have two correlated predictors (which is unsurprising since both are measuring aspects of a single construct, executive function) and that the end result is that one ends up with a positive coefficient and the other with a negative one. This means that the measure of executive function that provides the best (under OLS) approximation to the outcome is actually a weighted difference between the two different aspects of executive function. This situation can arise either because it is actually the difference itself that is important, or simply because the positive-coefficient variable overpredicts the outcome, and this error is compensated for by the negative-coefficient predictor. It's just a fact of life that variables in multiple regression models can and often do behave very differently from how they behave on their own in simple correlational analyses.
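
        To make the "weighted difference" reading concrete with invented numbers: a fitted combination like 0.6*A - 0.4*B is algebraically the same as 0.1*(A + B) + 0.5*(A - B), i.e., mostly a difference term plus a smaller overall-level term.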




        • #5
          Thank you (again!) for your detailed reply! I am using theory-driven (vs. statistically-chosen) stepwise regression, which, as I understand it, is a little more acceptable. As for the SEs, they are actually really tiny--although my dataset is huge. There are a total of about 18,000 participants, although listwise deletion for these regressions leaves more like 11,000.



          • #6
            Yes, theory-driven stepwise regression is a sensible procedure.

            Given the narrow confidence intervals, it sounds to me like either this is one of those situations where one aspect of executive function overpredicts outcomes in a linear model, and the other, with a negative coefficient, corrects that overprediction, or it may actually be that the (weighted) difference between those two aspects of executive function is, itself, a predictor of the outcomes. It can be hard to explain this kind of result to people, but not impossible.

            There is one exploration you might want to consider in connection with this. I believe you said there are three different measures of executive function among your variables. You could try doing a principal components analysis of those. That will give you three orthogonal variables that are themselves aspects of executive function. Now, it might turn out that one of those components (usually the first) is a "size" component--all three measures load appreciably positively on it. And it might turn out that the second component has an appreciable positive loading from one of the two variables you are worried about and an appreciable negative loading from the other. Then you could try replacing the executive function variables in your regression model with these two components. You might find that both of them come out to have important coefficients. In that case you could conclude that the overall magnitude of combined executive function, as well as the difference between those two variables you are worried about, are each predictive of your outcomes. It might not turn out that way, of course, but if it did, it would give the story a little extra clarity.
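
            In Stata, that exploration would look something like this (placeholder variable names again):

                pca perf_inhib teach_inhib teach_attn
                predict pc1 pc2 pc3, score   // orthogonal component scores
                regress read_score pc1 pc2   // replace the raw EF measures with components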



            • #7
              My first question would be whether the inhibition rating means something better or worse as it gets higher. Is a teacher's rated inhibition supposed to have a positive coefficient--meaning that more reserved teachers are better? And since it is a rating, who is doing the rating? It would not surprise me if more inhibited teachers were rated less favorably when someone else, especially students, is doing the rating. That could be a bias in the measurement.

