    There is a lot to unpack in this thread. I will break this up into two posts. In this post I will look at what a difference in \(R^2\) could actually mean.

    Let's start with a bivariate regression: \(y = \beta_0 + \beta_1 x_1 + \varepsilon\). The \(R^2\) is the proportion of the variance in \(y\) explained by the model: \(\frac{\mathrm{var}(\hat{y})}{\mathrm{var}(y)}= \frac{\mathrm{var}(\beta_0 + \beta_1 x_1)}{\mathrm{var}(y)} = \frac{\beta_1^2\mathrm{var}(x_1)}{\mathrm{var}(y)} \)
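A quick numerical check of that identity (an illustrative Python simulation, not from the original post; the coefficients and variances are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
b0, b1 = 1.0, 2.0
x = rng.normal(0, 1.5, n)            # var(x) = 2.25
eps = rng.normal(0, 1.0, n)          # var(eps) = 1
y = b0 + b1 * x + eps

# OLS slope and intercept for the bivariate case
b1_hat = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
b0_hat = y.mean() - b1_hat * x.mean()
y_hat = b0_hat + b1_hat * x

# Two ways of writing R^2 that should agree:
r2_var_ratio = np.var(y_hat) / np.var(y)          # var(yhat) / var(y)
r2_slope = b1_hat**2 * np.var(x) / np.var(y)      # beta1^2 var(x) / var(y)
```

Both expressions give (up to floating point) the same number, close to the population value \(\beta_1^2\mathrm{var}(x_1)/(\beta_1^2\mathrm{var}(x_1)+\mathrm{var}(\varepsilon)) = 9/10\).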

    So if we compare the \(R^2\) across groups, the \(R^2\) in group 1 could be higher than in group 2 because \(\beta_1\) is larger in group 1, because the variance of \(x_1\) is larger in group 1, or because the overall variance of \(y\) is lower in group 1 (and if, in the latter case, we assume that \(\beta_1\) and \(\mathrm{var}(x_1)\) are equal in groups 1 and 2, then \(\mathrm{var}(\varepsilon)\) must be lower in group 1). In real life it is going to be a combination of all three. So what does a difference in \(R^2\) across groups mean? It means that either \(x_1\) has a different effect across groups, or the variance of \(x_1\) differs across groups, or the variance of the unobserved other factors differs across groups, or any combination of these. I don't find that a very satisfying result.
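The third mechanism is easy to miss, so here is an illustrative simulation (mine, not from the original post): two groups with the same \(\beta_1\) and the same \(\mathrm{var}(x_1)\), where only \(\mathrm{var}(\varepsilon)\) differs, still produce very different \(R^2\) values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
b1 = 1.0  # same slope in both groups

def r2(x, y):
    """R^2 of a bivariate OLS regression of y on x."""
    b = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
    return np.var(b * x) / np.var(y)  # intercept drops out of the variance

# Same beta1 and var(x) in both groups; only the error variance differs.
x1 = rng.normal(0, 1, n); y1 = b1 * x1 + rng.normal(0, 1, n)  # sd(eps) = 1
x2 = rng.normal(0, 1, n); y2 = b1 * x2 + rng.normal(0, 2, n)  # sd(eps) = 2

r2_g1 = r2(x1, y1)  # population value 1/(1+1) = 0.5
r2_g2 = r2(x2, y2)  # population value 1/(1+4) = 0.2
```

So a lower \(R^2\) in group 2 here says nothing about the effect of \(x_1\); it only reflects noisier unobserved factors.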

    Things get even more complicated when you add multiple explanatory variables. Remember that \(\mathrm{var}(x + z) = \mathrm{var}(x) + \mathrm{var}(z) + 2\,\mathrm{cov}(x,z)\), so the explained variance now also depends on the covariances between the explanatory variables.
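That variance-of-a-sum identity can also be verified numerically (an illustrative check of my own, with an arbitrary correlation between \(x\) and \(z\)):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
x = rng.normal(0, 1, n)
z = 0.5 * x + rng.normal(0, 1, n)  # z correlated with x

# var(x + z) = var(x) + var(z) + 2 cov(x, z)
# (use matching ddof so the sample identity holds exactly)
lhs = np.var(x + z)
rhs = np.var(x) + np.var(z) + 2 * np.cov(x, z, ddof=0)[0, 1]
```

With matching degrees-of-freedom conventions the two sides agree up to floating-point error, which is why the covariance term cannot be ignored once there are multiple regressors.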
    Last edited by Maarten Buis; 01 Oct 2024, 04:03.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------