
  • Test whether the difference in R-squared across regression models is significant?

    Is it possible to test whether the difference in R-squared between two OLS models with the same dependent variable is statistically significant? What test would that be?

    For some context, I'm curious whether postsecondary indicators of academic performance (e.g. undergrad GPA, major, quality of institution) account for more variance in LSAT scores for students who take the LSAT from home compared to those who take it at testing facilities. So I have one regression with the sample of students who test from home, and another regression with the facilities students. The R-squared is higher in the home test group.
    Last edited by Jim Johnson; 29 Sep 2024, 20:45.

  • #2
    The first thing I would do is look at the variances in LSAT scores for these two groups. If they are different, then you have a problem: The proportion of the variance in LSAT scores that is explained by your model may be lower for the facility test group, but if the overall variance is higher for the facility test group, then what conclusion would you draw? So if the variances are too different, then you need to reconsider your research project.

    If the variances are similar enough for your taste, then I would look into ANOVA. I hardly ever use ANOVA myself; all I know is that this is the kind of question it can answer.
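
    As a quick sketch of that first check (using the nlsw88 practice dataset as a stand-in, since we don't have the LSAT data; wage and south stand in for the score and the home/facility indicator):

    Code:
    sysuse nlsw88, clear
    * compare the outcome variance (Std. Dev.) across the two groups
    bysort south: summarize wage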
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      I've been doing some more Googling...What about Levene's test?
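
      In case it helps: Stata's robvar command reports Levene's test along with its Brown-Forsythe variants. A minimal sketch on a stand-in dataset (wage and south in place of the LSAT score and the home/facility group):

      Code:
      sysuse nlsw88, clear
      * W0 is Levene's test of equal variances across groups;
      * W50 and W10 are the Brown-Forsythe robust variants
      robvar wage, by(south)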


      • #4
        Jim:
        why not interact your main predictor with the 2-level categorical variable -home- (say, 0=home; 1=facilities)?
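
        A sketch of what that interaction model might look like (lsat, gpa, and home are hypothetical variable names standing in for Jim's data):

        Code:
        * ## adds both main effects and the interaction
        regress lsat c.gpa##i.home
        * test whether the effect of gpa differs across groups
        test 1.home#c.gpa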
        Kind regards,
        Carlo
        (StataNow 18.5)


        • #5
          Levene's test tests for equality of variances, but I would not worry about that. Just compute both variances, look at them, and decide whether you find them substantively close enough. What you are dealing with is a substantive problem, not one for statistical testing.
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------


          • #6
            I worry that won't satisfy a journal editor, unfortunately. How about just an F-test?


            • #7
              You have gotten good advice here. If you really just want to compare the R2 values, you can use bootstrapping; see this example:

              Code:
              clear all
              sysuse nlsw88, clear
              
              cap program drop rcompare
              program define rcompare, rclass
                  reg wage hours collgrad if south == 0
                  local r1 = e(r2)
                  reg wage hours collgrad if south == 1
                  local r2 = e(r2)
                  return scalar r2diff = `r1' - `r2'
              end
              
              *** Test***
              rcompare
              return list
              
              
              bootstrap r(r2diff), reps(500) seed(123) nodrop: rcompare
              estat bootstrap, bc

              Best wishes

              (Stata 16.1 MP)


              • #8
                Hi everyone, just some thoughts that came to mind while reading the posts here, which I'm sure are what Carlo Lazzaro, on the money as usual, was hinting at. Jim Johnson, let me see if I have the issue clear in my head. You have estimated two regressions to see which academic characteristics predict LSAT scores better, and you have divided the estimation between two groups: home takers and facility takers.

                I am unsure why it is important to check for which of the two groups the explanatory variables do a better job. I think it is more interesting to see whether the estimated parameters differ across groups. For example, is the marginal effect of GPA on the LSAT score different between home takers and facility takers? The kind of test you need for that is a Chow test, which tests for structural differences across groups. Consider the following as an example.

                Code:
                . sysuse auto, clear
                (1978 Automobile Data)
                
                . 
                . reg price i.foreign##(c.mpg c.weight c.length)
                
                      Source |       SS           df       MS      Number of obs   =        74
                -------------+----------------------------------   F(7, 66)        =     14.66
                       Model |   386465950         7  55209421.4   Prob > F        =    0.0000
                    Residual |   248599446        66  3766658.28   R-squared       =    0.6085
                -------------+----------------------------------   Adj R-squared   =    0.5670
                       Total |   635065396        73  8699525.97   Root MSE        =    1940.8
                
                ----------------------------------------------------------------------------------
                           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -----------------+----------------------------------------------------------------
                         foreign |
                        Foreign  |  -8856.966   11765.08    -0.75   0.454    -32346.71    14632.77
                             mpg |   142.7663     122.76     1.16   0.249     -102.332    387.8646
                          weight |   6.767233    1.11744     6.06   0.000     4.536192    8.998275
                          length |  -109.9518   35.97631    -3.06   0.003    -181.7809   -38.12283
                                 |
                   foreign#c.mpg |
                        Foreign  |  -161.1735    151.057    -1.07   0.290    -462.7686    140.4216
                                 |
                foreign#c.weight |
                        Foreign  |  -1.982392   2.716314    -0.73   0.468    -7.405688    3.440903
                                 |
                foreign#c.length |
                        Foreign  |   123.3424   83.33313     1.48   0.144    -43.03759    289.7223
                                 |
                           _cons |   2359.475   7080.216     0.33   0.740    -11776.63    16495.58
                ----------------------------------------------------------------------------------
                
                . 
                . test 1.foreign 1.foreign#c.mpg 1.foreign#c.weight 1.foreign#c.length
                
                 ( 1)  1.foreign = 0
                 ( 2)  1.foreign#c.mpg = 0
                 ( 3)  1.foreign#c.weight = 0
                 ( 4)  1.foreign#c.length = 0
                
                       F(  4,    66) =   10.59
                            Prob > F =    0.0000
                What you're testing is whether at least one of the parameters involving foreign (in your case it would be the home/facility indicator) is different from zero. If at least one is different from zero, as is the case here, you have two different equations: one for each group.

                Alternatively, you can do the same test using the separate regressions you just did. Wooldridge (2020) explains how to do it that way, but both tests are identical, so you can just do it the way I show you here. The advantage of doing it this way is that you can also identify which parameter(s) are significantly different and which are not with the usual t-tests, or run F-tests on groups of parameters to see if they are jointly (in)significant, whereas with separately estimated equations this is not so straightforward.

                Reference:
                Wooldridge, Jeffrey M. (2020). Introductory Econometrics: A Modern Approach, 7th ed. Boston, MA: Cengage Learning.
                Alfonso Sanchez-Penalver


                • #9
                  Wanted to add, following Maarten Buis's point, that if the difference in variances is an issue, you could always do FGLS estimation (implemented here as weighted least squares), using location as an explanatory variable for the variance of the residuals, and then do the Chow test. I refer you to the same book as before to see how to estimate the model this way.
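
                  For anyone curious, that FGLS procedure can be sketched on the auto data (foreign playing the role of the location indicator; this is the log-squared-residuals variant, one of several ways to do it):

                  Code:
                  sysuse auto, clear
                  * step 1: OLS and the log of the squared residuals
                  regress price mpg weight i.foreign
                  predict uhat, residuals
                  gen loge2 = ln(uhat^2)
                  * step 2: model the log squared residuals on the variance shifter
                  regress loge2 i.foreign
                  predict ghat, xb
                  * step 3: WLS with analytic weights 1/exp(ghat)
                  gen hhat = exp(ghat)
                  regress price mpg weight i.foreign [aw = 1/hhat]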

                  By the way, this just made me think of something. Is there a command in Stata for ML estimation of the linear model that allows one to model heteroskedasticity? I feel that robust standard errors have become so conveniently pervasive in applied work that we have all but deprecated modeling heteroskedasticity explicitly. Best!!!
                  Alfonso Sanchez-Penalver


                  • #10
                    Originally posted by Alfonso Sánchez-Peñalver View Post
                    By the way, this just made me think of something. Is there a command in Stata for ML estimation of the linear model that allows one to model heteroskedasticity? I feel that robust standard errors have become so conveniently pervasive in applied work that we have all but deprecated modeling heteroskedasticity explicitly. Best!!!
                    Kit Baum had a Stata Tip on that years ago.
                    ---------------------------------
                    Maarten L. Buis
                    University of Konstanz
                    Department of history and sociology
                    box 40
                    78457 Konstanz
                    Germany
                    http://www.maartenbuis.nl
                    ---------------------------------


                    • #11
                      Originally posted by Alfonso Sánchez-Peñalver View Post
                      Is there a command in Stata for ml estimation of the linear model that allows to model heteroskedasticity?
                      There are a couple: hetregress and mixed with its residuals(independent, by()) option. For the latter, you can omit the random effects term, although it allows modeling residual variances only by a categorical variable.
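
                      For example (a sketch on the auto data, with foreign as the variance grouping variable):

                      Code:
                      sysuse auto, clear
                      * ML linear regression with the residual variance
                      * modeled as a function of foreign
                      hetregress price mpg weight, het(foreign)
                      * roughly the same idea via mixed, omitting the
                      * random-effects part
                      mixed price mpg weight, residuals(independent, by(foreign))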


                      • #12
                        There is a lot to unpack in this thread. I will break this up into two posts. In this post I will look at what a difference in \(R^2\) could actually mean.

                        Let's start with a bivariate regression: \(y = \beta_0 + \beta_1 x_1 + \varepsilon\). The \(R^2\) is the proportion of the variance in \(y\) explained by the model: \(\frac{\mathrm{var}(\hat{y})}{\mathrm{var}(y)}= \frac{\mathrm{var}(\beta_0 + \beta_1 x_1)}{\mathrm{var}(y)} = \frac{\beta_1^2\mathrm{var}(x_1)}{\mathrm{var}(y)} \)

                        So if we compare the \(R^2\) across groups, the \(R^2\) in group 1 could be higher than in group 2 because \(\beta_1\) is larger in group 1, or because the variance of \(x_1\) is higher in group 1, or because the overall variance of \(y\) is lower in group 1 (and if we assume for the latter case that \(\beta_1\) and \(\mathrm{var}(x_1)\) are equal in groups 1 and 2, then \(\mathrm{var}(\varepsilon)\) is lower in group 1). In real life it is going to be a combination of all three. So what does a difference in \(R^2\) across groups mean? It means that either \(x_1\) has a different effect across groups, or the variance of \(x_1\) differs across groups, or the variance of the unobserved other factors differs across groups, or any combination of these. I don't find that a very satisfying result.
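
                        That decomposition is easy to verify numerically; a quick sketch on the auto data:

                        Code:
                        sysuse auto, clear
                        regress price weight
                        * in the bivariate case, R-squared equals
                        * b^2 * var(x) / var(y)
                        summarize weight
                        local vx = r(Var)
                        summarize price
                        local vy = r(Var)
                        display _b[weight]^2 * `vx' / `vy'
                        display e(r2)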

                        Things get even more complicated when you add multiple explanatory variables. Remember that \(\mathrm{var}(x + z) = \mathrm{var}(x) + \mathrm{var}(z) + 2\mathrm{cov}(x,z)\). So now there is a fourth factor that could influence a difference in \(R^2\) across groups: differences between groups in the covariance between the explanatory variables (multicollinearity).

                        So just showing that the \(R^2\) is different between groups tells us surprisingly little. It may serve as a first descriptive result, to be followed in your paper by an analysis of the variances and covariances of the explanatory variables, the coefficients, and the residual variance. If most of the difference in \(R^2\) can be explained by the coefficients and the variances and covariances of the explanatory variables, then showing where those differences come from could be interesting. Alternatively, you could narrow the focus of your paper and look only at the coefficients, ignoring the variances and covariances of the explanatory variables and the residual, as Alfonso Sánchez-Peñalver suggested in #8. I suspect that the latter is closer to what you actually want and easier to implement.
                        ---------------------------------
                        Maarten L. Buis
                        University of Konstanz
                        Department of history and sociology
                        box 40
                        78457 Konstanz
                        Germany
                        http://www.maartenbuis.nl
                        ---------------------------------


                        • #13
                          The second part of my answer refers to this comment:

                          Originally posted by Jim Johnson View Post
                          I worry that won't satisfy a journal editor, unfortunately.
                          In that case the editor is just plain wrong. In fact so wrong that (s)he should be boiled in oil before being politely asked to leave the profession, and to hand in all diplomas and certificates from university, secondary school, primary school and kindergarten on the way out, since they were obviously obtained through fraud given the demonstrated lack of ability of any form of thought (rational or otherwise) by said editor. I guess I have rather strong opinions about that, and I would not recommend quoting the second sentence to that editor.

                          First, statistical test \(\neq\) scientific. A statistical test is a very limited attempt to answer the question "how certain am I about my results?". That is a very broad question, so to put a number on it, we need to narrow it down. A (frequentist) statistical test changes the question to: "what would the chance be of drawing a sample that deviates as much or more from the null hypothesis as was observed, if the null hypothesis is true and the model is correct?" The nice thing is that that question has an answer (the \(p\)-value); the not so nice thing is that it is rather far removed from the question we actually want to answer. It still has a purpose, but it is nowhere near something like the definition of scientific.

                          Second, only use tests to test the hypothesis you care about. Do not use tests for intermediate steps like model selection. If you select a model, the model needs to be an acceptable simplification of reality, so the question you have is "is this deviation acceptable?" That is not the question a statistical test answers, and that is bad. It is like deciding whether or not to pursue a PhD degree by looking at which stars and planets were visible at the moment you were born in your part of the world. (Yes, I have just claimed that people who use statistical tests for model selection are as scientific as astrologists, and I stand by that claim. At least it is a milder statement than that they should be boiled in oil...)

                          Moreover, the p-value is (a rather convoluted) probability. If you only perform a test when you have passed previous tests, it becomes a conditional probability, which changes its meaning completely. People have enough trouble sensibly interpreting p-values as it is; now you have made it pretty much impossible.

                          ---------------------------------
                          Maarten L. Buis
                          University of Konstanz
                          Department of history and sociology
                          box 40
                          78457 Konstanz
                          Germany
                          http://www.maartenbuis.nl
                          ---------------------------------


                          • #14
                            Originally posted by Joseph Coveney View Post
                            There are a couple: hetregress and mixed with its residuals(independent, by()) option. For the latter, you can omit the random effects term, although it allows modeling residual variances only by a categorical variable.
                            Thanks Joseph Coveney, I didn't know about hetregress, I'll check it out. I am so used to using mixed for random parameter estimation that I hadn't realized you can model different overall variances by categories. Thanks!!!
                            Alfonso Sanchez-Penalver


                            • #15
                              As an educational psychologist who thinks a lot about measurement, I tend to agree with Maarten about the issue at hand. In measurement, a key concern is whether a given instrument can be used and interpreted similarly across different meaningful groups. With people, the groups might be based on biological sex, race, age, etc. In the present analysis, the most important grouping variable is remote vs. in-person administration. I would wager my salary that nearly all of the validity and reliability evidence for tests such as the LSAT were done on the in-person assessment.

                              Psychologists have developed means of examining whether the properties of a measurement instrument hold across groups; we call it measurement invariance. It involves either IRT or CTT (SEM) models that impose successively restrictive constraints on the parameters of the measurement model across groups. To conduct an invariance assessment, an analyst needs access to the item-level data (the individual responses to each question) and the relevant grouping variables associated with the test taker. Absent that level of information, you are at the mercy of the test developers to have carried out these analyses and reported on them.
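
                              In Stata, the CTT/SEM side of this can be sketched with sem's group() option (item1-item3 and groupvar are hypothetical names; a full invariance assessment would fit a sequence of increasingly constrained models):

                              Code:
                              * one-factor CFA estimated by group, then a
                              * Wald-type check of which parameters are
                              * invariant across groups
                              sem (F -> item1 item2 item3), group(groupvar)
                              estat ginvariant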

                              In the case of something like the LSAT, one would imagine that the company/organization that makes it has looked at this issue extensively. However, a rather quick web search did not reveal any published material (peer-reviewed or otherwise) on the topic. Jim Johnson should conduct their own exhaustive search on the matter; the validity of any cross-group comparison depends upon it. Everything proposed in the discussion so far is reasonable under the assumption that LSAT scores are invariant to the method of administration.
