
  • Test whether the difference in R-squared across regression models is significant?

    Is it possible to test whether the difference in R-squared between two OLS models with the same dependent variable is statistically significant? What test would that be?

    For some context, I'm curious whether postsecondary indicators of academic performance (e.g. undergrad GPA, major, quality of institution) account for more variance in LSAT scores for students who take the LSAT from home compared to those who take it at testing facilities. So I have one regression with the sample of students who test from home, and another regression with the facilities students. The R-squared is higher in the home test group.
    Last edited by Jim Johnson; 29 Sep 2024, 20:45.

  • #2
    The first thing I would do is look at the variances in LSAT scores for these two groups. If they are different, then you have a problem: The proportion of the variance in LSAT scores that is explained by your model may be lower for the facility test group, but if the overall variance is higher for the facility test group, then what conclusion would you draw? So if the variances are too different, then you need to reconsider your research project.

    If the variances are similar enough for your taste, then I would look into ANOVA. I hardly ever use ANOVA myself; all I know is that this is the kind of question it can answer.
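
    As a quick sketch of that first check (using the nlsw88 practice dataset as a stand-in, since we don't have the LSAT data; wage and south stand in for the score and the home/facility indicator):

    Code:
    sysuse nlsw88, clear
    * compare the outcome variance (Std. Dev.) across the two groups
    bysort south: summarize wage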
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      I've been doing some more Googling...What about Levene's test?
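
      In case it helps: Stata's robvar command reports Levene's test along with its Brown-Forsythe variants. A minimal sketch on a stand-in dataset (wage and south in place of the LSAT score and the home/facility group):

      Code:
      sysuse nlsw88, clear
      * W0 is Levene's test of equal variances across groups;
      * W50 and W10 are the Brown-Forsythe robust variants
      robvar wage, by(south)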


      • #4
        Jim:
        why not interact your main predictor with the 2-level categorical variable -home- (say, 0=home; 1=facilities)?
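
        A sketch of what that interaction model might look like (lsat, gpa, and home are hypothetical variable names standing in for Jim's data):

        Code:
        * ## adds both main effects and the interaction
        regress lsat c.gpa##i.home
        * test whether the effect of gpa differs across groups
        test 1.home#c.gpa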
        Kind regards,
        Carlo
        (StataNow 18.5)


        • #5
          Levene's test tests for equality of variances, but I would not worry about that. Just compute both variances, look at them, and decide whether you find them substantively close enough. What you are dealing with is a substantive problem, not one for statistical testing.
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------


          • #6
            I worry that won't satisfy a journal editor, unfortunately. How about just an F-test?


            • #7
              You have gotten good advice here. If you really just want to compare the R2 values, you can use bootstrapping; see this example:

              Code:
              clear all
              sysuse nlsw88, clear
              
              cap program drop rcompare
              program define rcompare, rclass
                  reg wage hours collgrad if south == 0
                  local r1 = e(r2)
                  reg wage hours collgrad if south == 1
                  local r2 = e(r2)
                  return scalar r2diff = `r1' - `r2'
              end
              
              *** Test***
              rcompare
              return list
              
              
              bootstrap r(r2diff), reps(500) seed(123) nodrop: rcompare
              estat bootstrap, bc

              Best wishes

              (Stata 16.1 MP)


              • #8
                Hi everyone, just some thoughts that came to mind while reading the posts here, which I'm sure are what Carlo Lazzaro, on the money as usual, was hinting at. Jim Johnson, let me see if I have the issue clear in my head. You have estimated two regressions to see which academic characteristics predict LSAT scores better, and you have divided the estimation between two groups: home takers and facility takers.

                I am unsure why it is important to check for which of the two groups the explanatory variables do a better job. I think it is more interesting to see whether the estimated parameters differ across groups. For example, is the marginal effect of GPA on the LSAT score different between home takers and facility takers? The kind of test you need for that is a Chow test, which tests for structural differences across groups. Consider the following as an example.

                Code:
                . sysuse auto, clear
                (1978 Automobile Data)
                
                . 
                . reg price i.foreign##(c.mpg c.weight c.length)
                
                      Source |       SS           df       MS      Number of obs   =        74
                -------------+----------------------------------   F(7, 66)        =     14.66
                       Model |   386465950         7  55209421.4   Prob > F        =    0.0000
                    Residual |   248599446        66  3766658.28   R-squared       =    0.6085
                -------------+----------------------------------   Adj R-squared   =    0.5670
                       Total |   635065396        73  8699525.97   Root MSE        =    1940.8
                
                ----------------------------------------------------------------------------------
                           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -----------------+----------------------------------------------------------------
                         foreign |
                        Foreign  |  -8856.966   11765.08    -0.75   0.454    -32346.71    14632.77
                             mpg |   142.7663     122.76     1.16   0.249     -102.332    387.8646
                          weight |   6.767233    1.11744     6.06   0.000     4.536192    8.998275
                          length |  -109.9518   35.97631    -3.06   0.003    -181.7809   -38.12283
                                 |
                   foreign#c.mpg |
                        Foreign  |  -161.1735    151.057    -1.07   0.290    -462.7686    140.4216
                                 |
                foreign#c.weight |
                        Foreign  |  -1.982392   2.716314    -0.73   0.468    -7.405688    3.440903
                                 |
                foreign#c.length |
                        Foreign  |   123.3424   83.33313     1.48   0.144    -43.03759    289.7223
                                 |
                           _cons |   2359.475   7080.216     0.33   0.740    -11776.63    16495.58
                ----------------------------------------------------------------------------------
                
                . 
                . test 1.foreign 1.foreign#c.mpg 1.foreign#c.weight 1.foreign#c.length
                
                 ( 1)  1.foreign = 0
                 ( 2)  1.foreign#c.mpg = 0
                 ( 3)  1.foreign#c.weight = 0
                 ( 4)  1.foreign#c.length = 0
                
                       F(  4,    66) =   10.59
                            Prob > F =    0.0000
                What you're testing is whether at least one of the parameters involving foreign (in your case it would be the home/facility indicator) is different from zero. If at least one is different from zero, as is the case here, you have two different equations: one for each group.

                Alternatively, you can do the same test using the separate regressions you just did. Wooldridge (2020) explains how to do it that way, but both tests are identical, so you can just do it the way I show you here. The advantage of doing it this way is that you can also identify which parameter(s) are significantly different and which are not with the usual t-tests, or run F-tests on groups of parameters to see if they are jointly (in)significant, whereas with separately estimated equations this is not so straightforward.

                Reference:
                Wooldridge, Jeffrey M. (2020). Introductory Econometrics: A Modern Approach, 7th ed. Boston, MA: Cengage Learning.
                Alfonso Sanchez-Penalver


                • #9
                  Wanted to add, following Maarten Buis's point, that if the difference in variances is an issue, you could always do FGLS estimation (implemented here as weighted least squares), using location as an explanatory variable for the variance of the residuals, and then do the Chow test. I refer you to the same book as before to see how to estimate the model this way.
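
                  For anyone curious, that FGLS procedure can be sketched on the auto data (foreign playing the role of the location indicator; this is the log-squared-residuals variant, one of several ways to do it):

                  Code:
                  sysuse auto, clear
                  * step 1: OLS and the log of the squared residuals
                  regress price mpg weight i.foreign
                  predict uhat, residuals
                  gen loge2 = ln(uhat^2)
                  * step 2: model the log squared residuals on the variance shifter
                  regress loge2 i.foreign
                  predict ghat, xb
                  * step 3: WLS with analytic weights 1/exp(ghat)
                  gen hhat = exp(ghat)
                  regress price mpg weight i.foreign [aw = 1/hhat]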

                  By the way, this just made me think of something. Is there a command in Stata for ML estimation of the linear model that allows one to model heteroskedasticity? I feel that robust standard errors have become so conveniently pervasive in applied work that we have all but deprecated modeling heteroskedasticity explicitly. Best!!!
                  Alfonso Sanchez-Penalver


                  • #10
                    Originally posted by Alfonso Sánchez-Peñalver View Post
                    By the way, this just made me think of something. Is there a command in Stata for ML estimation of the linear model that allows one to model heteroskedasticity? I feel that robust standard errors have become so conveniently pervasive in applied work that we have all but deprecated modeling heteroskedasticity explicitly. Best!!!
                    Kit Baum had a Stata Tip on that years ago.
                    ---------------------------------
                    Maarten L. Buis
                    University of Konstanz
                    Department of history and sociology
                    box 40
                    78457 Konstanz
                    Germany
                    http://www.maartenbuis.nl
                    ---------------------------------


                    • #11
                      Originally posted by Alfonso Sánchez-Peñalver View Post
                      Is there a command in Stata for ml estimation of the linear model that allows to model heteroskedasticity?
                      There are a couple: hetregress and mixed with its residuals(independent, by()) option. For the latter, you can omit the random effects term, although it allows modeling residual variances only by a categorical variable.
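
                      For example (a sketch on the auto data, with foreign as the variance grouping variable):

                      Code:
                      sysuse auto, clear
                      * ML linear regression with the residual variance
                      * modeled as a function of foreign
                      hetregress price mpg weight, het(foreign)
                      * roughly the same idea via mixed, omitting the
                      * random-effects part
                      mixed price mpg weight, residuals(independent, by(foreign))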


                      • #12
                        There is a lot to unpack in this thread. I will break this up into two posts. In this post I will look at what a difference in \(R^2\) could actually mean.

                        Let's start with a bivariate regression: \(y = \beta_0 + \beta_1 x_1 + \varepsilon\). The \(R^2\) is the proportion of the variance in \(y\) explained by the model: \(\frac{\mathrm{var}(\hat{y})}{\mathrm{var}(y)}= \frac{\mathrm{var}(\beta_0 + \beta_1 x_1)}{\mathrm{var}(y)} = \frac{\beta_1^2\mathrm{var}(x_1)}{\mathrm{var}(y)} \)

                        So if we compare the \(R^2\) across groups, the \(R^2\) in group 1 could be higher than in group 2 because \(\beta_1\) is larger in group 1, or because the variance of \(x_1\) is higher in group 1, or because the overall variance of \(y\) is lower in group 1 (and if we assume for the latter case that \(\beta_1\) and \(\mathrm{var}(x_1)\) are equal in groups 1 and 2, then \(\mathrm{var}(\varepsilon)\) is lower in group 1). In real life it is going to be a combination of all three. So what does a difference in \(R^2\) across groups mean? It means that either \(x_1\) has a different effect across groups, or the variance of \(x_1\) differs across groups, or the variance of the unobserved other factors differs across groups, or any combination of these. I don't find that a very satisfying result.
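
                        That decomposition is easy to verify numerically; a quick sketch on the auto data:

                        Code:
                        sysuse auto, clear
                        regress price weight
                        * in the bivariate case, R-squared equals
                        * b^2 * var(x) / var(y)
                        summarize weight
                        local vx = r(Var)
                        summarize price
                        local vy = r(Var)
                        display _b[weight]^2 * `vx' / `vy'
                        display e(r2)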

                        Things get even more complicated when you add multiple explanatory variables. Remember that \(\mathrm{var}(x + z) = \mathrm{var}(x) + \mathrm{var}(z) + 2\mathrm{cov}(x,z)\). So now there is a fourth factor that could influence a difference in \(R^2\) across groups: differences between groups in the covariance between the explanatory variables (multicollinearity).

                        So just showing that the \(R^2\) is different between groups tells us surprisingly little. It may serve as a first descriptive result, to be followed in your paper by an analysis of the variances and covariances of the explanatory variables, the coefficients, and the residual variance. If most of the difference in \(R^2\) can be explained by the coefficients and the variances and covariances of the explanatory variables, then showing where those differences come from could be interesting. Alternatively, you could narrow the focus of your paper and look only at the coefficients, ignoring the variances and covariances of the explanatory variables and the residual, as Alfonso Sánchez-Peñalver suggested in #8. I suspect that the latter is closer to what you actually want and easier to implement.
                        ---------------------------------
                        Maarten L. Buis
                        University of Konstanz
                        Department of history and sociology
                        box 40
                        78457 Konstanz
                        Germany
                        http://www.maartenbuis.nl
                        ---------------------------------


                        • #13
                          The second part of my answer refers to this comment:

                          Originally posted by Jim Johnson View Post
                          I worry that won't satisfy a journal editor, unfortunately.
                          In that case the editor is just plain wrong. In fact so wrong that (s)he should be boiled in oil before being politely asked to leave the profession, and to hand in all diplomas and certificates from university, secondary school, primary school and kindergarten on the way out, since they were obviously obtained through fraud given the demonstrated lack of ability of any form of thought (rational or otherwise) by said editor. I guess I have rather strong opinions about that, and I would not recommend quoting the second sentence to that editor.

                          First, statistical test \(\neq\) scientific. A statistical test is a very limited attempt to answer the question "how certain am I about my results?". That is a very broad question, so to put a number on it, we need to narrow it down. A (frequentist) statistical test changes the question to: "what would the chance be of drawing a sample that deviates as much or more from the null hypothesis as was observed, if the null hypothesis is true and the model is correct?" The nice thing is that that question has an answer (the \(p\)-value); the not so nice thing is that it is rather far removed from the question we actually want to answer. It still has a purpose, but it is nowhere near something like the definition of scientific.

                          Second, only use tests to test the hypothesis you care about. Do not use tests for intermediate steps like model selection. If you select a model, the model needs to be an acceptable simplification of reality, so the question you have is "is this deviation acceptable?" That is not the question a statistical test answers, and that is bad. It is like deciding whether or not to pursue a PhD degree by looking at which stars and planets were visible at the moment you were born in your part of the world. (Yes, I have just claimed that people who use statistical tests for model selection are as scientific as astrologists, and I stand by that claim. At least it is a milder statement than that they should be boiled in oil...)

                          Moreover, the p-value is (a rather convoluted) probability. If you only perform a test when you have passed previous tests, it becomes a conditional probability, which changes its meaning completely. People have enough trouble sensibly interpreting p-values as it is; now you have made it pretty much impossible.

                          ---------------------------------
                          Maarten L. Buis
                          University of Konstanz
                          Department of history and sociology
                          box 40
                          78457 Konstanz
                          Germany
                          http://www.maartenbuis.nl
                          ---------------------------------


                          • #14
                            Originally posted by Joseph Coveney View Post
                            There are a couple: hetregress and mixed with its residuals(independent, by()) option. For the latter, you can omit the random effects term, although it allows modeling residual variances only by a categorical variable.
                            Thanks Joseph Coveney, I didn't know about hetregress, I'll check it out. I am so used to using mixed for random parameter estimation that I hadn't realized you can model different overall variances by categories. Thanks!!!
                            Alfonso Sanchez-Penalver


                            • #15
                              As an educational psychologist who thinks a lot about measurement, I tend to agree with Maarten about the issue at hand. In measurement, a key concern is whether a given instrument can be used and interpreted similarly across different meaningful groups. With people, the groups might be based on biological sex, race, age, etc. In the present analysis, the most important grouping variable is remote vs. in-person administration. I would wager my salary that nearly all of the validity and reliability evidence for tests such as the LSAT were done on the in-person assessment.

                              Psychologists have developed means of examining whether the properties of a measurement instrument hold across groups; we call it measurement invariance. It involves either IRT or CTT (SEM) models that impose successively restrictive constraints on the parameters of the measurement model across groups. To conduct an invariance assessment, an analyst needs access to the item-level data (the individual responses to each question) and the relevant grouping variables associated with the test taker. Absent that level of information, you are at the mercy of the test developers to have carried out these analyses and reported on them.
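
                              In Stata, the CTT/SEM side of this can be sketched with sem's group() option (item1-item3 and groupvar are hypothetical names; a full invariance assessment would fit a sequence of increasingly constrained models):

                              Code:
                              * one-factor CFA estimated by group, then a
                              * Wald-type check of which parameters are
                              * invariant across groups
                              sem (F -> item1 item2 item3), group(groupvar)
                              estat ginvariant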

                              In the case of something like the LSAT, one would imagine that the company/organization that makes it has looked at this issue extensively. However, a rather quick web search did not reveal any published material (peer-reviewed or otherwise) on the topic. Jim Johnson should conduct their own exhaustive search on the matter; the validity of any cross-group comparison depends upon it. Everything proposed in the discussion so far is reasonable under the assumption that LSAT scores are invariant to the method of administration.
