
  • The R-squared is too high

    Hi,
    I ran an OLS regression on my cross-section data to test a non-linear relationship:
    y = x1^2 + x1 + x1^2*m1 + x1*m1 + x1^2*m2 + x1*m2
    But I get a very high adjusted R-squared in the full model, more than 0.9. What should I do to solve this problem?

  • #2
    Very high is not a problem, generally. But if it explodes when you include the squared terms and interactions and you've got a lot of low t-stats, then you've probably got multicollinearity.
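
    For a quick check in Stata, a minimal sketch (x1 x2 x3 are placeholders for your regressors):

    regress y x1 x2 x3
    estat vif    // rule of thumb: VIFs above ~10 are worth a closer look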



    • #3
      Sylvia:
      welcome to this forum.
      As per FAQ, please post what you typed and what Stata gave you back. Thanks.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Originally posted by George Ford View Post
        Very high is not a problem, generally. But if it explodes when you include the squared terms and interactions and you've got a lot of low t-stats, then you've probably got multicollinearity.
        Hi George, thank you for your reply.

        The VIFs are small enough for my variables. But it is true that I found substantial multicollinearity in my full model due to the interactions, even though the significance levels for my main variables are still good. Must I solve the multicollinearity?



        • #5
          Originally posted by Carlo Lazzaro View Post
          Sylvia:
          welcome to this forum.
          As per FAQ, please post what you typed and what Stata gave you back. Thanks.
          Hi Carlo, thank you. I post my results below.
          I just ran -reg- on these variables in Stata.
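          Roughly, it was along these lines (a sketch; the squared and interaction terms were generated beforehand, and the product-variable names here are only illustrative):

          generate iv_square = iv^2
          generate iv_sq_mv1 = iv_square*mv_1
          generate iv_mv1    = iv*mv_1
          generate iv_sq_mv2 = iv_square*mv_2
          generate iv_mv2    = iv*mv_2
          regress y control_1 control_2 control_3 control_4 control_5 ///
              iv_square iv mv_1 iv_sq_mv1 iv_mv1 mv_2 iv_sq_mv2 iv_mv2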
          y                Coef.      St.Err.   t-value   p-value   [95% Conf. Interval]   Sig
          control_1        -.041      .017      -2.45     .017      -.074       -.008      **
          control_2        -1.228     3.486     -0.35     .726      -8.165      5.709
          control_3        -1.451     1.113     -1.30     .196      -3.667      .765
          control_4        .105       .719      0.15      .885      -1.325      1.535
          control_5        5.172      2.985     1.73      .087      -.768       11.112     *
          iv_square        477.16     43.563    10.95     .000      390.467     563.852    ***
          iv               -111.024   39.825    -2.79     .007      -190.279    -31.77     ***
          mv_1             .003       .01       0.35      .729      -.016       .022
          iv_square*mv_1   -3.988     .438      -9.11     .000      -4.859      -3.117     ***
          iv*mv_1          .822       .344      2.39      .019      .137        1.507      **
          mv_2             -.851      1.389     -0.61     .542      -3.614      1.913
          iv_square*mv_2   -57.368    25.543    -2.25     .027      -108.2      -6.536     **
          iv*mv_2          14.359     23.391    0.61      .541      -32.191     60.909
          constant         12.957     5.871     2.21      .030      1.273       24.642     **
          Mean dependent var   10.706     SD dependent var       23.087
          R-squared            0.932      Number of obs          94
          F-test               84.935     Prob > F               0.000
          Akaike crit. (AIC)   630.632    Bayesian crit. (BIC)   666.239
          *** p<.01, ** p<.05, * p<.1



          • #6
            Sylvia:
            high R-squared + low t-values for most of your predictors = possible quasi-extreme multicollinearity issue (not necessarily a problem).
            Type -estat vce, corr- and see the nasty correlations.
            As an aside:
            1) you'd be better off using -fvvarlist- notation for interactions and categorical variables (see the sketch below);
            2) read the hilarious Chapter 23 of https://www.hup.harvard.edu/catalog....=9780674175440 for a funny explanation of multicollinearity.
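            A minimal sketch of both suggestions, using the variable names from #5 (and assuming iv_square is simply the square of iv, so Stata can build all the terms on the fly):

            regress y control_1 control_2 control_3 control_4 control_5 ///
                c.iv c.iv#c.iv ///
                c.mv_1 c.iv#c.mv_1 c.iv#c.iv#c.mv_1 ///
                c.mv_2 c.iv#c.mv_2 c.iv#c.iv#c.mv_2
            estat vce, corr    // correlation matrix of the coefficient estimates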
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              post the results without the interactions



              • #8
                Originally posted by Carlo Lazzaro View Post
                Sylvia:
                high R-squared + low t-values for most of your predictors = possible quasi-extreme multicollinearity issue (not necessarily a problem).
                Type -estat vce, corr- and see the nasty correlations.
                As an aside:
                1) you'd be better off using -fvvarlist- notation for interactions and categorical variables;
                2) read the hilarious Chapter 23 of https://www.hup.harvard.edu/catalog....=9780674175440 for a funny explanation of multicollinearity.

                If you don't have easy access to Goldberger's textbook, you can get the gist of Chapter 23 in this blog post by Dave Giles.
                --
                Bruce Weaver
                Email: [email protected]
                Version: Stata/MP 18.5 (Windows)



                • #9
                  Dear sylvia wan,

                  To add to the excellent advice you have already received, I note that you are estimating 14 parameters with just 94 observations, whereas for valid inference we generally need the square of the number of parameters over the sample size to be "small." This may explain both the high R2 and the low t-statistics, and it is the other face of multicollinearity, as noted by Goldberger in the book mentioned above.
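
                  To make that concrete: here k^2/n = 14^2/94 = 196/94 ≈ 2.09, which is clearly not "small."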

                  Best wishes,

                  Joao



                  • #10
                    Originally posted by George Ford View Post
                    post the results without the interactions
                    Hi, I attached all other regressions below.
                                    Model 1     Model 2     Model 5     Model 6     Model 3     Model 4
                    control_1       -0.119**    -0.050      -0.051      -0.052**    -0.050      -0.036
                                    (0.053)     (0.041)     (0.044)     (0.022)     (0.041)     (0.033)
                    control_2       2.169       -1.444      -1.588      -1.220      -1.791      -1.093
                                    (11.934)    (9.091)     (9.220)     (4.586)     (8.973)     (7.257)
                    control_3       -5.110      -3.489      -3.683      -1.288      -3.472      -3.205
                                    (3.665)     (2.797)     (2.952)     (1.474)     (2.760)     (2.227)
                    control_4       0.529       0.340       0.425       0.619       1.170       -0.354
                                    (2.166)     (1.658)     (1.797)     (0.901)     (1.698)     (1.401)
                    control_5       13.478      8.353       8.425       6.433*      3.629       6.047
                                    (9.658)     (7.372)     (7.464)     (3.710)     (7.720)     (6.246)
                    iv_square                   154.226***  153.075***  415.920***  153.688***  380.650***
                                                (42.902)    (43.878)    (35.096)    (42.336)    (89.552)
                    iv                          -53.587     -52.574     -91.886***  -52.929     -143.805*
                                                (38.419)    (39.246)    (29.145)    (37.913)    (82.322)
                    mv_1                                    0.002       0.007
                                                            (0.025)     (0.013)
                    iv_square*mv_1                                      -4.807***
                                                                        (0.527)
                    iv*mv_1                                             1.079**
                                                                        (0.420)
                    mv_2                                                            -5.413*     -1.312
                                                                                    (2.961)     (2.909)
                    iv_square*mv_2                                                              -126.260**
                                                                                                (49.730)
                    iv*mv_2                                                                     48.607
                                                                                                (46.255)
                    Constant        17.341      14.994      15.374      9.588       28.148*     18.426
                                    (17.346)    (13.211)    (13.465)    (6.710)     (14.890)    (12.170)
                    N               95          95          94          94          95          95
                    Adjusted R2     0.033       0.440       0.433       0.860       0.455       0.650



                    • #11
                      Looks to me like iv_square is the problem. Adding it in Model 2 creates a large increase in R2, and the interaction with mv_1 raises R2 to 0.86.

                      I'd focus my attention on iv_square for starters. I suspect something funky is going on.
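
                      If you want to poke at it, a quick sketch (assuming iv_square is literally the square of iv):

                      summarize iv iv_square, detail
                      twoway (scatter y iv) (qfit y iv)    // eyeball whether a quadratic in iv is plausible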

                      G



                      • #12
                        Sylvia:
                        what does -linktest- tell you about the specification of the functional form of your regressand? (-linktest- refits the model on the linear prediction and its square; a statistically significant _hatsq flags a misspecified functional form.)
                        If the -linktest- outcome does not reach statistical significance, I'd bet all in on Model 6.
                        Last edited by Carlo Lazzaro; 11 Jan 2023, 01:06.
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          Originally posted by Carlo Lazzaro View Post
                          Sylvia:
                          what does -linktest- tell you about the specification of the functional form of your regressand?
                          If the -linktest- outcome does not reach statistical significance, I'd bet all in on Model 6.
                          Hi, I did the linktest.

                          I regressed with the square:
                          reg y controls iv_square iv mv_1 iv_square*mv_1 iv*mv_1 mv_2 iv_square*mv_2 iv*mv_2
                          linktest


                          Source | SS df MS Number of obs = 94
                          -------------+---------------------------------- F(2, 91) = 634.23
                          Model | 46252.4464 2 23126.2232 Prob > F = 0.0000
                          Residual | 3318.17894 91 36.4635049 R-squared = 0.9331
                          -------------+---------------------------------- Adj R-squared = 0.9316
                          Total | 49570.6254 93 533.017477 Root MSE = 6.0385

                          ------------------------------------------------------------------------------
                          rdr_3 | Coefficient Std. err. t P>|t| [95% conf. interval]
                          -------------+----------------------------------------------------------------
                          _hat | .861542 .1533277 5.62 0.000 .5569754 1.166109
                          _hatsq | .0006343 .0006905 0.92 0.361 -.0007374 .002006
                          _cons | 1.097713 1.380735 0.80 0.429 -1.644947 3.840374
                          ------------------------------------------------------------------------------


                          I also regressed without the square:
                          reg y controls iv mv_1 iv*mv_1 mv_2 iv*mv_2
                          linktest



                          Source | SS df MS Number of obs = 94
                          -------------+---------------------------------- F(2, 91) = 413.16
                          Model | 44653.0745 2 22326.5372 Prob > F = 0.0000
                          Residual | 4917.55088 91 54.0390207 R-squared = 0.9008
                          -------------+---------------------------------- Adj R-squared = 0.8986
                          Total | 49570.6254 93 533.017477 Root MSE = 7.3511

                          ------------------------------------------------------------------------------
                          rdr_3 | Coefficient Std. err. t P>|t| [95% conf. interval]
                          -------------+----------------------------------------------------------------
                          _hat | -.1531973 .1011222 -1.51 0.133 -.3540641 .0476695
                          _hatsq | .0081328 .0006594 12.33 0.000 .006823 .0094426
                          _cons | 8.261672 1.092516 7.56 0.000 6.091522 10.43182
                          ------------------------------------------------------------------------------




                          • #14
                            Sylvia:
                            the -linktest- outcome confirms that the squared term belongs in your regression: with the square included, _hatsq is insignificant (p = 0.361), whereas without it, _hatsq is highly significant.
                            Stick with Model 6.
                            Kind regards,
                            Carlo
                            (Stata 19.0)

