Adjusted R-squared still high after deleting control variables

Luca Haseney

Join Date: Dec 2022
Posts: 22

11 Jan 2023, 07:14

Hello Statalist!

I am currently running a regression and wanted to check whether my variables explain much of the variation in the dependent variable.

To do this, I ran the regression once with the control variables and once without them. Including the controls raises the adjusted R-squared by only about 0.4 percentage points (0.8151 vs. 0.8114), even though all of them were identified as important by previous literature.

Now I am hesitant to report this. Does this mean something is wrong with my model?

I also tried dropping my main variables and including only the controls, and the adjusted R-squared is still at 81 percent (0.8109).

I included time and firm fixed effects, with standard errors clustered by firm.

Cross-posted here:
https://stats.stackexchange.com/ques...trol-variables.

Code:

. reghdfe Y1 x1 x2 x3 c1 c2 c3 c4 c5 c6 c7 c8 c9, absorb(FIRM Year) cluster(FIRM)
(dropped 8 singleton observations)
(MWFE estimator converged in 6 iterations)

HDFE Linear regression                            Number of obs   =        967
Absorbing 2 HDFE groups                           F(  12,    215) =       4.20
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.8595
                                                  Adj R-squared   =     0.8151
                                                  Within R-sq.    =     0.0566
Number of clusters (FIRM)    =        216         Root MSE        =     0.1304

                                 (Std. err. adjusted for 216 clusters in FIRM)
------------------------------------------------------------------------------
             |               Robust
          Y1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |   .0274918   .0196236     1.40   0.163    -.0111875    .0661712
          x2 |  -.0207942   .0100225    -2.07   0.039    -.0405492   -.0010391
          x3 |  -.0019367    .000597    -3.24   0.001    -.0031134     -.00076
          c1 |   .1019985    .058054     1.76   0.080    -.0124293    .2164264
          c2 |   -.016942   .0090505    -1.87   0.063     -.034781    .0008971
          c3 |  -.0003009   .0058301    -0.05   0.959    -.0117923    .0111905
          c4 |   .0068474   .0265863     0.26   0.797    -.0455557    .0592505
          c5 |   .0363782   .0308575     1.18   0.240    -.0244438    .0972001
          c6 |  -.0004864   .0009786    -0.50   0.620    -.0024153    .0014426
          c7 |  -.1023495    .089382    -1.15   0.253    -.2785266    .0738277
          c8 |   .0001882   .0000738     2.55   0.012     .0000426    .0003337
          c9 |   .0066574   .0023477     2.84   0.005     .0020299    .0112849
       _cons |  -3.505176   1.365308    -2.57   0.011     -6.19628   -.8140728
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
        FIRM |       216         216           0    *|
        Year |         5           0           5     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

Similarly, once without the controls:

Code:

. reghdfe Y1 x1 x2 x3, absorb(FIRM Year) cluster(FIRM)
(dropped 8 singleton observations)
(MWFE estimator converged in 6 iterations)

HDFE Linear regression                            Number of obs   =        967
Absorbing 2 HDFE groups                           F(   3,    215) =       5.28
Statistics robust to heteroskedasticity           Prob > F        =     0.0016
                                                  R-squared       =     0.8549
                                                  Adj R-squared   =     0.8114
                                                  Within R-sq.    =     0.0262
Number of clusters (FIRM)    =        216         Root MSE        =     0.1317

                                 (Std. err. adjusted for 216 clusters in FIRM)
------------------------------------------------------------------------------
             |               Robust
          Y1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |   .0207614   .0198057     1.05   0.296    -.0182767    .0597995
          x2 |  -.0213535   .0101183    -2.11   0.036    -.0412973   -.0014098
          x3 |  -.0019855   .0005953    -3.34   0.001    -.0031589   -.0008122
       _cons |  -.9862518   .0393388   -25.07   0.000    -1.063791   -.9087126
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
        FIRM |       216         216           0    *|
        Year |         5           0           5     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

And once with only the controls (without x1, x2, x3):

Code:

. reghdfe Y1 c1 c2 c3 c4 c5 c6 c7 c8 c9, absorb(FIRM Year) cluster(FIRM)
(dropped 8 singleton observations)
(MWFE estimator converged in 6 iterations)

HDFE Linear regression                            Number of obs   =        967
Absorbing 2 HDFE groups                           F(   9,    215) =       2.89
Statistics robust to heteroskedasticity           Prob > F        =     0.0030
                                                  R-squared       =     0.8557
                                                  Adj R-squared   =     0.8109
                                                  Within R-sq.    =     0.0314
Number of clusters (FIRM)    =        216         Root MSE        =     0.1319

                                 (Std. err. adjusted for 216 clusters in FIRM)
------------------------------------------------------------------------------
             |               Robust
          Y1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          c1 |   .1182841   .0640627     1.85   0.066    -.0079873    .2445556
          c2 |  -.0174848   .0088881    -1.97   0.050    -.0350038    .0000342
          c3 |   -.000514   .0058961    -0.09   0.931    -.0121356    .0111076
          c4 |  -.0000784   .0279881    -0.00   0.998    -.0552447    .0550879
          c5 |   .0416486   .0315728     1.32   0.189    -.0205832    .1038804
          c6 |  -.0008959    .000984    -0.91   0.364    -.0028354    .0010436
          c7 |    -.09267   .0946564    -0.98   0.329    -.2792434    .0939033
          c8 |   .0002231   .0000741     3.01   0.003      .000077    .0003691
          c9 |   .0060863    .002376     2.56   0.011      .001403    .0107696
       _cons |  -3.877952   1.484846    -2.61   0.010    -6.804671   -.9512319
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
        FIRM |       216         216           0    *|
        Year |         5           0           5     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
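As a rough consistency check on the three outputs above, the printed adjusted R-squared values can be reproduced (to rounding) from R-squared, the 967 observations, and a parameter count that includes the absorbed fixed effects. The count used here (216 firm effects plus 5 year effects, minus 1 redundant, plus the constant) is an assumption inferred from the printed numbers, not taken from the -reghdfe- documentation; the Python sketch is only arithmetic. It suggests why adjusted R-squared barely moves: about 220 absorbed parameters dominate both the fit and the degrees-of-freedom penalty, so adding or dropping a handful of regressors changes little.

```python
def adjusted_r2(r2, n_obs, n_params):
    """Adjusted R-squared; n_params counts every estimated coefficient, constant included."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_params)


N_OBS = 967
# Assumption: 216 firm + 5 year effects, 1 of them redundant, plus the constant.
ABSORBED_PLUS_CONS = 216 + 5 - 1 + 1

# (R-squared, number of slope regressors) as printed in the three outputs above
MODELS = {
    "x1-x3 and c1-c9": (0.8595, 12),
    "x1-x3 only": (0.8549, 3),
    "c1-c9 only": (0.8557, 9),
}

for name, (r2, k) in MODELS.items():
    adj = adjusted_r2(r2, N_OBS, k + ABSORBED_PLUS_CONS)
    print(f"{name:16s} adjusted R2 = {adj:.4f}")
```

The regressors' own contribution is more visible in the within R-squared lines of the same outputs (0.0262 with x1-x3 only, 0.0566 with all regressors).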

Last edited by Luca Haseney; 11 Jan 2023, 07:37.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#2

11 Jan 2023, 07:19

Luca:
why not post what you typed and what Stata gave you back (as per the FAQ)? Thanks.

Kind regards,
Carlo
(Stata 19.0)
Luca Haseney

Join Date: Dec 2022

Posts: 22
#3

11 Jan 2023, 07:38

Dear Mr. Lazzaro, sure. My bad!
Edited in the first post.
FernandoRios

Join Date: Apr 2014

Posts: 2469
#4

11 Jan 2023, 08:37

Hi Luca
So, the reason you have a high R2 is that it also accounts for the absorbed effects! They alone are probably explaining 80% of the variation.
What is more relevant here is your within R2, which is only 5.6% using all controls, and as low as 2.6% when looking at the X's only.
HTH
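Fernando's point can be illustrated with a small Python sketch on hypothetical simulated data (large firm effects, a small slope on x; all function names are made up for illustration). It computes two R-squared values from the same residuals: one measured against the total variation of y, which credits the firm effects (roughly what the overall R-squared line of -reghdfe- reports), and one measured against the within-firm variation only. This is a one-way firm-effects simplification, not a replica of the two-way model in the thread.

```python
import numpy as np


def demean_by(v, groups):
    """Subtract each group's mean from v (the 'within' transformation)."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]


def simulate_panel(n_firms=200, n_years=5, beta=0.05, seed=0):
    """Hypothetical panel: large firm effects, small effect of x on y."""
    rng = np.random.default_rng(seed)
    firm = np.repeat(np.arange(n_firms), n_years)
    alpha = rng.normal(0.0, 1.0, n_firms)  # firm fixed effects
    x = rng.normal(0.0, 1.0, firm.size)
    y = alpha[firm] + beta * x + rng.normal(0.0, 0.3, firm.size)
    return y, x, firm


def overall_and_within_r2(y, x, firm):
    # Within estimator: demean y and x by firm, then regress without an intercept.
    yd, xd = demean_by(y, firm), demean_by(x, firm)
    b = (xd @ yd) / (xd @ xd)
    # These residuals equal those of OLS with a full set of firm dummies
    # (Frisch-Waugh-Lovell), so both R2 values share one SSR.
    ssr = np.sum((yd - b * xd) ** 2)
    within = 1.0 - ssr / np.sum(yd ** 2)               # baseline: within-firm variation
    overall = 1.0 - ssr / np.sum((y - y.mean()) ** 2)  # baseline: total variation
    return overall, within, b


y, x, firm = simulate_panel()
overall, within, b = overall_and_within_r2(y, x, firm)
print(f"R2 crediting firm effects: {overall:.3f}")
print(f"within R2:                 {within:.3f}")
```

The two numbers differ only in their denominators, which is why a model can look excellent on overall R-squared while the regressors explain little of the within-firm variation.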
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#5

11 Jan 2023, 08:46

Luca:
1) I share Fernando's concern about the too-low within R-sq, which calls for double-checking the functional form of the regressand;
2) just out of curiosity: what is the gain in going with -reghdfe- rather than -xtreg, fe- with your panel data?
3) please call me Carlo, as all on (and many more off) this forum do. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Comment
Luca Haseney

Join Date: Dec 2022

Posts: 22
#6

11 Jan 2023, 09:25

Originally posted by FernandoRios

Hi Luca
So, the reason you have a high R2 is that it also accounts for the absorbed effects! They alone are probably explaining 80% of the variation.
What is more relevant here is your within R2, which is only 5.6% using all controls, and as low as 2.6% when looking at the X's only.
HTH

Dear Fernando, yes, this helps. Thank you very much!

Dear Carlo,

1) Is there a systematic way to check the functional form?
I followed previous literature and applied natural logarithms to the respective variables: Y1, c4, and c5 are in logs. The other variables are binary or decimals (i.e., ratios of some kind).

2) Because I explicitly wanted to include firm and time fixed effects, to capture unobserved firm characteristics and time effects that affect all firms equally.

3) Sure!

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17708

11 Jan 2023, 09:34

Luca:
1) yes (drawing heavily on -linktest-). The toy example considers -re-, but it works with -fe- as well:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage i.race i.nev_mar, re vce(cluster idcode)

Random-effects GLS regression                   Number of obs     =     28,518
Group variable: idcode                          Number of groups  =      4,711

R-squared:                                      Obs per group:
     Within  = 0.0263                                         min =          1
     Between = 0.0121                                         avg =        6.1
     Overall = 0.0145                                         max =         15

                                                Wald chi2(3)      =     429.57
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                             (Std. err. adjusted for 4,711 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        race |
      Black  |   -.110084     .01332    -8.26   0.000    -.1361908   -.0839772
      Other  |   .1165283   .0666152     1.75   0.080     -.014035    .2470917
             |
   1.nev_mar |  -.1611142   .0087208   -18.47   0.000    -.1782066   -.1440217
       _cons |    1.72454   .0074549   231.33   0.000     1.709929    1.739152
-------------+----------------------------------------------------------------
     sigma_u |  .38311279
     sigma_e |   .3159974
         rho |  .59512448   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict fitted, xb


. g sq_fitted=fitted^2


. xtreg ln_wage fitted sq_fitted , re vce(cluster idcode)

Random-effects GLS regression                   Number of obs     =     28,518
Group variable: idcode                          Number of groups  =      4,711

R-squared:                                      Obs per group:
     Within  = 0.0263                                         min =          1
     Between = 0.0120                                         avg =        6.1
     Overall = 0.0145                                         max =         15

                                                Wald chi2(2)      =     421.17
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                             (Std. err. adjusted for 4,711 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      fitted |   1.624269   1.588631     1.02   0.307     -1.48939    4.737928
   sq_fitted |  -.1929673   .4921703    -0.39   0.695    -1.157603    .7716688
       _cons |   -.503038   1.278528    -0.39   0.694    -3.008906     2.00283
-------------+----------------------------------------------------------------
     sigma_u |   .3847733
     sigma_e |  .31599437
         rho |  .59721155   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.

As -sq_fitted- does not reach statistical significance, there's no evidence of model misspecification;
2) why not simply go with:

Code:

xtset firm year
* depvar and indepvars are placeholders; cluster on the panel variable
xtreg depvar indepvars i.year, fe vce(cluster firm)
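On point 2): a one-way -fe- estimator with year dummies and a two-way absorbed regression should give the same slope coefficients, by the Frisch-Waugh-Lovell theorem. What can differ is the reported sample (-reghdfe- drops singleton groups: 967 vs. 975 observations in this thread) and the degrees-of-freedom corrections behind the standard errors. The Python sketch below, on hypothetical unbalanced panel data, compares the slope on x from OLS with explicit firm and year dummies against the slope after alternately demeaning by firm and by year until convergence. The alternating demeaning is only in the spirit of what -reghdfe- does internally (its log reports an iterative "MWFE estimator"); it is not that command's actual implementation, and all names here are illustrative.

```python
import numpy as np


def demean_by(v, groups):
    """Subtract each group's mean from v (group codes relabelled to 0..G-1)."""
    _, g = np.unique(groups, return_inverse=True)
    means = np.bincount(g, weights=v) / np.bincount(g)
    return v - means[g]


def absorb_two_way(v, firm, year, tol=1e-12, max_iter=10_000):
    """Alternate firm- and year-demeaning until the result stops changing."""
    out = v.astype(float)
    for _ in range(max_iter):
        new = demean_by(demean_by(out, firm), year)
        if np.max(np.abs(new - out)) < tol:
            return new
        out = new
    raise RuntimeError("demeaning did not converge")


def slope_lsdv(y, x, firm, year):
    """Slope on x from OLS with all firm dummies plus year dummies (first year dropped)."""
    F = (firm[:, None] == np.unique(firm)).astype(float)
    T = (year[:, None] == np.unique(year)[1:]).astype(float)
    coef, *_ = np.linalg.lstsq(np.column_stack([x, F, T]), y, rcond=None)
    return coef[0]


def slope_absorbed(y, x, firm, year):
    """Slope on x after absorbing firm and year effects."""
    yd, xd = absorb_two_way(y, firm, year), absorb_two_way(x, firm, year)
    return (xd @ yd) / (xd @ xd)


rng = np.random.default_rng(1)
n_firms, n_years = 100, 5
firm = np.repeat(np.arange(n_firms), n_years)
year = np.tile(np.arange(n_years), n_firms)
alpha = rng.normal(size=n_firms)              # hypothetical firm effects
gamma = rng.normal(size=n_years)              # hypothetical year effects
x = rng.normal(size=firm.size) + alpha[firm]  # x correlated with firm effects
y = alpha[firm] + gamma[year] + 0.5 * x + rng.normal(0.0, 0.5, firm.size)

keep = rng.random(firm.size) > 0.1            # drop ~10% of rows: unbalanced panel
firm, year, x, y = firm[keep], year[keep], x[keep], y[keep]

print("slope with dummies:   ", slope_lsdv(y, x, firm, year))
print("slope after absorbing:", slope_absorbed(y, x, firm, year))
```

Explicit dummies become expensive with many groups, which is the practical motivation for absorbing them iteratively; with only 5 years, -xtreg, fe- plus i.year is equally valid for the point estimates.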

Kind regards,
Carlo
(Stata 19.0)


Luca Haseney

Join Date: Dec 2022
Posts: 22

11 Jan 2023, 10:31

Dear Carlo, thank you so much for your suggestions. I really appreciate it.

I applied the code to my sample:

Code:

xtreg  Y1 x1 x2 x3 c1 c2 c3 c4 c5 c6 c7 c8 c9 i.Year, fe vce(cluster FIRM)
predict fitted, xb
g sq_fitted=fitted^2
xtreg Y1 fitted sq_fitted, fe vce(cluster FIRM)

Fixed-effects (within) regression               Number of obs     =        975
Group variable: FIRM                            Number of groups  =        224

R-squared:                                      Obs per group:
     Within  = 0.4556                                         min =          1
     Between = 0.0501                                         avg =        4.4
     Overall = 0.0236                                         max =          5

                                                F(2,223)          =     118.62
corr(u_i, Xb) = -0.9464                         Prob > F          =     0.0000

                                 (Std. err. adjusted for 224 clusters in FIRM)
------------------------------------------------------------------------------
             |               Robust
          Y1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      fitted |    1.09574   .1140394     9.61   0.000     .8710069    1.320472
   sq_fitted |   .0454518   .0392435     1.16   0.248    -.0318836    .1227873
       _cons |   .0205962    .078555     0.26   0.793    -.1342089    .1754013
-------------+----------------------------------------------------------------
     sigma_u |  .91800801
     sigma_e |  .12883184
         rho |  .98068551   (fraction of variance due to u_i)
------------------------------------------------------------------------------

It points to no model misspecification.
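The logic of -linktest- (and of Carlo's manual version after -xtreg-) can be sketched outside Stata: fit the model, compute the fitted values, then regress the outcome on the fitted values and their square; a significant squared term points to misspecification. This Python sketch assumes pooled OLS with a constant and classical standard errors, with no fixed effects or clustering, so it is only a simplified analogue of the regressions above, run on hypothetical data with illustrative function names.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients and classical (non-robust) standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se


def link_test_t(X, y):
    """Regress y on [1, fitted, fitted^2]; return the t-statistic on fitted^2."""
    coef, _ = ols(X, y)
    fitted = X @ coef
    Z = np.column_stack([np.ones_like(fitted), fitted, fitted ** 2])
    c, se = ols(Z, y)
    return c[2] / se[2]


rng = np.random.default_rng(42)
x = rng.uniform(0.0, 2.0, 500)
X = np.column_stack([np.ones_like(x), x])             # linear model with a constant
y_linear = 1.0 + 0.5 * x + rng.normal(0.0, 0.1, x.size)
y_curved = np.exp(x) + rng.normal(0.0, 0.1, x.size)   # linear model is misspecified here

print("t on fitted^2, correctly specified:", round(link_test_t(X, y_linear), 2))
print("t on fitted^2, misspecified:       ", round(link_test_t(X, y_curved), 2))
```

With the exponential outcome the squared term should be strongly significant; with the linear outcome it usually is not, though any single draw can reject by chance.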

I also followed Mr. Wooldridge's post to check for nonlinearity:

Code:

xtreg Y1 x1 x2 x3 c1 c2 c3 c4 c5 c6 c7 c8 c9 i.Year, fe vce(cluster FIRM)
predict xbhat, xb
gen xbhatsq = xbhat^2
gen xbhatcu = xbhat^2
xtreg Y1 x1 x2 x3 c1 c2 c3 c4 c5 c6 c7 c8 c9 i.Year xbhatsq xbhatcu, fe vce(cluster FIRM)
test xbhatsq xbhatcu

Fixed-effects (within) regression               Number of obs     =        975
Group variable: FIRM                            Number of groups  =        224

R-squared:                                      Obs per group:
     Within  = 0.4559                                         min =          1
     Between = 0.0511                                         avg =        4.4
     Overall = 0.0218                                         max =          5

                                                F(17,223)         =      16.97
corr(u_i, Xb) = -0.9299                         Prob > F          =     0.0000

                                 (Std. err. adjusted for 224 clusters in FIRM)
------------------------------------------------------------------------------
             |               Robust
          Y1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |   .0300244   .0199219     1.51   0.133    -.0092348    .0692836
          x2 |  -.0224521    .010106    -2.22   0.027    -.0423676   -.0025365
          x3 |  -.0021353   .0006035    -3.54   0.000    -.0033246    -.000946
          c1 |   .0982184   .0552264     1.78   0.077     -.010614    .2070508
          c2 |   -.017616   .0089341    -1.97   0.050    -.0352221   -9.93e-06
          c3 |  -.0003486   .0058461    -0.06   0.952    -.0118694    .0111721
          c4 |   .0147686   .0276555     0.53   0.594    -.0397309    .0692681
          c5 |   .0381699   .0311407     1.23   0.222    -.0231978    .0995376
          c6 |  -.0003709   .0009665    -0.38   0.702    -.0022755    .0015338
          c7 |  -.1128985   .0909374    -1.24   0.216    -.2921051    .0663081
          c8 |   .0001861   .0000739     2.52   0.013     .0000404    .0003317
          c9 |   .0077529   .0025723     3.01   0.003     .0026837     .012822
             |
        Year |
       2018  |  -.0554002   .0574869    -0.96   0.336    -.1686873    .0578868
       2019  |  -.0811843    .111925    -0.73   0.469    -.3017502    .1393817
       2020  |  -.0204323   .1680655    -0.12   0.903     -.351632    .3107675
       2021  |  -.0867472    .224055    -0.39   0.699    -.5282832    .3547889
             |
     xbhatsq |   .0498728   .0411951     1.21   0.227    -.0313087    .1310544
     xbhatcu |          0  (omitted)
       _cons |  -3.615348   1.222262    -2.96   0.003    -6.024008   -1.206687
-------------+----------------------------------------------------------------
     sigma_u |  .80376154
     sigma_e |  .13011625
         rho |  .97446276   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test xbhatsq xbhatcu

 ( 1)  xbhatsq = 0
 ( 2)  o.xbhatcu = 0
       Constraint 2 dropped

       F(  1,   223) =    1.47
            Prob > F =    0.2273
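One detail in the listing above: xbhatcu is generated as xbhat^2 rather than xbhat^3, so it is an exact copy of xbhatsq. That is why Stata reports xbhatcu as omitted and drops constraint 2, so the F(1, 223) = 1.47 tests only the squared term. Presumably the intended line was gen xbhatcu = xbhat^3, which gives a joint test on two terms. The Python sketch below (hypothetical data, pooled OLS, classical F statistic rather than the cluster-robust Wald test above; function names are illustrative) shows the RESET-style augmentation and how a duplicated power leaves the augmented design rank-deficient.

```python
import numpy as np


def fitted_values(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef


def ssr(X, y):
    resid = y - fitted_values(X, y)
    return resid @ resid


def augmented_design(X, y, powers):
    """Append powers of the fitted values (xbhat^2, xbhat^3, ...) to the regressors."""
    xbhat = fitted_values(X, y)
    return np.column_stack([X] + [xbhat ** p for p in powers])


def reset_f(X, y, powers=(2, 3)):
    """Classical F statistic for the joint significance of the added powers."""
    X_aug = augmented_design(X, y, powers)
    n, k_aug = X_aug.shape
    q = len(powers)
    ssr_u = ssr(X_aug, y)
    return ((ssr(X, y) - ssr_u) / q) / (ssr_u / (n - k_aug))


rng = np.random.default_rng(7)
x = rng.uniform(0.0, 2.0, 500)
X = np.column_stack([np.ones_like(x), x])
y = np.exp(x) + rng.normal(0.0, 0.1, x.size)   # hypothetical curved outcome

print("RESET F statistic (2 added terms):", round(reset_f(X, y), 1))
# Cube typed as a square: the extra column duplicates xbhat^2, so rank stays at 3.
print("rank with powers (2, 3):", np.linalg.matrix_rank(augmented_design(X, y, (2, 3))))
print("rank with powers (2, 2):", np.linalg.matrix_rank(augmented_design(X, y, (2, 2))))
```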

I really have trouble interpreting it. I think it means I cannot reject the H0 that the model is correctly specified.

This in turn suggests that the functional form is adequate. Yet the problem with the low R-squared persists.
Based on another thread where you responded, this test is equivalent or similar to the RESET test, so should I be happy with my model and move on?
How else can I explain this low R-squared? Is it simply that I omitted variables? That would be surprising, since I added all the controls suggested by the literature.

Last edited by Luca Haseney; 11 Jan 2023, 10:51.


George Ford

Join Date: Aug 2014

Posts: 3152
#9

11 Jan 2023, 10:38

The fitted/sq_fitted regression is a joint test, which is what RESET is doing.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#10

12 Jan 2023, 00:51

Luca:
the R-sq you should check in an -fe- model is the within one, which in your case is not that low (0.4556).
Therefore, you should be happy with your model and move on.

Last edited by Carlo Lazzaro; 12 Jan 2023, 00:54.

Kind regards,
Carlo
(Stata 19.0)
Luca Haseney

Join Date: Dec 2022

Posts: 22
#11

12 Jan 2023, 01:08

Dear Carlo,

thank you very much!

I think I also read somewhere on the forum that the within R-squared reported by -reghdfe- is not appropriate for fixed-effects models anyway.
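On the two within R-squared figures in this thread: -reghdfe- reported 0.0566, while -xtreg, fe- with i.Year reported about 0.456 (0.4556 in the linktest regression, 0.4559 in the RESET-style one). A likely explanation is that the two statistics use different baselines, rather than that either one is wrong. -reghdfe- computes its within R-squared after partialling out all absorbed effects (FIRM and Year), whereas -xtreg, fe- demeans by firm only and counts the year dummies as explanatory variables, so the common year effects are credited to the model. The Python sketch below illustrates this on hypothetical balanced data with sizable year effects (the year-effect values and function names are made up): both approaches give the same residuals and the same slope on x, but very different within R-squared.

```python
import numpy as np


def demean_by(v, groups):
    """Subtract each group's mean from v."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]


def within_r2_two_ways(seed=3, n_firms=150):
    rng = np.random.default_rng(seed)
    year_effect = np.array([0.0, -0.6, -1.2, -0.4, -1.0])  # hypothetical common shocks
    n_years = year_effect.size
    firm = np.repeat(np.arange(n_firms), n_years)
    year = np.tile(np.arange(n_years), n_firms)
    x = rng.normal(size=firm.size)
    y = (rng.normal(size=n_firms)[firm] + year_effect[year]
         + 0.05 * x + rng.normal(0.0, 0.3, firm.size))

    # xtreg, fe style: demean by firm only; year dummies are ordinary regressors.
    D = (year[:, None] == np.arange(1, n_years)).astype(float)
    Xf = np.column_stack([demean_by(col, firm) for col in np.column_stack([x, D]).T])
    yf = demean_by(y, firm)
    b, *_ = np.linalg.lstsq(Xf, yf, rcond=None)
    ssr_f = np.sum((yf - Xf @ b) ** 2)

    # reghdfe style: absorb firm and year (one sweep each is exact in a balanced panel).
    y2 = demean_by(demean_by(y, firm), year)
    x2 = demean_by(demean_by(x, firm), year)
    b2 = (x2 @ y2) / (x2 @ x2)
    ssr_2 = np.sum((y2 - b2 * x2) ** 2)

    return {
        "within R2, year dummies as regressors": 1 - ssr_f / np.sum(yf ** 2),
        "within R2, year effects absorbed": 1 - ssr_2 / np.sum(y2 ** 2),
        "same residuals": np.isclose(ssr_f, ssr_2),
        "same slope on x": np.isclose(b[0], b2),
    }


for label, value in within_r2_two_ways().items():
    print(label, value)
```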