R2-values are very, very low.

Goran Ekmekci

Join Date: Dec 2021

Posts: 8
#1

21 Dec 2021, 16:40

I'm currently writing my bachelor's thesis about economic growth (GDP per capita) and democracy (Polity), with some other independent variables: economic freedom, life expectancy, mean years of schooling and investments.

Background:
I have run tests for heteroskedasticity, autocorrelation, stationarity and multicollinearity.

The data are heteroskedastic (to counter this, robust standard errors are used, which in Stata 16.1 are clustered)
No autocorrelation
One of the variables (Economic Freedom) was non-stationary and was first-differenced, which solved the issue of non-stationarity.
No multicollinearity (according to results from VIF)

I use a strongly balanced panel (because I interpolate missing values).
t = 23
n = 147
observations = 3381

I conducted a Hausman test and concluded that the fixed-effects model is preferred for this panel.

To the problem and analysis:
When I conduct my panel data analysis, the robust option is used (-xtreg variable1 variable2 etc., fe robust-)
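
For readers following along, a minimal sketch of that estimation step might look like this. The variable names are taken from the command quoted later in the thread; the panel and time identifiers country_id and year are assumed names, since the thread never shows them.

```stata
* Assumed identifiers: country_id (panel) and year (time) are hypothetical names
xtset country_id year

* Fixed-effects regression. With -xtreg, fe-, the robust option is
* treated as vce(cluster panelvar), so clustering is spelled out here.
xtreg polGDPpercapita polpolity polEconFree pollifeexp ///
    poltinvestments polschoolyear, fe vce(cluster country_id)
```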

All the results are insignificant and the R2-value is very low, as you can see. I am concerned about this because I was expecting a higher R2-value, since many of the independent variables are directly linked to GDP per capita.
I also know that insignificance is not a problem in itself, since it is also a result that can be interpreted as "there is no connection between these variables". But since the R2 is so low, I am afraid that there is something wrong with the entire analysis.

I also ran a misspecification test to see whether the model was misspecified; the null hypothesis was rejected, indicating no problem of misspecification.

If there is no problem here, could you please give an example of an article that explains that a low R2 is not a problem?
And if there is a problem, what do you think the problem is? Many previous studies on the subject have used the same type of variables without this problem.
If anything is unclear, feel free to ask me.

PS. I know of the dataex command, but I think this is a better visual representation.

Last edited by Goran Ekmekci; 21 Dec 2021, 16:45.
Tags: fixed effects, panel data, R2
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17612
#2

22 Dec 2021, 01:08

Goran:
two trivial questions:
1) were you able to detect a panel-wise effect in your dataset?
2) are you reporting overall R-sq or within R-sq in your table?

As an aside, please note that (as per FAQ) reporting what you typed and what Stata gave you back (instead of describing it) can well increase your chances of getting (more) helpful replies. Thanks.

Kind regards,
Carlo
(StataNow 18.5)
Goran Ekmekci

Join Date: Dec 2021

Posts: 8
#3

22 Dec 2021, 06:04

Hello Carlo!
Thank you for the reply, I appreciate it a lot!

Question 1:
I ran the -xttest0- after using -xtreg, re- on my variables. The results showed the following:

I know the test rejects, but I don't know what that indicates.

Question 2:
When I tabulate my stored estimates with -esttab-, I type -esttab m1, r2-, if that makes sense.
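
As a hedged sketch of how the different R-sq values could be reported side by side: after -xtreg-, Stata stores the within, between and overall R-sq in e(r2_w), e(r2_b) and e(r2_o), and -esttab- (from the community-contributed estout package) can add stored scalars to the table. The stored-estimates name m1 comes from the post; country_id is an assumed panel identifier.

```stata
* country_id is a hypothetical panel identifier; the data are assumed to be xtset already
quietly xtreg polGDPpercapita polpolity polEconFree pollifeexp ///
    poltinvestments polschoolyear, fe vce(cluster country_id)
estimates store m1

* Show all three R-sq values directly
display "within  = " e(r2_w)
display "between = " e(r2_b)
display "overall = " e(r2_o)

* Add them to the -esttab- table instead of relying on the r2 option alone
esttab m1, scalars(r2_w r2_b r2_o)
```

After -xtreg, fe-, the r2 option of -esttab- should correspond to the within R-sq, so the table in the first post is probably reporting that one.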

PS. I tried using dataex but I just get this:
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17612
#4

22 Dec 2021, 06:30

Goran:
1) in your original post, you mentioned the -fe- estimator; however, from your last post, I notice that you switched to the -re- specification.
Let's focus on -re- then:
a) with 147 panels, cluster-robust standard errors are mandatory;
b) it's wise to run -xttest0- after -xtreg,re-; in your case, there's evidence of a panel-wise effect (ie, go -xtreg,re-);
c) it is impossible to tell from your post whether or not you compared the -fe- vs -re- specifications. If you did not, after invoking cluster-robust standard errors, you can test whether -xtreg,re- is the way to go via the community-contributed module -xtoverid- (if the null is rejected, go -xtreg,fe-);
c) as your R-sq between (the one that you should consider when you go -xtreg,re-) is low, I would investigate the correlation between your predictors via -estat vce, corr- after -xtreg,re-.
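
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: country_id is a hypothetical panel identifier, the data are assumed to be -xtset- already, and -xtoverid- must have been installed beforehand.

```stata
* Regressors as named later in the thread, collected in a local macro
local xvars polpolity polEconFree pollifeexp poltinvestments polschoolyear

* b) Breusch-Pagan LM test for a panel-wise effect, run after a plain -xtreg, re-
xtreg polGDPpercapita `xvars', re
xttest0

* a) Re-estimate with cluster-robust standard errors
xtreg polGDPpercapita `xvars', re vce(cluster country_id)

* Correlation matrix of the coefficient estimates
estat vce, corr

* c) Robust test of -re- vs -fe-; rejecting the null points to -xtreg, fe-
xtoverid
```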

Kind regards,
Carlo
(StataNow 18.5)
Goran Ekmekci

Join Date: Dec 2021

Posts: 8
#5

22 Dec 2021, 06:57

Hey again Carlo!
Thank you for your quick reply.

1. Yes, when I tried to do the -xttest0- it said I had to use -re- instead of -fe-.
"last estimates not xtreg, re" was the error message.

A) Alright!

B) Okay! But I haven't checked -xtreg, fe- yet since -xttest0- won't allow it. As I mentioned in my original post, the Hausman test indicated that fixed effects should be used, but without robust standard errors, since those are not allowed when conducting the Hausman test.

C) When I try to run -xtoverid- I get the following error message:

First step. -xtreg polGDPpercapita polpolity polEconFree pollifeexp poltinvestments polschoolyear, re robust-
Second step. -xtoverid-

C) I ran -estat vce, corr- after -xtreg, re- and these are the results. I am not sure what to make of them.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17612
#6

22 Dec 2021, 07:21

Goran:
1) -xttest0- should be used after -xtreg,re- only;
B) impose cluster-robust standard error and retest via the community-contributed module -xtoverid- if -xtreg,re- is actually the way to go;
C) to make the community-contributed module work, you have to download the suggested community-contributed modules;
as far as the outcome of -estat vce, corr- is concerned, the -0.6962 correlation might be a sign that you should choose between -polschoolyear- and -pollifeexp-.
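
A minimal sketch of the installation step, assuming the modules are fetched from SSC. To the best of my knowledge -xtoverid- relies on -ivreg2- and -ranktest-, but the error message itself names whichever modules are actually missing, so treat this list as an assumption.

```stata
* Install -xtoverid- and the modules it is believed to depend on
ssc install xtoverid
ssc install ivreg2
ssc install ranktest

* Optional check that Stata can now find the command
which xtoverid
```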

Kind regards,
Carlo
(StataNow 18.5)
Goran Ekmekci

Join Date: Dec 2021

Posts: 8
#7

22 Dec 2021, 08:25

Hey again!
And thanks for the answer!

B) The test was rejected:

Which indicates that -xtreg, fe robust- should be used.

C) Alright, I guess I will have to find other independent variables to be included.

But the R2-value is still very low. Maybe this can be an effect of using interpolated variables?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17612
#8

22 Dec 2021, 09:52

Goran:
cluster-robust standard errors highlight that your model is no better than -mean polGDPpercapita-.
There's a strong case for reconsidering your predictors and then re-running -xtoverid-.

Kind regards,
Carlo
(StataNow 18.5)
Nick Cox

Join Date: Mar 2014

Posts: 35219
#9

22 Dec 2021, 10:21

I don't know how or indeed whether polGDPpercapita differs from what I would expect of GDP per capita but it's my experience that GDP per capita is best analysed on log scale.
Goran Ekmekci

Join Date: Dec 2021

Posts: 8
#10

22 Dec 2021, 13:06

Carlo:
Thank you once again for your answer.
I decided to find other independent variables (I hope that is what you meant by predictors in that context). But if the -xtoverid- test is significant again under -xtreg, re robust-, should I use -xtreg, fe robust- for my model instead?

Nick:
Thank you for taking your time.
There is no real difference other than that some missing values have been linearly interpolated. Even before interpolation there were negative values, which is an indication that logs cannot be used, right? Also, the variable for GDP per capita is measured as annual % growth, which is then converted to first-difference values (from the original source).
Nick Cox

Join Date: Mar 2014

Posts: 35219
#11

22 Dec 2021, 13:17

OK. So the variable name does not mean what I inferred. Logarithms plain and simple will not make sense, but transformations are not ruled out.
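
The post does not say which transformations are meant. Purely as an illustration, and as an assumption on my part, these are three common choices that remain defined for negative, zero and positive values. The new variable names are hypothetical.

```stata
* Signed cube root: keeps the sign and pulls in extreme values
generate double gdp_cuberoot = sign(polGDPpercapita) * abs(polGDPpercapita)^(1/3)

* Inverse hyperbolic sine: roughly linear near zero, log-like for large |x|
generate double gdp_asinh = asinh(polGDPpercapita)

* Signed log of (1 + |x|), sometimes called neglog
generate double gdp_neglog = sign(polGDPpercapita) * ln(1 + abs(polGDPpercapita))

* Compare the distributions before choosing one
summarize polGDPpercapita gdp_cuberoot gdp_asinh gdp_neglog, detail
```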
Goran Ekmekci

Join Date: Dec 2021

Posts: 8
#12

22 Dec 2021, 17:12

Hello again Nick!
Thank you for the reply.

I am not too familiar with transformations. What kind of transformations are you suggesting?
Nick Cox

Join Date: Mar 2014

Posts: 35219
#13

23 Dec 2021, 06:22

Looking a little more carefully at this thread I see values for polGDPpercapita for Afghanistan for 1996-2002 that look utterly extraordinary -- based on what?

That could be part of your problem: some of your data are based on extrapolation to fill in missing values.
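
One way to see which values were filled in, sketched under assumptions: gdp_raw stands for the series before any filling, and country_id and year are hypothetical identifiers. How the missing values in this thread were actually filled is not shown, so this only illustrates the distinction between interpolation and extrapolation with -ipolate-.

```stata
* Linear interpolation only: gaps between observed values are filled
bysort country_id (year): ipolate gdp_raw year, generate(gdp_ip)

* Interpolation plus extrapolation beyond the first and last observed values
bysort country_id (year): ipolate gdp_raw year, generate(gdp_ep) epolate

* Flag which observations were filled by each method
generate byte interpolated = missing(gdp_raw) & !missing(gdp_ip)
generate byte extrapolated = missing(gdp_ip) & !missing(gdp_ep)

* Inspect the extrapolated values, which are the most likely to look odd
list country_id year gdp_raw gdp_ep if extrapolated, sepby(country_id)
```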
Goran Ekmekci

Join Date: Dec 2021

Posts: 8
#14

23 Dec 2021, 11:05

Hello again Nick!
Thank you for your answer.

Taking a look at it, the interpolated numbers are a bit off from what would be "normal" or "okay".
I ran all the tests for heteroskedasticity, autocorrelation etc. without the interpolated values and got the same results as with the interpolated variables. But when I ran the panel regression -xtreg, fe robust-, some results were significant (which was not the case previously with the interpolated variables). Also, even though the R2-value is still low, it's higher than the previous one.

For further information, I have decided to leave out the interpolated variables and stick with the "normal" ones. I have also omitted "Life expectancy" from the model due to its correlation with education, which can give biased results. I have instead chosen to use corruption as another independent variable. I'm still cleaning the data; when that is done I will run everything over again!

Thank you for noticing this!
