  • How to evaluate the credibility of a regression model?

    Dear respected Statalist users,

    I am running 3 multivariate regression models based on 3 dependent variables. I have 31 regressors (25 independent and 6 control), comprising dummy variables, continuous variables, percentages, and categorical variables; my dependent variables are ratios. I have unbalanced panel data (78 companies over 16 years: 1,063 observations). I ran a pooled OLS estimation (the regress command, including year dummies and industry dummies) and found the following results:

    [Attached image: OLS.jpg — pooled OLS output]

    Based on the probability of F (0.0000), I reject the null hypothesis that the estimated coefficients are jointly zero. In other words, my model is fine! However, the post-estimation tests showed a noticeable departure from the basic assumptions of OLS. I tested for heteroskedasticity (estat hettest), normality (predict r, residuals followed by swilk r), collinearity (estat vif), and autocorrelation (xtserial), and I can tell that the assumptions of OLS were violated.
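    For reference, a minimal sketch of this sequence. Variable names such as depvar, x1-x25, and c1-c6 are placeholders, and -xtserial- is user-written (installable from SSC):

```stata
* Pooled OLS with year and industry dummies (all variable names are placeholders)
regress depvar x1-x25 c1-c6 i.year i.industry

* Post-estimation diagnostics
estat hettest            // Breusch-Pagan/Cook-Weisberg test for heteroskedasticity
estat vif                // variance inflation factors (collinearity)
predict r, residuals
swilk r                  // Shapiro-Wilk normality test on the residuals

ssc install xtserial     // user-written Wooldridge serial-correlation test
xtserial depvar x1-x25 c1-c6
```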

    Then I ran the Breusch-Pagan Lagrange multiplier (LM) test, and the results confirmed the existence of a panel effect, so I ran FE and RE estimators followed by the Hausman test; I found that the FE model is consistent for 2 of the regression models and RE for the third. I ran the FE and RE models with the vce(cluster panelid) and nonest options to control for potential heteroskedasticity and autocorrelation, and I got the following results:
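    A sketch of that panel workflow (again with placeholder variable names):

```stata
* Declare the panel structure
xtset panelid year

* Random effects, then the Breusch-Pagan LM test for panel effects
xtreg depvar x1-x25 c1-c6, re
xttest0

* FE vs RE: classic Hausman test (default standard errors only)
xtreg depvar x1-x25 c1-c6, fe
estimates store fe
xtreg depvar x1-x25 c1-c6, re
estimates store re
hausman fe re

* Re-estimate with cluster-robust standard errors
xtreg depvar x1-x25 c1-c6, fe vce(cluster panelid)
```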

    [Attached images: FE1.png, FE2.jpg, RE.jpg — FE and RE regression output]


    Then I ran some post-estimation tests after the RE and FE models (with the vce(cluster) option), namely xttest0 for heteroskedasticity; the p-value was 0.0000, so I conclude that there is a heteroskedasticity problem. I am not aware of any command to test for autocorrelation except xtserial, so I have no idea whether the vce(cluster) option addressed the autocorrelation problem. One of my colleagues advised me to use the F value of the regression as a benchmark to compare estimators, but the probability of F is literally identical (0.0000) across models.
    I need some help in finding a way to evaluate the models and pick the one that best matches my data.

    Please note that I was not able to test for a unit root (stationarity) because the commands I found in Stata work only with balanced panel data. I read that the Hadri LM test can handle unbalanced panel data, but I can't find the correct syntax for it.

    I am looking forward to hearing from you.

    All the best,

    Mohammed

  • #2
    Mohammed:
    your post is admittedly too long.
    Please wrap it up and rank your questions. Thanks.
    That said, your first regression model is surely biased, as you treated panel data as if they were composed of independent observations. You should have clustered your standard errors on panelid instead.
    As far as panel regression models are concerned, you stated that you ran the -hausman- test (which allows default standard errors only) to investigate whether the -re- specification outperforms the -fe- one. However, you reported the results of regression models with clustered standard errors. Please note that the user-written programme -xtoverid- (type -search xtoverid- from within Stata) performs a robust -hausman- test allowing clustered standard errors.
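    A sketch of the -xtoverid- route (variable names are placeholders; -xtoverid- depends on the user-written -ivreg2- and -ranktest-):

```stata
* Install the user-written commands
ssc install xtoverid
ssc install ivreg2
ssc install ranktest

* Run the RE model with clustered SEs, then the robust Hausman-type test;
* rejection of the null favours the FE specification
xtreg depvar x1-x25 c1-c6, re vce(cluster panelid)
xtoverid
```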
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo Lazzaro

      Thanks a lot for your priceless reply. I will try the xtoverid command and see whether the results are changed.
      However, I would appreciate it if you could help me find criteria by which I can say that the estimations are consistent and unbiased. Is it about the standard errors? The F-statistics of the model? The post-estimation tests?
      Thanks a lot



      • #4
        Mohammed:
        no black or white magic about that.
        If you can exclude heteroskedasticity and autocorrelation (which you can accommodate via clustered standard errors, though), endogeneity (especially reverse causation, which usually catches reviewers' eyes), and omitted variables (which intersect with endogeneity), and your regression model adequately reflects the data-generating process that you're investigating (also according to the literature in your research field), then your regression estimates are defensible (there's usually no such thing as a perfect regression model).
        As an aside, I'm still not clear on your notion of robustness.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Dear Carlo Lazzaro

          I ran the xtoverid command after running the RE model with the vce(cluster panelid) option and the results changed. Now, according to the xtoverid test statistics, the FE model outperforms the RE model in all three regressions.

          About robustness: as far as I know, robust standard errors are empirically generated by estimating ∑, because it is 'usually' unknown, and according to Wooldridge (2002) and Greene (2003), these empirically generated standard errors are robust, so the homoskedasticity assumption is met and the coefficients obtained are consistent and unbiased. I am not an expert in econometrics, so please excuse my naive language in this field.
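          For reference, the cluster-robust ("sandwich") variance estimator discussed in textbooks such as Wooldridge (2002) has the form below, where g indexes the panels and \hat{u}_g stacks the residuals of panel g:

```latex
\widehat{V}_{\text{cluster}}(\hat{\beta})
  = (X'X)^{-1}\left(\sum_{g=1}^{G} X_g'\,\hat{u}_g\hat{u}_g'\,X_g\right)(X'X)^{-1}
```

          Note that this adjusts the estimated standard errors of \hat{\beta}; the point estimates themselves are unchanged.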

          Regarding stationarity, I found a user-written command, xtfisher, to test for the existence of a unit root in an unbalanced panel dataset. But I am not sure whether I have to meet the stationarity assumption for all the regressors and the idiosyncratic error term of the model, or whether I can just ignore this when I apply the vce(cluster panelid) option.
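          A sketch of the unit-root testing step (the exact options of the user-written -xtfisher- may differ; official Stata's -xtunitroot fisher- also accepts unbalanced panels):

```stata
* Official command: Fisher-type test, works with unbalanced panels
xtunitroot fisher depvar, dfuller lags(1)

* User-written alternative
ssc install xtfisher
xtfisher depvar, lags(1)
```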

          Thanks a lot



          • #6
            Mohammed:
            - you've already addressed the robust/cluster SEs issue;
            - as far as I know, stationarity in panel data does not come up that frequently on this list.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Thanks a lot Carlo Lazzaro and please excuse my limited knowledge of econometrics.
              I do appreciate you being very patient with me
              I wish you a lovely weekend.

              Kind regards,
              Mohammed



              • #8
                Mohammed:
                everybody has limited knowledge compared with what she/he would (or could) have learnt.
                In a quite old post of his, Nick Cox said (in far better English, of course) that "We are all beginners; some of us are only more experienced".
                I do share his thought.
                I reciprocate all the best for the coming weekend (and beyond, of course).
                Last edited by Carlo Lazzaro; 12 May 2017, 11:42.
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Originally posted by Mohammed Kasbar View Post
                  ...

                  Then I ran some post-estimation tests after the RE and FE models (with the vce(cluster) option), namely xttest0 for heteroskedasticity; the p-value was 0.0000, so I conclude that there is a heteroskedasticity problem. I am not aware of any command to test for autocorrelation except xtserial, so I have no idea whether the vce(cluster) option addressed the autocorrelation problem. One of my colleagues advised me to use the F value of the regression as a benchmark to compare estimators, but the probability of F is literally identical (0.0000) across models.
                  I need some help in finding a way to evaluate the models and pick the one that best matches my data.
                  ...

                  Mohammed,

                  Not sure if I can add anything other than Carlo's advice. I'm not an economist, although we had one methods class with an econometrician (where I understood maybe a third of what he was saying).

                  However, reading your initial post and some of your responses, perhaps you can clarify one thing. I believe the F-test presented in the Stata output is a test comparing each model you fit to a null model. In other words, it's a test of whether all the coefficients are jointly zero. It is not useful for comparing models.

                  I suspect your colleague meant that you could use an F-test to compare two nested models. This would be akin (though not equivalent) to using a likelihood ratio test. I'm actually not sure that's useful either. We know that the fixed-effects model is always consistent, and that the random-effects model is more efficient than the fixed-effects model but may be inconsistent. If the test you used suggests sticking with the fixed-effects model, then you should stick with the FE model, regardless of any F test or LR test. Further, I don't think that model selection via F tests or LR tests will be useful if the problem is actually that one estimator is biased and the other isn't. It's also not clear which models you are trying to compare via F statistics; I thought you were asking about comparing FE vs RE models, but I could be wrong.

                  I reiterate Carlo's reminder that no model is perfect. There's a saying, usually attributed to George Box, that all models are wrong, but some models are useful. There are always problems such as heteroskedasticity.
                  Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this; type help dataex at the command line.
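                  For example (with placeholder variable names):

```stata
* dataex ships with recent versions of Stata; on older versions:
ssc install dataex

* Post an excerpt of the relevant variables
dataex panelid year depvar x1 x2 in 1/20
```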

                  When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



                  • #10
                    Dear Weiwen Ng

                    Thanks a lot for your contribution. I wanted to compare the pooled OLS, FE, and RE estimators, as I explained in my original post. I was thinking that there might be criteria by which I can evaluate the coefficients estimated by a given model. However, I now understand the complexity of this process; it is likely to be more subjective than objective. Thanks a lot again, and sorry for any confusion I might have caused.

                    Mohammed



                    • #11
                      Mohammed:
                      assuming default standard errors, POLS outperforms -xtreg, fe- only when the F-test at the foot of the -xtreg, fe- output table (which tests that the individual effects are jointly not different from zero) lacks statistical significance (usually, this seldom happens). Otherwise, go with -xtreg, fe-.
                      Then you can compare -xtreg, fe- vs -xtreg, re- via -hausman- (or -xtoverid- if you suspect that robustified/clustered SEs are needed).
                      I suspect that you keep walking along the same road but (and unavoidably so), after you've tried all the strategies, you're back to square one.
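                      In short (placeholder variable names):

```stata
* Step 1: POLS vs FE -- check the F test of "all u_i = 0"
* reported at the foot of the -xtreg, fe- output
xtreg depvar x1-x25 c1-c6, fe

* Step 2: FE vs RE -- -hausman- with default SEs,
* or the user-written -xtoverid- with clustered SEs
xtreg depvar x1-x25 c1-c6, re vce(cluster panelid)
xtoverid
```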
                      Kind regards,
                      Carlo
                      (Stata 19.0)



                      • #12
                        Dear Carlo Lazzaro
                        This is exactly what is happening with me. But I will move on now, following your recommendations. Thanks a lot.
                        Also, I will try to use GMM to address endogeneity, especially since the literature assumes a dynamic relationship between my dependent and independent variables. The merit of GMM is that it uses lags of the dependent and independent variables as instruments; this will save a lot of the time spent finding a strong instrument if I go with 2SLS estimation to address endogeneity.
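                        A sketch with the user-written -xtabond2- (Roodman's implementation of difference/system GMM; the lag ranges and instrument choices below are illustrative, not a recommendation):

```stata
ssc install xtabond2

* Dynamic panel: lagged dependent variable on the right-hand side,
* instrumented with its own deeper lags (GMM-style instruments)
xtabond2 depvar l.depvar x1-x25 c1-c6, ///
    gmm(l.depvar, lag(1 3)) iv(x1-x25 c1-c6) twostep robust small
```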

