Dear Statalist users,
I am running three regression models, one for each of three dependent variables. I have 31 regressors (25 independent variables and 6 controls). The regressors include dummy variables, continuous variables, percentages and categorical variables, and my dependent variables are ratios. I have unbalanced panel data (78 companies over 16 years; 1,063 observations). I ran pooled OLS estimation (the regress command, including year and industry dummies) and obtained the following results:

Based on Prob > F (0.0000), I reject the null hypothesis that all estimated slope coefficients are jointly equal to zero; in other words, the regressors are jointly significant. However, the post-estimation tests showed a noticeable departure from the basic OLS assumptions. I tested for heteroskedasticity (estat hettest), normality of the residuals (predict r, residuals, then swilk r), multicollinearity (estat vif) and autocorrelation (xtserial), and the results indicate that the OLS assumptions are violated.
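A minimal do-file sketch of the pooled OLS step and the diagnostics listed above. It runs on a simulated unbalanced panel, so the data and the variable names panelid, year, industry, y, x1 and x2 are hypothetical stand-ins for the real ratios and 31 regressors. Stata has no estat swilk; the Shapiro-Wilk test is swilk applied to the saved residuals. xtserial is a user-written command (search xtserial locates it), and it is given only the plain regressors here to keep it simple.

```stata
* Hypothetical simulated panel standing in for the real data
clear
set seed 12345
set obs 78                                 // 78 firms, as in the post
generate panelid  = _n
generate industry = mod(panelid, 6) + 1    // hypothetical industry codes
generate u        = rnormal()              // hypothetical firm-level effect
expand 16                                  // 16 years per firm
bysort panelid: generate year = 2000 + _n
drop if panelid <= 10 & year < 2004        // late entrants: unbalanced, no gaps
generate x1 = rnormal() + 0.5*u            // regressor correlated with the firm effect
generate x2 = runiform()
generate y  = 1 + 0.5*x1 - 0.3*x2 + u + rnormal()
xtset panelid year

* Pooled OLS with year and industry dummies
regress y x1 x2 i.year i.industry

* Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat hettest

* Variance inflation factors (multicollinearity)
estat vif

* Shapiro-Wilk normality test on the saved residuals
predict r, residuals
swilk r

* Wooldridge test for serial correlation in panel data (user-written)
xtserial y x1 x2
```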
Next, I ran the Breusch-Pagan Lagrange multiplier (LM) test, and the results confirmed the presence of panel effects. I therefore estimated fixed-effects (FE) and random-effects (RE) models followed by a Hausman test, which favoured FE for two of the regression models and RE for the third. I then re-ran the FE and RE models with the vce(cluster panelid) and nonest options to control for potential heteroskedasticity and autocorrelation, and I got the following results:



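A do-file sketch of the panel-effects and FE-versus-RE steps, again on a hypothetical simulated panel (all names and data are illustrative). After xtreg, re, xttest0 is the Breusch-Pagan LM test for random effects (null: no panel-level variance). The classic Hausman test compares stored FE and RE estimates under the default VCE; its null is that both estimators are consistent, so rejection favours FE. Because that test assumes RE is fully efficient, which clustered errors do not presume, the sketch ends with xtoverid, a user-written SSC command that reports a cluster-robust version. Leaving the year dummies out of that last model is an assumption made to be safe with an older user-written command; check help xtoverid for what it accepts and for any prerequisites.

```stata
* Hypothetical simulated panel standing in for the real data
clear
set seed 12345
set obs 78                                 // 78 firms
generate panelid  = _n
generate u        = rnormal()              // hypothetical firm-level effect
expand 16                                  // 16 years per firm
bysort panelid: generate year = 2000 + _n
drop if panelid <= 10 & year < 2004        // late entrants: unbalanced, no gaps
generate x1 = rnormal() + 0.5*u
generate x2 = runiform()
generate y  = 1 + 0.5*x1 - 0.3*x2 + u + rnormal()
xtset panelid year

* Random effects, then the Breusch-Pagan LM test for random effects
xtreg y x1 x2 i.year, re
xttest0

* Classic Hausman test on default-VCE fits
xtreg y x1 x2 i.year, fe
estimates store fe
xtreg y x1 x2 i.year, re
estimates store re
hausman fe re

* Cluster-robust alternative (user-written, SSC)
* ssc install xtoverid
xtreg y x1 x2, re vce(cluster panelid)
xtoverid
```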
Then I ran some post-estimation tests after the RE and FE models (with the vce(cluster) option), namely xttest0 for heteroskedasticity; the p-value was 0.0000, so I concluded that there is a heteroskedasticity problem. I am not aware of any command for testing autocorrelation other than xtserial, so I have no idea whether the vce(cluster) option addressed the autocorrelation problem. A colleague advised me to use the F statistic of each regression as a benchmark for comparing the estimators, but Prob > F is identical (0.0000) in every case.
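A do-file sketch of post-estimation checks around the FE model, on hypothetical simulated data. Two points may help. First, xttest0 is the Breusch-Pagan LM test for random effects, not a heteroskedasticity test; after xtreg, fe the user-written xttest3 (SSC) computes a modified Wald test for groupwise heteroskedasticity. Second, vce(cluster panelid) leaves the coefficients unchanged and does not remove heteroskedasticity or serial correlation from the errors; it makes the standard errors robust to both within firms. xtserial depends only on the model specification, so it gives the same answer before and after clustering.

```stata
* Hypothetical simulated panel standing in for the real data
clear
set seed 12345
set obs 78                                 // 78 firms
generate panelid  = _n
generate u        = rnormal()              // hypothetical firm-level effect
expand 16                                  // 16 years per firm
bysort panelid: generate year = 2000 + _n
drop if panelid <= 10 & year < 2004        // late entrants: unbalanced, no gaps
generate x1 = rnormal() + 0.5*u
generate x2 = runiform()
generate y  = 1 + 0.5*x1 - 0.3*x2 + u + rnormal()
xtset panelid year

* FE with the default VCE, then the groupwise heteroskedasticity test
* ssc install xttest3
xtreg y x1 x2 i.year, fe
xttest3

* Wooldridge serial-correlation test (user-written; search xtserial)
xtserial y x1 x2

* Same FE model with cluster-robust standard errors: identical coefficients,
* inference robust to heteroskedasticity and within-firm serial correlation
xtreg y x1 x2 i.year, fe vce(cluster panelid)
```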
I would appreciate help in finding a way to evaluate these models and pick the one that best fits my data.
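The F statistic only tests whether the regressors are jointly significant within each model, so it cannot rank pooled OLS, FE and RE. The usual pairwise comparisons are pooled OLS versus RE via xttest0, pooled OLS versus FE via the "F test that all u_i=0" that xtreg, fe reports under the default VCE, and FE versus RE via a Hausman-type test. A cluster-robust version of the last comparison using only built-in commands is the Mundlak (correlated random effects) approach sketched below: add the firm means of the time-varying regressors to the RE model and test them jointly, where rejection suggests the firm effects are correlated with the regressors and favours FE. Data and names are hypothetical; the year dummies are left out for brevity, and with an unbalanced panel including them would arguably also call for their firm means.

```stata
* Hypothetical simulated panel standing in for the real data
clear
set seed 12345
set obs 78                                 // 78 firms
generate panelid  = _n
generate u        = rnormal()              // hypothetical firm-level effect
expand 16                                  // 16 years per firm
bysort panelid: generate year = 2000 + _n
drop if panelid <= 10 & year < 2004        // late entrants: unbalanced, no gaps
generate x1 = rnormal() + 0.5*u
generate x2 = runiform()
generate y  = 1 + 0.5*x1 - 0.3*x2 + u + rnormal()
xtset panelid year

* Firm means of the time-varying regressors (Mundlak device)
bysort panelid: egen mean_x1 = mean(x1)
bysort panelid: egen mean_x2 = mean(x2)

* RE model augmented with the firm means, cluster-robust VCE
xtreg y x1 x2 mean_x1 mean_x2, re vce(cluster panelid)

* Joint Wald test that the coefficients on the firm means are zero
test mean_x1 mean_x2
display "p-value = " r(p)
```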
Please note that I was not able to test for unit roots (stationarity) because the Stata commands I found work only with balanced panel data. I read that the Hadri LM test can work with unbalanced panel data, but I cannot find the correct syntax for it.
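As far as I recall Stata's xtunitroot documentation, the Hadri LM test requires strongly balanced data, whereas the Fisher-type and Im-Pesaran-Shin (IPS) tests accept unbalanced panels; help xtunitroot should be checked for the options each test accepts. The do-file sketch below uses a hypothetical simulated unbalanced panel with no gaps within firms, and the lag length of 1 is an arbitrary illustrative choice. Hadri is run on a sub-window in which every firm is observed, which makes the data balanced for that test. The hypotheses differ: Fisher and IPS take a unit root in all panels as the null, while Hadri's null is that all panels are stationary.

```stata
* Hypothetical simulated panel standing in for the real data
clear
set seed 12345
set obs 78                                 // 78 firms
generate panelid  = _n
generate u        = rnormal()              // hypothetical firm-level effect
expand 16                                  // 16 years per firm
bysort panelid: generate year = 2000 + _n
drop if panelid <= 10 & year < 2004        // late entrants: unbalanced, no gaps
generate x1 = rnormal() + 0.5*u
generate x2 = runiform()
generate y  = 1 + 0.5*x1 - 0.3*x2 + u + rnormal()
xtset panelid year

* Fisher-type test combining panel-by-panel ADF p-values
* (null: all panels contain a unit root)
xtunitroot fisher y, dfuller lags(1)

* Im-Pesaran-Shin test (null: all panels contain a unit root)
xtunitroot ips y, lags(1)

* Hadri LM test (null: all panels are stationary) on a balanced sub-window,
* since firms 1-10 enter only in 2004 in this simulated panel
xtunitroot hadri y if year >= 2004
```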
I am looking forward to hearing from you.
All the best,
Mohammed