Dear Statalisters and statistics experts,
I am still working on my research project to predict 'company risk' from balance sheet characteristics (n = 100). I have some unusual values (i.e. outliers) in my dataset. These values are not measurement errors; they reflect reality. The observations of the dependent variable are not normally distributed and have a (very) strong 'peak' in the middle (maybe there is a better model for that?).
A regression (reg) with robust standard errors (vce(robust)) and some log-transformed independent variables gives the following results:

Most of the regression specifications I tried (with different control variables) yield significant coefficients for the same variables. Yet the overall models remain insignificant (F-test) and have a low adjusted R2. I now want to investigate this further and come up with a "better" model, as the variables should explain more of the variation. Multicollinearity does not seem to be a problem: mean VIF = 1.12, with all variables around 1. Moreover, bivariate correlations above 0.4 have been taken into account. In my view the linearity assumption is not a problem either, apart from some unusual 'values'.
I think the model suffers from two other problems:
[1] Normality: The Shapiro-Wilk W test for normal data is significant. Hence, I have a problem here, as the figure below also shows:

pnorm: standardized normal probability (P-P)
... and:

kernel density plot with the normal option
[2] Homoscedasticity: The IM-test is significant (heteroskedasticity: 72.61; skewness: 26; kurtosis: 3.48). Hence, I have a problem even with the robust standard errors, right?

residuals vs. fitted plot
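For readers who want to reproduce the checks described above, here is a minimal sketch of the corresponding Stata commands. It assumes Stata's bundled auto dataset and the variables price, mpg and weight as stand-ins, since the actual data and variable names are not shown in this post; ln_weight and resid are likewise hypothetical names.

```stata
* Stand-in data (assumption): Stata's bundled auto dataset, not the data from this post
sysuse auto, clear
generate ln_weight = ln(weight)      // example of a log-transformed regressor

* Classical OLS fit first, so that the postestimation diagnostics apply to it
regress price mpg ln_weight
estat vif                            // variance inflation factors
estat imtest                         // IM-test: heteroskedasticity, skewness, kurtosis
rvfplot, yline(0)                    // residuals vs. fitted plot

* Normality checks on the residuals
predict double resid, residuals
swilk resid                          // Shapiro-Wilk W test
pnorm resid                          // standardized normal probability (P-P) plot
kdensity resid, normal               // kernel density with normal overlay

* Same model with heteroskedasticity-robust standard errors
regress price mpg ln_weight, vce(robust)
```

Note that vce(robust) leaves the coefficient estimates unchanged and only alters the standard errors.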
My main questions:
A.] Do I understand it right that I still have problems with normality and heteroscedasticity?
B.] If yes, what does it imply and how could I (would you) improve the model?
Hope there are one or two experts that could help me further with this problem.
THANK YOU!!!! :-)