  • Robust Standard Errors: Still a problem with Heteroskedasticity?

    Dear Statalisters and statistics experts,

    I am still working on my research project to predict ‘company risk’ with balance sheet characteristics (n = 100). I have some unusual values (i.e. outliers) in my dataset. These values are not measurement errors; they reflect reality. The observations of the dependent variable are not normally distributed and have a (very) strong ‘peak’ in the middle (maybe there is a better model for that?).

    A regression (reg) with robust standard errors (vce(robust)) and some log-transformed independent variables gives the following results:

    [Image: regressionxyy.png; regression output]

    Most of the regression designs I applied (with different control variables) yield significant coefficients for the same variables. Yet the overall models remain insignificant (F-test), with a low adjusted R². Now I want to investigate this further and come up with a “better” model, as the variables should explain more of the variation. Multicollinearity does not seem to be a problem: mean VIF = 1.12, with all variables around 1. I have also checked for bivariate correlations above 0.4. Apart from some unusual values, the linearity assumption is, in my view, not a problem.

    I think the model suffers from two other problems:

    [1] Normality: The Shapiro-Wilk W test for normal data is significant. Hence, I have a problem here, as also shown in the figures below:

    [Image: Graphpnorm.png; pnorm: standardized normal probability (P-P) plot]

    ... and:

    [Image: Graphxy.png; kernel density plot with the normal option]


    [2] Homoscedasticity: The IM test is significant (heteroskedasticity: 72.61; skewness: 26; kurtosis: 3.48). Hence, I have a problem even with the robust standard errors, right?

    [Image: Graph_Homoske.png; residual-versus-fitted plot]


    My main question:

    A. Am I right that I still have problems with normality and heteroskedasticity?

    B. If so, what does this imply, and how could I (or how would you) improve the model?

    I hope there are one or two experts who can help me further with this problem.

    THANK YOU!!!! :-)
    Last edited by Konstantin Fischer; 15 Sep 2020, 09:16.

  • #2
    Konstantin:
    1) once you've invoked -robust-, there's no gain in repeating -estat hettest-, as the results will be exactly the same as before invoking -robust-;
    2) normality is a (weak) requirement that concerns the residual distribution only.

    What I would say instead is that, with 109 observations, your sample size is too limited to yield informative coefficients: the lack of statistical significance may well be due to your limited sample size, or to the absence of any relationship between predictors and regressand in the population from which your sample was drawn.
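Carlo's first point can be checked numerically: choosing robust standard errors changes only the estimated variance of the coefficients, not the coefficients or the residuals, and heteroskedasticity tests are built from those residuals. The sketch below assumes a single predictor so everything has a closed form, and uses invented data. It computes classical and HC1 slope standard errors (HC1, the sandwich estimator scaled by n/(n − k), is what -regress, vce(robust)- reports) and the n·R² (Koenker) form of the Breusch-Pagan statistic. Stata's default -estat hettest- additionally assumes normal errors; its -iid- option should correspond to the n·R² version shown here.

```python
import math
from statistics import fmean

def ols_simple(x, y):
    """OLS of y on a constant and one predictor x; returns (intercept, slope, residuals)."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid

def slope_standard_errors(x, resid):
    """Classical and HC1 (robust) standard errors of the slope, from the same residuals."""
    n, k = len(x), 2  # k counts the constant and the slope
    xbar = fmean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    s2 = sum(e * e for e in resid) / (n - k)
    se_classical = math.sqrt(s2 / sxx)
    meat = sum(((xi - xbar) * e) ** 2 for xi, e in zip(x, resid))
    se_hc1 = math.sqrt(n / (n - k) * meat / sxx ** 2)
    return se_classical, se_hc1

def breusch_pagan_koenker(x, resid):
    """n * R^2 from regressing squared residuals on x (chi2 with 1 df under homoskedasticity)."""
    e2 = [e * e for e in resid]
    e2bar = fmean(e2)
    tss = sum((v - e2bar) ** 2 for v in e2)
    if tss == 0.0:
        return 0.0  # constant squared residuals: nothing to explain
    _, _, u = ols_simple(x, e2)
    r2 = 1.0 - sum(v * v for v in u) / tss
    return len(x) * r2

# Made-up data whose spread grows with x
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.7, 5.9, 5.1, 8.4, 6.2]
a, b, e = ols_simple(x, y)
se_c, se_r = slope_standard_errors(x, e)
print(f"slope={b:.3f}  classical SE={se_c:.3f}  HC1 SE={se_r:.3f}")
print(f"Breusch-Pagan (n*R^2) = {breusch_pagan_koenker(x, e):.2f}")
```

Both standard errors are computed from the same residuals `e`, and so is the test statistic, which is why rerunning the test after switching to robust standard errors cannot change it.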
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      The point about robust standard errors is not that they give you a robust regression, more that they give you a better idea of the uncertainty around the model you chose.

      I don't find P-P plots (of what?) nearly as informative as Q-Q plots, but your residual versus fitted plot makes it pretty clear to me your outliers are warping the model fit. I approve of your stance that they should be respected as genuine but the implication is then that you need a different model or method of fitting.
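A normal Q-Q plot (Stata's -qnorm-) pairs each sorted standardized residual with the standard normal quantile at its plotting position, so tail departures such as outliers stand out, whereas a P-P plot is most sensitive in the middle of the distribution. A minimal sketch of the plotting coordinates, assuming the plotting position (i − 0.5)/n, which is one of several common conventions (-qnorm- may use a different one); the residuals are made up.

```python
from statistics import NormalDist, fmean, stdev

def normal_qq_points(residuals):
    """Return (theoretical normal quantile, standardized residual) pairs, in sorted order."""
    n = len(residuals)
    mu, sd = fmean(residuals), stdev(residuals)
    standardized = sorted((r - mu) / sd for r in residuals)
    std_normal = NormalDist()  # mean 0, sd 1
    # Plotting position (i - 0.5) / n is an assumption; other conventions exist
    return [(std_normal.inv_cdf((i - 0.5) / n), z)
            for i, z in enumerate(standardized, start=1)]

resid = [-0.4, 0.1, -0.2, 0.3, 0.0, -0.1, 2.5]  # made-up; one large outlier
for q, z in normal_qq_points(resid):
    print(f"{q:6.2f}  {z:6.2f}")
```

Under normality the points lie near the 45-degree line; the single outlier here ends up far above it at the right end.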

      Advice surely depends on knowing more about the data, which are few enough for you to give them all here as a data example. What is your measure of risk, and what are its possible bounds?

      • #4
        Thank you Carlo & Nick for the fast and helpful reply. It was the confirmation I needed.

        I am aware of the difference between robust models (i.e. dealing with outliers, leverage values etc.) and robust standard errors.

        As I do not 'want' to exclude the outliers and can't increase the sample size, a probit or logistic regression (to predict the "good" and "bad") may now serve as an alternative solution.

        Btw., the risk is measured as a kind of Altman's Z-score (but for a different period than the explanatory variables).

        Thanks, again!
        Last edited by Konstantin Fischer; 15 Sep 2020, 11:26.

        • #5
          As Nick mentioned, your model seems misspecified. I would guess you are missing important regressors, especially for these outliers, which could even lead to omitted-variable bias.

          I assume that you are already using all the available data, but if you have balance sheets for these companies for multiple years, panel data models are the way to go, especially when there are substantial firm fixed effects, which panel models can handle.

          Also, with small samples, if you have already ruled out all possible sources of misspecification and are still suspicious about the distributions and the standard-error estimates, the bootstrap could be useful.
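A minimal sketch of the pairs bootstrap for the standard error of a regression slope, assuming a single predictor and made-up data: resample (x, y) pairs with replacement, refit, and take the standard deviation of the refitted slopes. In Stata the same idea is available through the -bootstrap- prefix or -vce(bootstrap)-; the function names below are illustrative only.

```python
import random
from statistics import fmean, stdev

def ols_slope(x, y):
    """Slope from OLS of y on a constant and x."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

def bootstrap_slope_se(x, y, reps=2000, seed=12345):
    """Pairs bootstrap: returns (SD of bootstrap slopes, list of bootstrap slopes)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(x)
    slopes = []
    while len(slopes) < reps:
        idx = [rng.randrange(n) for _ in range(n)]  # draw n observations with replacement
        xb = [x[i] for i in idx]
        if max(xb) == min(xb):
            continue  # degenerate resample: all x equal, slope undefined
        slopes.append(ols_slope(xb, [y[i] for i in idx]))
    return stdev(slopes), slopes

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.3, 4.1, 5.8, 8.4, 9.9, 12.2, 13.8, 16.5, 17.9, 35.0]  # made up; last point is an outlier
se, _ = bootstrap_slope_se(x, y)
print(f"bootstrap SE of slope: {se:.3f}")
```

Resampling whole pairs, rather than residuals, keeps any link between x and the error variance, which is why this variant is the usual choice when heteroskedasticity is suspected.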

          • #6
            Sorry, but Altman's Z-score means nothing much to me. How does that differ from any other Z-score which I take to imply (value MINUS mean) / SD?
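The ordinary z-score Nick describes standardizes each value by the sample mean and standard deviation. A minimal sketch, assuming the sample SD (n − 1 denominator):

```python
from statistics import fmean, stdev

def z_scores(values):
    """Standardize: (value - mean) / SD, using the sample SD (n - 1 denominator)."""
    mu, sd = fmean(values), stdev(values)
    return [(v - mu) / sd for v in values]

print(z_scores([2.0, 4.0, 6.0]))  # mean 4, SD 2 -> [-1.0, 0.0, 1.0]
```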

            • #7
              The Altman Z-score is generally used to predict corporate bankruptcy. It differs from the Z-score as it does not involve any reference to the (value - mean)/SD of a series. More can be found here https://en.wikipedia.org/wiki/Altman_Z-score
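For readers unfamiliar with it: in Altman's original model for publicly traded manufacturing firms, the score is a weighted sum of five ratios, Z = 1.2·X1 + 1.4·X2 + 3.3·X3 + 0.6·X4 + 1.0·X5, where X1, X2, X3 and X5 are working capital, retained earnings, EBIT and sales, each divided by total assets, and X4 is the market value of equity divided by total liabilities. A sketch using those weights and the commonly cited cut-offs of 1.81 and 2.99; the poster uses "a kind of" Z-score, so their variant may differ, and the firm figures are invented. Because the ratios can be negative and have no upper cap, the score has no fixed bounds in principle.

```python
from dataclasses import dataclass

@dataclass
class BalanceSheet:
    """Hypothetical container for the inputs of the original Altman Z-score."""
    working_capital: float
    retained_earnings: float
    ebit: float
    market_value_equity: float
    sales: float
    total_assets: float
    total_liabilities: float

def altman_z(b: BalanceSheet) -> float:
    """Original Altman weights for public manufacturing firms."""
    x1 = b.working_capital / b.total_assets
    x2 = b.retained_earnings / b.total_assets
    x3 = b.ebit / b.total_assets
    x4 = b.market_value_equity / b.total_liabilities
    x5 = b.sales / b.total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

def zone(z: float) -> str:
    """Commonly cited zones; treatment of values exactly at a cut-off varies by source."""
    if z > 2.99:
        return "safe"
    if z >= 1.81:
        return "grey"
    return "distress"

firm = BalanceSheet(working_capital=20, retained_earnings=30, ebit=10,
                    market_value_equity=60, sales=100,
                    total_assets=100, total_liabilities=50)
z = altman_z(firm)
print(f"Z = {z:.2f} ({zone(z)})")  # Z = 2.71 (grey)
```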
              Regards
              --------------------------------------------------
              Attaullah Shah, PhD.
              Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
              FinTechProfessor.com
              https://asdocx.com
              Check out my asdoc program, which sends outputs to MS Word.
              For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

              • #8
                Thanks for that information, Attaullah Shah. My question (what are its possible bounds?) and the request for a data example remain.
