  • Help evaluating residual distribution

    Hi!

    I have a panel with N=19 regions and T=81 observations, and I am comparing separate time-series estimation with multiple panel data estimators.
    In the regression diagnostics for the separate estimations, -sktest- rejects the null of normally distributed residuals in 12 out of 19 regions at the 10% level, and in 5 regions at the 1% level.
    However, I've been told that -sktest- has been criticized for rejecting the null of normality too often. Therefore, I have drawn qnorm plots and histograms to illustrate the residual distributions, but I'm not sure how to evaluate them.

    Here are the qnorm-plot and histogram (bin=31 (auto)) for all residuals from all regions pooled:



    Should I be concerned?

  • #2
    The residuals have a little skew and, based upon the histogram, they're a little leptokurtic, too. (You can also try pnorm to zoom in on the latter more, if you're interested.) Depending upon the intended use of data (I'm not at all knowledgeable about comparing separate time-series estimation with multiple panel data estimators, and so cannot help you there), the extent of non-normality in the residuals might not matter much. Try ladder to find a transformation of the data that ameliorates the residual skew (also take a look at findit transint for advice here) and see whether it makes a substantial difference to your scientific interpretation of the results.
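
    A minimal sketch of those checks, assuming the pooled residuals are stored in a variable called resid and the dependent variable is called dlnhp (both names are hypothetical). Note that the log and square-root rungs of the ladder are undefined for negative values, so they will not be available if the growth rates go below zero.

    ```stata
    * Assumed names: resid = pooled residuals, dlnhp = dependent variable
    qnorm resid      // quantile-normal plot; emphasizes departures in the tails
    pnorm resid      // standardized normal probability plot; emphasizes the middle
    ladder dlnhp     // ladder-of-powers search for a normalizing transformation
    ```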



    • #3
      The normal probability plot is more useful and more revealing than the histogram. It shows that the distribution of residuals has a systematically heavier upper tail than a normal distribution. That may be cause for concern. That plot, however, appears to be based on the combined residuals from all 19 regions. I would look at the corresponding plot for each region separately. In your message I did not see any information about the nature of the dependent variable. That would influence suggestions for further steps if the plots for the individual regions show clear departures from a normal distribution.
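
      One way to get the region-by-region plots, sketched under the assumption that region is a numeric identifier taking positive integer values and that resid holds the residuals (both names are hypothetical):

      ```stata
      * Assumed names: region = integer region id, resid = residuals
      levelsof region, local(regions)
      local graphs
      foreach r of local regions {
          * one quantile-normal plot per region, held in memory without drawing
          qnorm resid if region == `r', name(q`r', replace) title("Region `r'") nodraw
          local graphs `graphs' q`r'
      }
      * arrange the individual plots in a single figure
      graph combine `graphs', cols(5)
      ```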



      • #4
        Thank you both for answering!

        The dependent variable is the growth rate of real housing prices, and my research question is how the impact of different explanatory variables (in a demand equation) varies across regions. The above plot and histogram show pooled residuals from separate time-series estimation.

        Below are separate qnorm-plots:


        As I interpret them, they indicate long tails for most regions, i.e., some outliers.
        I have tried including a dummy for the financial crisis, and this seems to reduce the non-normality for some regions.
        Is including dummy variables for special events a good approach to deal with long tails?



        • #5
          Thank you for sharing the plots for the 19 regions!

          In the regions where the tails are heavier, the issue is mainly the upper tail. I would identify the observations that produced those residuals and look at their characteristics. If those observations are associated with the financial crisis, that dummy variable would be appropriate. Do they have other characteristics in common that your analysis should take account of? It's reasonable for a model to account for known sources of systematic behavior in the data.
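
          A rough sketch of that inspection, assuming hypothetical variable names region, qdate (a quarterly date with a %tq format) and resid. The crisis quarter used for the dummy, 2008q4, is only an illustration and should be replaced by whatever the listing actually points to.

          ```stata
          * Assumed names: region, qdate (quarterly date), resid
          gsort -resid                       // largest residuals first
          list region qdate resid in 1/20    // inspect the upper tail
          * illustrative event dummy; the quarter is an assumption
          generate byte crisis = (qdate == tq(2008q4))
          ```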

          Sometimes a transformation is appropriate for growth rates. I would consider using the logarithm. If the range of growth rates is not wide, however, a transformation may not do much.

          I don't see many actual outliers. Just heavier upper tails in some regions.



          • #6
            Once again, thank you - I really appreciate your answers!

            I sorted the residuals and looked into the top 50.
            - 16 of them represent 2008q4: The financial crisis. A dummy will be added in the forecasts.
            - 13 of them represent 1998q3: Peak of the Asian crisis, with a huge rise in interest rates to defend the national (Norwegian) currency. The interest rate is one of my explanatory variables, but this may imply that the effect of the interest rate is not linear (e.g. the elasticity is increasing in the level).
            - 9 of them represent 1994q1: The Winter Olympics were held in February (Norway is a small country).

            The rest seem to be random periods.

            The variables are measured in first differences of logarithms (i.e. approximately growth rates).
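
            For reference, a tabulation like the one above can be sketched as follows, assuming the residuals are in resid and the quarterly date is in qdate (hypothetical names):

            ```stata
            * Assumed names: resid = residuals, qdate = quarterly date (%tq)
            gsort -resid                        // largest residuals first
            generate byte top50 = (_n <= 50)    // flag the 50 largest
            tabulate qdate if top50, sort       // quarters ordered by frequency
            ```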
