  • Violation of normal distribution of dependent variable in multiple linear regression

    Hello! I am using multiple linear regression to analyze my data. The dependent variable and most of my independent variables are highly skewed. I have tried transforming the dependent variable, adding polynomial terms, and removing influential IDs following regression diagnostics, but nothing works. Both normality and homogeneity of variance are badly violated (p=0.0000). The dependent variable ranges from 1 to 10 (Modified Fall Efficacy Scale). Is there any way to deal with such a skewed dependent variable, or to normalize it?
    Thank you.
    PS: I am using Stata 12.0
    Running swilk MFES gives p=.00000, and after removing influential IDs, swilk r still gives p=0.0002.


  • #2
    First things first: regression does not require that your dependent variable be normally distributed. If anything, there is a slight preference that your residuals be normally distributed, but in datasets with more than, say, 30 observations you can get away with ignoring that too.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
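The distinction drawn in #2 (check the residuals, not the dependent variable) can be sketched in Stata as follows. This is only an illustration under stated assumptions: the data are simulated, and the names y, x1, x2 and r are placeholders rather than the original poster's variables. The heteroskedasticity test and the robust standard errors at the end are one common way to look at the "homogeneity" concern raised in the question, not something the reply itself prescribes.

```stata
* Simulated data for illustration only; y, x1, x2 are hypothetical names
clear
set seed 12345
set obs 200
generate x1 = rnormal()
generate x2 = runiform()
* Right-skewed errors with mean zero (chi-squared with 2 df, centred)
generate y = 1 + 0.5*x1 + 2*x2 + (rchi2(2) - 2)

* Fit the model and store the residuals
regress y x1 x2
predict r, residuals

* Normality diagnostics belong on the residuals, not on y
swilk r
qnorm r

* Residual-versus-fitted plot and Breusch-Pagan test for heteroskedasticity
rvfplot, yline(0)
estat hettest

* Same point estimates, heteroskedasticity-robust standard errors
regress y x1 x2, vce(robust)
```

With a few hundred observations, swilk will typically reject normality for errors like these, yet the coefficient estimates and their robust standard errors remain usable, which is the point the reply makes about larger samples.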



    • #3
      Linear regression models estimated via OLS do not assume that the dependent variable is normally distributed; the assumption concerns the residuals. Even that assumption is not needed for OLS to yield unbiased and efficient estimates, and it becomes less important as the sample size grows. With, say, N > 50, roughly bell-shaped residuals should be fine.

      If your variable is highly skewed, you might want to think about the appropriateness of a linear model to capture the data-generating process. Maybe you should consider a generalized linear model?

      Best
      Daniel
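For readers unfamiliar with the generalized linear model suggested in #3, a minimal sketch in Stata might look like the following. The gamma family with a log link is only one illustrative choice for a positive, right-skewed outcome; the reply does not name a specific family, and a bounded 1-10 scale may call for a different one. The data are simulated, and y, x1 and mu are hypothetical names.

```stata
* Simulated positive, right-skewed outcome; all names are hypothetical
clear
set seed 2024
set obs 300
generate x1 = rnormal()
* Gamma draws with shape 2 and scale chosen so that E[y] = exp(0.5 + 0.3*x1)
generate y = rgamma(2, exp(0.5 + 0.3*x1)/2)

* GLM with gamma family and log link (an assumed, illustrative choice)
glm y x1, family(gamma) link(log)

* Predicted mean of y on its original scale
predict mu, mu
summarize y mu
```

With a log link, each coefficient describes a multiplicative change in the expected outcome, so exp(b) for x1 is read as a ratio of means rather than a difference.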



      • #4
        Thanks for your reply and for correcting me, Maarten. Yes, I need the residuals, rather than the dependent variable, to be normally distributed. But my residuals are not normally distributed either, according to swilk (p=.0000), and they do not look normal on a histogram. I had also thought that, with n>30, it should be OK to go ahead (by appeal to the central limit theorem), but I wanted to confirm whether there is a different way to analyze the data. Thanks again for your time!



        • #5
          Dear Daniel,
          Thanks for your suggestion to look into a generalized linear model. I have only recently learnt to run multiple regression in Stata to analyze my data, and I am not yet familiar with generalized linear models. But thanks a lot for your advice; I will certainly read up on it.
          Regards
          Neha
