  • Can I run regression with a slightly negatively skewed dependent variable?

    Hello Clyde Schechter, Bruce Weaver, George Ford, lorenabarberia, and Noor Sethi,

    I am sorry for bothering you all repeatedly. I am encountering another issue. One of my dependent variables turned out to be somewhat negatively skewed (skewness is -1.516). I have read some articles suggesting that skewness should be within ±2 and kurtosis within ±7. In this case, can I run the regression analysis without any transformation, such as reflecting the variable and then taking its log? I understand that if I reflect a dependent variable, I have to de-reflect the coefficients. What would you suggest? For your information, my sample size is 9600 and I am using bootstrap weighting with 1000 replications.
    Thank you in advance,

    Iqbal Chowdhury

  • #2
    You could use ln, but in your case you don't need to (the skewness is within the limits you list).

    When you transform the DV, the coefficients will change. They are not difficult to interpret, however, and margins makes it easy (dydx, dyex).
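To make the point about changed coefficients concrete, here is a minimal sketch in Python (not Stata, and not anyone's actual analysis). It uses simulated data, and every name in it (`ols_slope`, `skewness`, the data-generating numbers) is an assumption made for illustration. It shows that reflecting a negatively skewed outcome flips the sign of the slope exactly, and that taking the log of the reflected outcome keeps the flipped sign but changes the scale.

```python
import math
import random
import statistics

def ols_slope(x, y):
    """Slope of the simple least-squares line of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def skewness(v):
    """Moment-based sample skewness (no small-sample correction)."""
    m = statistics.fmean(v)
    m2 = statistics.fmean([(a - m) ** 2 for a in v])
    m3 = statistics.fmean([(a - m) ** 3 for a in v])
    return m3 / m2 ** 1.5

rng = random.Random(12345)
x = [rng.uniform(0, 10) for _ in range(2000)]
# Simulated negatively skewed outcome: a linear trend minus an exponential draw
y = [50 + 0.8 * xi - rng.expovariate(0.25) for xi in x]

# Reflect around (max + 1) so the smallest reflected value is 1, then log it
reflected = [max(y) + 1 - yi for yi in y]
log_reflected = [math.log(r) for r in reflected]

slope_raw = ols_slope(x, y)
slope_reflected = ols_slope(x, reflected)
slope_log_reflected = ols_slope(x, log_reflected)

print("skewness raw / reflected:", skewness(y), skewness(reflected))
print("slope raw / reflected / log-reflected:",
      slope_raw, slope_reflected, slope_log_reflected)
```

The reflected slope is the raw slope with the sign reversed, which is why coefficients from a reflected model have to be "de-reflected" before they are interpreted in the original direction.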


    • #3
      I would add that it is a common misunderstanding that dependent variables should have normal distributions. This is not true. There is a theorem that says that if the residuals of an OLS regression are normally distributed, then that is a sufficient condition to assure that the calculated t-statistics do, in fact, have a t-sampling distribution, and it justifies the usual protocol for statistical inference following OLS. I believe that the myth that dependent variables must have normal distributions arises from misunderstanding this theorem.

      However, OLS regression is very robust even to departures from normality of the residuals. It is not hard to prove that in sufficiently large samples, the central limit theorem guarantees that the calculated t-statistics will have a t-sampling distribution (asymptotically), so that the usual protocol for statistical inference holds even without strong distributional assumptions about the residuals. Indeed, the only distributional assumption necessary for this result to hold is that the residual distribution be of finite variance. As a practical matter, in real world research, this assumption is pretty much always met. So the distributional requirements can be ignored if the sample size is sufficiently large. Just how large is sufficiently large depends on the residual distribution itself, and the more skewed it is, the larger the sample must be for the central limit theorem to come to the rescue. But a skewness of -1.516 is very comfortably dealt with by a sample size of a few hundred. So unless you are analyzing a pretty small sample, you don't need to worry about this and should not transform anything unless there is some other good reason to do so.
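The large-sample argument can be checked with a small simulation. The following is a rough sketch in Python (standard library only), under assumptions chosen purely for illustration: one normally distributed regressor, errors drawn as 1 minus an exponential variate (mean 0, skewness -2, so somewhat more skewed than the -1.516 discussed here), n = 300, and 1000 replications. The function names `slope_and_se` and `coverage` are made up for this example. It estimates how often the usual 95% confidence interval for the slope covers the true value.

```python
import math
import random
import statistics

def slope_and_se(x, y):
    """OLS slope of y on x and its conventional standard error."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    a0 = my - b * mx
    rss = sum((c - a0 - b * a) ** 2 for a, c in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b, se

def coverage(n=300, reps=1000, true_slope=0.5, seed=2024):
    """Share of replications whose 95% interval contains the true slope."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # Errors: 1 - Exp(1) has mean 0, variance 1 and skewness -2
        y = [1.0 + true_slope * xi + (1.0 - rng.expovariate(1.0)) for xi in x]
        b, se = slope_and_se(x, y)
        # 1.96 is the normal critical value; the t value for 298 df is about 1.97
        if abs(b - true_slope) <= 1.96 * se:
            hits += 1
    return hits / reps

cov = coverage()
print("estimated coverage of the nominal 95% interval:", cov)
```

If the usual inference is working despite the skewed errors, the estimated coverage should land close to the nominal 0.95. Rerunning with a much smaller `n` is one way to see where the approximation starts to strain.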


      • #4
        Hello George Ford and Clyde Schechter,

        Thank you so much for your valuable insights and comments.

        I am a bit confused about the following. Could you please suggest the best way to deal with this issue?

        One of my hypotheses is that, compared to the Canadian born and long residing immigrants (those living in Canada for 10 or more years), recent immigrants (those in Canada for less than 10 years) are likely to have better mental health. My primary variables are: DV = PMH; IVs = immigrant (a three-category immigration status: 1 = Canadian born, 2 = recent immigrant, 3 = long residing immigrant) and GDP (log regional GDP per capita). I am considering 5 regions: AC, QC, ON, Prairies, and BC. So the GDP variable has only 5 distinct values. I am planning to treat this variable as a continuous variable and fit the OLS model with bootstrapping in Stata as follows:

        svy: regress PMH ib1.immigrant GDP c.GDP#ib1.immigrant
        svy: regress PMH ib1.immigrant GDP c.GDP#ib1.immigrant i.AGE i.SEX i.Marital_STATUS i.EDUCATION i.INCOME i.RURAL_URBAN

        Do you think treating the GDP variable, which has only 5 distinct values, as a continuous variable will cause any problem in answering the research question related to this hypothesis?
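To illustrate what is at stake in this question, here is a hedged sketch in Python using simulated data. The log GDP values, the outcome model, and the sample sizes are all hypothetical and not taken from the actual survey. It contrasts a fit that enters the five-valued variable linearly (one slope) with a fit that gives each region its own mean (equivalent to region indicators). Because each region has exactly one GDP value, the linear model is a restricted version of the region-means model.

```python
import random
import statistics

# Hypothetical log GDP per capita for the five regions (illustration only)
log_gdp = {"AC": 10.6, "QC": 10.7, "ON": 10.9, "Prairies": 11.1, "BC": 10.8}

rng = random.Random(7)
rows = []
for region, g in log_gdp.items():
    for _ in range(200):
        # Simulated outcome that happens to be linear in log GDP, plus noise
        rows.append((region, g, 70 + 2.0 * (g - 10.8) + rng.gauss(0, 5)))

x = [g for _, g, _ in rows]
y = [v for _, _, v in rows]

# Model A: GDP entered as a continuous regressor (intercept + one slope)
mx, my = statistics.fmean(x), statistics.fmean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
rss_linear = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))

# Model B: one mean per region (the same fit as five region indicators)
region_mean = {r: statistics.fmean([v for rr, _, v in rows if rr == r]) for r in log_gdp}
rss_region = sum((v - region_mean[r]) ** 2 for r, _, v in rows)

print("distinct GDP values:", len(set(x)))
print("RSS, GDP as continuous:", rss_linear)
print("RSS, separate region means:", rss_region)
```

The region-means fit can never have a larger residual sum of squares than the linear fit. The gap between the two indicates how much the single-slope assumption constrains the five regional means, and with only five distinct values the slope is effectively determined by five points.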

        I would greatly appreciate your input in this respect.

        Many thanks in advance,

        Iqbal

        Bruce Weaver, lorenabarberia, and Noor Sethi,
