
  • How to assess 'close to normal' distribution using skew, kurtosis and -sktest-?

    Hi Stata list,

    I am running three logistic regressions, each with an interaction.
    For example,
    Code:
    logistic fightpart i.visittoptwohotspot i.clubbingfreq i.drinkconsumed c.restrictivetotal i.drinkconsumed#c.restrictivetotal
    I looked at the distribution of each of the three predictors: restrictivetotal, inhibitedtotal and exageratedtotal. To me, the histograms for restrictivetotal and inhibitedtotal do not look 'close to normal', though my judgment and experience in data analysis are limited. I think the skewness and kurtosis numbers look OK, but the -sktest- command rejects normality for both of them.

    Do the skew and kurtosis figures represent a close to normal distribution? Is it worth paying attention to the results of the -sktest-?

    cheers
    Emily

    Code:
     
    . sum restrictivetotal inhibitedtotal exageratedtotal, detail

                          restrictivetotal
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            5              5
     5%            6              5
    10%            8              5       Obs                 663
    25%           11              5       Sum of Wgt.         663

    50%           14                      Mean           14.37557
                            Largest       Std. Dev.      5.093015
    75%           18             25
    90%           21             25       Variance        25.9388
    95%           23             25       Skewness       .1411719
    99%           25             25       Kurtosis       2.379279

                           inhibitedtotal
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            5              5
     5%            5              5
    10%            5              5       Obs                 663
    25%            8              5       Sum of Wgt.         663

    50%           11                      Mean           12.00905
                            Largest       Std. Dev.      5.233223
    75%           15             25
    90%           20             25       Variance       27.38662
    95%           22             25       Skewness       .5554661
    99%           25             25       Kurtosis       2.515539

                           exageratedtotal
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            8              5
     5%           10              6
    10%           12              6       Obs                 663
    25%           14              6       Sum of Wgt.         663

    50%           17                      Mean           16.63499
                            Largest       Std. Dev.      3.794906
    75%           19             25
    90%           22             25       Variance       14.40131
    95%           23             25       Skewness      -.0951031
    99%           25             25       Kurtosis        2.88996

    . sktest restrictivetotal inhibitedtotal exageratedtotal

                        Skewness/Kurtosis tests for Normality
                                                              ------ joint ------
        Variable |        Obs  Pr(Skewness)  Pr(Kurtosis) adj chi2(2)   Prob>chi2
    -------------+---------------------------------------------------------------
    restrictiv~l |        663     0.1354        0.0000       21.00         0.0000
    inhibitedt~l |        663     0.0000        0.0010       33.98         0.0000
    exagerated~l |        663     0.3132        0.6303        1.25         0.5345

  • #2
    Normality of the *response* variable can be relevant in a regression model, but normality of the explanatory variables is not. (And, in fact, it's the residual distribution of the response variable, not its unconditional distribution, that matters.) So, you would not need to answer the question you ask here if its purpose is to support your regression modeling.
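
    To illustrate the point about residuals: in a linear regression, any normality check would be applied to the residuals after fitting, not to the predictors. What follows is only a minimal sketch; the auto data, the model and the variable name resid are assumptions made for illustration and are not taken from the original question.

    ```stata
    * Illustration only: the data set and the model are assumptions
    sysuse auto, clear
    regress price weight mpg

    * Store the residuals (resid is a hypothetical variable name)
    predict double resid, residuals

    * Quantile normal plot and moment summary of the residuals
    qnorm resid
    summarize resid, detail
    ```

    The same steps would not carry over unchanged to the logistic models in #1, where the response is binary.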




    • #3
      I just wish to add that, if the number of observations is big enough (and you didn't mention the sample size), you will get a significant p-value from sktest virtually every time you type the command.
      Best regards,

      Marcos
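
      The sample-size effect can be seen in a small simulation. This is a hedged sketch rather than a formal power study: the chi-squared distribution with 200 degrees of freedom is chosen only because it is mildly skewed, and the seed and sample sizes are arbitrary assumptions.

      ```stata
      * Simulated, mildly skewed data; all values here are arbitrary choices
      clear
      set seed 2803
      set obs 100000
      generate double x = rchi2(200)

      * Small subsample: the test will often fail to reject normality
      sktest x in 1/100

      * Full sample: the same mild skewness is now very likely to be flagged
      sktest x
      ```

      The distribution is the same in both calls; only the number of observations changes, which is why a small p-value says little about how far from normal a variable is.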



      • #4
        Thank you Mike. I was hoping someone might say what you said! I just thought it's better to be safe and ask the question.

        Thanks Marcos. Sorry, n=6 11. I'm still learning what is considered 'big'. Regardless of the sktest, it sounds like it's best to stick with looking at the median being +/-10% of mean, skew close to zero and kurtosis close to 3.
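
        For reference, the three informal checks just mentioned can be pulled from the stored results of summarize. This is a sketch only, using the variable names from the output in #1; the thresholds themselves are rules of thumb, not tests.

        ```stata
        * Variable names are those shown in the output in #1
        foreach v of varlist restrictivetotal inhibitedtotal exageratedtotal {
            quietly summarize `v', detail
            display "`v'"
            display "  median/mean = " %5.3f r(p50)/r(mean)
            display "  skewness    = " %6.3f r(skewness)
            display "  kurtosis    = " %6.3f r(kurtosis)
        }
        ```

        From the figures already posted, the median/mean ratios are about 0.97, 0.92 and 1.02 respectively.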



        • #5
          I'd advise remembering that skewness and kurtosis are in the first instance measures (not tests). Tests will often answer the wrong question.

          Yet sometimes, but not always, skewness and kurtosis values far from 0 and 3 respectively flag problems that may include outliers or nonlinearity to think about. I'd advise always looking at quantile normal plots.
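
          With official commands only, a quantile normal plot for one of the variables in #1 could be drawn as sketched below. The variable name comes from the question; the discrete option is an assumption based on the scores appearing to be integers between 5 and 25.

          ```stata
          * Quantile normal plot: points near the reference line suggest near-normality
          qnorm restrictivetotal

          * Histogram with one bar per distinct value and a normal density overlaid
          histogram restrictivetotal, discrete normal
          ```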

          For example with multqplot from the Stata Journal, we can look at everything in the auto data except make (string) and foreign (known to be binary) with one command:


          SJ-12-3 gr0053 . Speaking Stata: Axis practice, or what goes where on a graph
          (help multqplot if installed) . . . . . . . . . . . . . . . N. J. Cox
          Q3/12 SJ 12(3):549--561
          discusses variations on what goes on each axis of a two-way
          plot; provides multiple quantile plots


          Code:
          . sysuse auto
          (1978 Automobile Data)
          
          . multqplot price-gear, trscale(invnormal(@)) xla(-2/2)
          (We could have included foreign; there's just little point in doing so.)

          [Image: multqplot4.png, the quantile normal plots produced by the multqplot command above]



          There are hints here of mild skewness in a few variables.

          moments (SSC) is a wrapper for summarize that can help if numbers will.

          Code:
          . moments
          
          -----------------------------------------------------------------------
                          n = 69 |       mean          SD    skewness    kurtosis
          -----------------------+-----------------------------------------------
                           Price |   6146.043    2912.440       1.688       5.032
                   Mileage (mpg) |     21.290       5.866       0.995       3.997
              Repair Record 1978 |      3.406       0.990      -0.057       2.678
                  Headroom (in.) |      3.000       0.853       0.197       2.144
           Trunk space (cu. ft.) |     13.928       4.343      -0.044       2.159
                   Weight (lbs.) |   3032.029     792.851       0.118       2.073
                    Length (in.) |    188.290      22.747      -0.076       2.000
               Turn Circle (ft.) |     39.797       4.441       0.071       2.228
          Displacement (cu. in.) |    198.000      93.148       0.581       2.354
                      Gear Ratio |      2.999       0.463       0.279       2.109
                        Car type |      0.304       0.464       0.850       1.723
          -----------------------------------------------------------------------
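
          For anyone who has not installed moments, a broadly similar table can be sketched with the official tabstat command. The casewise option is used here on the assumption that the n = 69 above reflects dropping cars with any missing value; the layout will differ from the moments output.

          ```stata
          * Install the community-contributed command from SSC (needed once)
          ssc install moments

          * Official alternative: same statistics, one row per variable
          sysuse auto, clear
          tabstat price-foreign, statistics(n mean sd skewness kurtosis) ///
              columns(statistics) casewise
          ```

          The /// line continuation works in a do-file; typed interactively, the tabstat command would go on one line.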



          • #6
            Thanks for the references and commands, Nick. I like graphs. They help tell the story so much more clearly. I'll definitely keep this one up my sleeve.
