How to assess 'close to normal' distribution using skew, kurtosis and -sktest-?

Emily Mann

Join Date: Jul 2018
Posts: 21


09 Sep 2018, 04:35

Hi Stata list,

I am running 3x logistic regressions, each with an interaction.
For example,

Code:

logistic fightpart i.visittoptwohotspot i.clubbingfreq i.drinkconsumed c.restrictivetotal i.drinkconsumed#c.restrictivetotal

I looked at the distribution of each of the three predictors: restrictivetotal, inhibitedtotal and exageratedtotal. To me, the histograms for restrictivetotal and inhibitedtotal do not look 'close to normal', though my judgment and experience in data analysis are limited. I think the skewness and kurtosis numbers look OK, but the -sktest- command indicates that they are not normal.

Do the skew and kurtosis figures represent a close to normal distribution? Is it worth paying attention to the results of the -sktest-?

cheers
Emily

Code:

 
. sum restrictivetotal inhibitedtotal exageratedtotal, detail

                      restrictivetotal
-------------------------------------------------------------
      Percentiles      Smallest
 1%            5              5
 5%            6              5
10%            8              5       Obs                 663
25%           11              5       Sum of Wgt.         663

50%           14                      Mean           14.37557
                        Largest       Std. Dev.      5.093015
75%           18             25
90%           21             25       Variance        25.9388
95%           23             25       Skewness       .1411719
99%           25             25       Kurtosis       2.379279

                       inhibitedtotal
-------------------------------------------------------------
      Percentiles      Smallest
 1%            5              5
 5%            5              5
10%            5              5       Obs                 663
25%            8              5       Sum of Wgt.         663

50%           11                      Mean           12.00905
                        Largest       Std. Dev.      5.233223
75%           15             25
90%           20             25       Variance       27.38662
95%           22             25       Skewness       .5554661
99%           25             25       Kurtosis       2.515539

                       exageratedtotal
-------------------------------------------------------------
      Percentiles      Smallest
 1%            8              5
 5%           10              6
10%           12              6       Obs                 663
25%           14              6       Sum of Wgt.         663

50%           17                      Mean           16.63499
                        Largest       Std. Dev.      3.794906
75%           19             25
90%           22             25       Variance       14.40131
95%           23             25       Skewness      -.0951031
99%           25             25       Kurtosis        2.88996

. sktest restrictivetotal inhibitedtotal exageratedtotal

                    Skewness/Kurtosis tests for Normality
                                                          ------ joint ------
    Variable |        Obs  Pr(Skewness)  Pr(Kurtosis) adj chi2(2)   Prob>chi2
-------------+---------------------------------------------------------------
restrictiv~l |        663     0.1354        0.0000       21.00         0.0000
inhibitedt~l |        663     0.0000        0.0010       33.98         0.0000
exagerated~l |        663     0.3132        0.6303        1.25         0.5345


Mike Lacy

Join Date: Apr 2014

Posts: 2404
#2

09 Sep 2018, 07:53

Normality of the *response* variable can be relevant in a regression model, but normality of the explanatory variables is not. (And, in fact, it's the residual distribution of the response variable, not its unconditional distribution, that matters.) So, you would not need to answer the question you ask here if its purpose is to support your regression modeling.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

09 Sep 2018, 09:20

I just wish to add that, if the number of observations is big enough (and you didn't mention the sample size), you will get a significant p-value from sktest virtually every time you type the command.
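
A small simulation can illustrate this. The sketch below is not from the thread: it draws samples of increasing size from a chi-squared distribution with 30 degrees of freedom (an arbitrary choice, mildly right-skewed) and runs sktest on each. The seed and the sample sizes are likewise assumptions made only for illustration.

```stata
* Illustrative simulation: same mildly skewed distribution, growing n
clear all
set seed 12345                      // arbitrary seed, for reproducibility only
foreach n of numlist 50 500 5000 50000 {
    drop _all
    quietly set obs `n'
    generate double x = rchi2(30)   // chi-squared(30): skewness = sqrt(8/30), about 0.52
    display _newline "n = `n'"
    sktest x
}
```

The shape of the distribution is the same in every run, so a Prob>chi2 that tends to fall as n grows reflects the larger sample rather than a larger departure from normality.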

Best regards,

Marcos
Emily Mann

Join Date: Jul 2018

Posts: 21
#4

10 Sep 2018, 18:38

Thank you Mike. I was hoping someone might say what you said! I just thought it's better to be safe and ask the question.

Thanks Marcos. Sorry, n=611. I'm still learning what is considered 'big'. Regardless of the sktest, it sounds like it's best to stick with looking at the median being +/-10% of mean, skew close to zero and kurtosis close to 3.
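
Those three descriptive checks can be gathered in one pass. The following is a minimal sketch, assuming the variable names from post #1 are in memory; it relies only on the results that summarize, detail leaves behind (r(p50), r(mean), r(skewness), r(kurtosis)). The local macro name pctdiff is made up for this example.

```stata
* Rule-of-thumb summary for each predictor (variable names from post #1)
foreach v of varlist restrictivetotal inhibitedtotal exageratedtotal {
    quietly summarize `v', detail
    * percentage difference of the median from the mean
    local pctdiff = 100 * (r(p50) - r(mean)) / r(mean)
    display "`v':" _col(20) "median vs mean = " %5.1f `pctdiff' "%" ///
        _col(48) "skewness = " %6.3f r(skewness) ///
        _col(68) "kurtosis = " %6.3f r(kurtosis)
}
```

Going by the summarize output in post #1, the medians differ from the means by roughly -2.6%, -8.4% and +2.2% respectively. These are descriptive measures only; the thresholds themselves are a rule of thumb, not a test.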

Nick Cox

Join Date: Mar 2014
Posts: 35433
#5

11 Sep 2018, 11:56

I'd advise remembering that skewness and kurtosis are in the first instance measures (not tests). Tests will often answer the wrong question.

Yet sometimes, but not always, skewness and kurtosis values far from 0 and 3 respectively flag problems that may include outliers or nonlinearity to think about. I'd advise always looking at quantile normal plots.

For example with multqplot from the Stata Journal, we can look at everything in the auto data except make (string) and foreign (known to be binary) with one command:

SJ-12-3 gr0053 . Speaking Stata: Axis practice, or what goes where on a graph
(help multqplot if installed) . . . . . . . . . . . . . . . N. J. Cox
Q3/12 SJ 12(3):549--561
discusses variations on what goes on each axis of a two-way
plot; provides multiple quantile plots

Code:

. sysuse auto
(1978 Automobile Data)

. multqplot price-gear, trscale(invnormal(@)) xla(-2/2)

(We could have included foreign; there's just little point in doing so.)

[Figure: multqplot4.png, multiple quantile normal plots for the auto data]

There are hints here of mild skewness in a few variables.
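
For readers without multqplot installed, a rough equivalent can be put together with official Stata's qnorm. This is only a sketch: the three variables and the graph names (qn_price and so on) are arbitrary choices for illustration, not part of the example above.

```stata
* Quantile normal plots with official Stata's qnorm, one graph per variable
sysuse auto, clear
foreach v of varlist price mpg weight {
    qnorm `v', name(qn_`v', replace) title("`v'")
}
* Combine the stored graphs into a single figure
graph combine qn_price qn_mpg qn_weight
```

Points that follow the reference line suggest a roughly normal distribution; systematic curvature points to skewness, and isolated points far from the line flag possible outliers.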

moments (SSC) is a wrapper for summarize that can help when numbers are wanted as well.

Code:

. moments

-----------------------------------------------------------------------
                n = 69 |       mean          SD    skewness    kurtosis
-----------------------+-----------------------------------------------
                 Price |   6146.043    2912.440       1.688       5.032
         Mileage (mpg) |     21.290       5.866       0.995       3.997
    Repair Record 1978 |      3.406       0.990      -0.057       2.678
        Headroom (in.) |      3.000       0.853       0.197       2.144
 Trunk space (cu. ft.) |     13.928       4.343      -0.044       2.159
         Weight (lbs.) |   3032.029     792.851       0.118       2.073
          Length (in.) |    188.290      22.747      -0.076       2.000
     Turn Circle (ft.) |     39.797       4.441       0.071       2.228
Displacement (cu. in.) |    198.000      93.148       0.581       2.354
            Gear Ratio |      2.999       0.463       0.279       2.109
              Car type |      0.304       0.464       0.850       1.723
-----------------------------------------------------------------------
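
If installing moments is not an option, official Stata's tabstat can report the same statistics. The sketch below assumes, from the n = 69 shown above (an inference from the output, not something stated in the post), that only observations complete on all listed variables are used; tabstat's casewise option applies that kind of listwise deletion.

```stata
* Mean, SD, skewness and kurtosis for the numeric auto variables
sysuse auto, clear
tabstat price-foreign, statistics(n mean sd skewness kurtosis) ///
    columns(statistics) casewise
```

Dropping casewise would let each variable use all of its own non-missing observations, so the sample size could then differ across rows.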


Emily Mann

Join Date: Jul 2018

Posts: 21
#6

12 Sep 2018, 05:07

Thanks for the references and commands, Nick. I like graphs. They help tell the story so much more clearly. I'll definitely keep this one up my sleeve.
