ksmirnov and chitest

Stefano Pagliarani

Join Date: Dec 2021
Posts: 11

17 Feb 2022, 03:58

Hi everyone,
I am looking for a sound method to test whether a distribution is significantly different from a discrete uniform one. Basically I have a small sample of the outcomes of an unfair die roll and I need to "prove" that the die is unfair.
I tried two methods. The first was

Code:

ksmirnov mydata = runiformint(1, 6)

which gave

Code:

One-sample Kolmogorov-Smirnov test against theoretical distribution
           runiformint(1, 6)

 Smaller group       D       P-value  
 -----------------------------------
 mydata:            -0.3500    0.007
 Cumulative:        -5.8000    0.000
 Combined K-S:       5.8000    0.000

while the second one was

Code:

chitest mydata, count sep(0)

which gave

Code:

observed frequencies of mydata; expected frequencies equal

         Pearson chi2(5) =   2.2000   Pr =  0.821
likelihood-ratio chi2(5) =   2.2530   Pr =  0.813

  +-------------------------------------------------------------+
  | mydata    observed   expected   notes   obs - exp   Pearson |
  |-------------------------------------------------------------|
  |       1          4      3.333   *           0.667     0.365 |
  |       2          2      3.333   *          -1.333    -0.730 |
  |       3          2      3.333   *          -1.333    -0.730 |
  |       4          3      3.333   *          -0.333    -0.183 |
  |       5          5      3.333   *           1.667     0.913 |
  |       6          4      3.333   *           0.667     0.365 |
  +-------------------------------------------------------------+

*  1 <= expected < 5

So basically it seems that the result depends on the method. Do you have any ideas about why this happens and/or better solutions?

Thank you

Nick Cox

Join Date: Mar 2014

Posts: 35211
#2

17 Feb 2022, 04:33

Different methods will indeed give different results as K-S looks at the cumulative distribution function while a chi-square test ignores the ordering in the data.

A more basic problem is that you haven't presented ksmirnov with a theoretical cumulative (distribution function) varying from 0 to 1.

If I understand correctly, the cumulative for your problem is mydata/6.

Further, specifying a random number function for your cumulative means that results are hard to reproduce and, even then, slightly dependent on the seed.
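
A minimal sketch of that suggestion, assuming mydata holds the die outcomes coded 1 to 6. The right-hand side of ksmirnov is evaluated for each observation and should return the theoretical cumulative probability; with runiformint(1, 6) it instead returns random integers from 1 to 6, which is presumably why the output in #1 shows |D| greater than 1. Note also that K-S is designed for continuous distributions, so with discrete data like die rolls its P-values are best read with caution.

```stata
* Sketch only: assumes mydata holds the die outcomes coded 1 to 6.
* The expression after "=" must be the theoretical CDF evaluated at
* each observation; for a fair six-sided die that is F(x) = x/6,
* which lies in (0, 1] and involves no random numbers.
ksmirnov mydata = mydata/6
```

Because nothing random enters the expression, rerunning the command on the same data gives the same result.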

I tend to prefer chi-square tests here, possibly a case of familiarity rather than statistical optimality. One good reason is that you can look at residuals easily.
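
To show where the chi-square figures and residuals come from, here is a sketch that rebuilds them by hand from the frequencies posted in #1. The variable names face, observed, expected, pearson and contrib are made up for illustration; only official Stata commands are used, so tab_chi is not needed to run it.

```stata
* Sketch only: rebuilds the chitest figures from the frequencies in #1.
clear
input byte face int observed
1 4
2 2
3 2
4 3
5 5
6 4
end

* Expected frequency under a fair die: total rolls / number of faces
quietly summarize observed
generate double expected = r(sum) / _N

* Pearson residual for each face: (observed - expected) / sqrt(expected)
generate double pearson = (observed - expected) / sqrt(expected)

* Pearson chi-square is the sum of squared residuals, with 6 - 1 = 5 df
generate double contrib = pearson^2
quietly summarize contrib
display "Pearson chi2(5) = " %6.4f r(sum) "   Pr = " %5.3f chi2tail(5, r(sum))

* Inspect which faces contribute most to the statistic
list face observed expected pearson, noobs
```

If the frequencies are entered as above, the statistic should agree with the Pearson line that chitest reported, and the listing makes it easy to see that no single face has a large residual.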

Note: chitest is from tab_chi on SSC (FAQ Advice #12).

Last edited by Nick Cox; 17 Feb 2022, 05:30.
Stefano Pagliarani

Join Date: Dec 2021

Posts: 11
#3

17 Feb 2022, 05:38

Thank you very much for your help