Fitting Poisson Distribution

Gaurav Dhamija

Join Date: May 2016

Posts: 35
#1

28 May 2019, 00:27

Dear all,
I have a variable: the number of malnourished people in a PSU. I have approximately 2000 independently surveyed PSUs. The data look as follows:
X
20
30
50
70
.
.
.
.
80
Now, my question is how to check whether this variable follows a Poisson distribution. I know I can compare the mean and variance and expect them to be the same. Is there a test that can be used to argue that this variable follows a Poisson distribution?

Thank you.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#2

28 May 2019, 01:53

Gaurav:
see -estat gof- under -help poisson postestimation-.
As you're dealing with survey data, see also -help svy_estimation-.

Kind regards,
Carlo
(Stata 19.0)
Gaurav Dhamija

#3

28 May 2019, 02:11

Dear Carlo,
Thank you for your response.
I can use -estat gof- after running -poisson dependent independent-.
However, in my case there is no independent variable; I just want to check whether X follows a Poisson distribution.
Carlo Lazzaro

#4

28 May 2019, 02:30

Gaurav:
you are correct.
The following toy example computes the variance/mean ratio, which should be close to 1 under a Poisson distribution. Here it is about 4.2, which suggests overdispersion:

Code:

. use http://www.stata-press.com/data/r15/airline.dta

. sum injuries

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    injuries |          9    7.111111    5.487359          1         19

. di r(sd)^2/r(mean)
4.234375

Kind regards,
Carlo
(Stata 19.0)

Gaurav Dhamija

#5

28 May 2019, 02:42

Dear Carlo,
If I am not wrong, this is the index of dispersion. Since it is not equal to 1, the sample mean and variance are not the same. Can we claim this statistically?
That is, can we say that the index of dispersion is statistically different from 1?
If not, is there a test we can use to support that claim?
Carlo Lazzaro

#6

28 May 2019, 03:02

Gaurav:
you are correct again.
The example in #4 indicates that the variable is overdispersed relative to a Poisson distribution (as is most often the case with empirical data), since the variance/mean ratio well exceeds 1.

Kind regards,
Carlo
(Stata 19.0)
Gaurav Dhamija

#7

28 May 2019, 03:09

Dear Carlo,
But I have to claim this statistically.
Say, I want to do the following hypothesis testing:
H0: ID = 1
H1: ID != 1
where ID is the variance/mean ratio. Is there any test which can be used to claim that H0 is rejected?
Carlo Lazzaro

#8

28 May 2019, 03:54

Gaurav:
a dispersion index well above 1 is evidence against the null, although the sample ratio is not by itself a formal test.
You can formalize this with the following toy example, which fits -poisson- as a constant-only model and then runs -estat gof-:

Code:

. use "http://www.stata-press.com/data/r15/airline.dta", clear

. poisson injuries

Iteration 0:   log likelihood = -31.909771 
Iteration 1:   log likelihood = -31.909771 

Poisson regression                              Number of obs     =          9
                                                LR chi2(0)        =       0.00
                                                Prob > chi2       =          .
Log likelihood = -31.909771                     Pseudo R2         =     0.0000

------------------------------------------------------------------------------
    injuries |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   1.961659       .125    15.69   0.000     1.716663    2.206654
------------------------------------------------------------------------------

. estat gof

         Deviance goodness-of-fit =  31.85932
         Prob > chi2(8)           =    0.0001

         Pearson goodness-of-fit  =    33.875
         Prob > chi2(8)           =    0.0000

Kind regards,
Carlo
(Stata 19.0)

Gaurav Dhamija

#9

28 May 2019, 03:58

Dear Carlo,

Okay.
But I had to claim this statistically. I think this result would be sufficient.
Thank you so much.
Nick Cox

Join Date: Mar 2014

Posts: 35432
#10

28 May 2019, 07:00

The main test in the poisson results of #8 is confirming that the mean really isn't zero. That is not itself assessing fit to a Poisson.

What precisely are the goodness of fit measures measuring?
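
As a hedged pointer towards an answer: for a constant-only Poisson model every fitted value is the sample mean, so the two statistics reduce to the expressions below. The notation is an assumption introduced here, not taken from the thread: \bar{y} is the sample mean, s^2 the sample variance with divisor n - 1, and n the number of observations.

```latex
\begin{align*}
X^2_{\text{Pearson}} &= \sum_{i=1}^{n} \frac{(y_i - \bar{y})^2}{\bar{y}}
                      = (n - 1)\,\frac{s^2}{\bar{y}}, \\
D_{\text{deviance}}  &= 2 \sum_{i=1}^{n} y_i \ln\!\left(\frac{y_i}{\bar{y}}\right)
                      \qquad (\text{terms with } y_i = 0 \text{ contribute } 0).
\end{align*}
```

Both therefore measure how far the counts scatter around a common mean relative to what a Poisson distribution with that mean would allow. With the numbers in #4, 8 * 4.234375 = 33.875, which is the Pearson value reported in #8. Neither statistic compares the observed frequencies of 0, 1, 2, ... with Poisson probabilities.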
Gaurav Dhamija

#11

31 May 2019, 00:17

Dear Carlo and Nick,

In that case what should be the correct way to check (statistically) whether a particular variable follows Poisson distribution or not?

Since the variance of the variable 'injuries' is 30.11, should I test the following hypotheses?
H0 : mean = 30.11
H1 : mean != 30.11

If the test rejects the null hypothesis, it would imply that the mean is significantly different from the variance and hence that the variable doesn't follow a Poisson distribution.

In addition, is there a minimum number of observations required to check the fit of a Poisson distribution?

Last edited by Gaurav Dhamija; 31 May 2019, 00:30.
Carlo Lazzaro

#12

31 May 2019, 01:13

Gaurav:
see Nick Cox https://www.stata.com/statalist/arch.../msg00267.html

Kind regards,
Carlo
(Stata 19.0)
Gaurav Dhamija

#13

31 May 2019, 02:12

Dear Carlo,

Thanks for sharing this link. But 'chitest' is used to check whether a variable follows a uniform distribution. How can we use it to check for a Poisson distribution?

It would be great if you could shed some light on the second query too.
Is there a minimum number of observations required to check the fit of a Poisson distribution?
Nick Cox

#14

31 May 2019, 03:38

Not so. The help for chitest gives as its first code example

Code:

chitest count Poisson, nfit(1)

which was surely intended as a hint. Testing uniformity is merely the default.

So, you need to do a little work to set it up. The twists here are that you must be careful what you count, including values that don't occur! You must also ensure that the sum of expected frequencies is correct, as the largest bin you use will always be open-ended.

Here the sandbox dataset is a Poisson sample with mean 3. I then forget that I know that, use the observed mean and count how many times the variable is 0, 1, ..., up to the maximum observed. As said, we then adjust the last expected frequency.

Code:

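* chitest is user-written (part of the tab_chi package on SSC)
clear
set obs 1000
set seed 2803
gen y = rpoisson(3)

* sample mean, sample size, and number of bins (0, 1, ..., max)
su y, meanonly

scalar mean = r(mean)
local N = r(N)
local maxp1 = r(max) + 1

gen obs = .
gen exp = .

* row x holds the observed and expected frequencies for y == x - 1
qui forval x = 1/`maxp1' {
    count if y == `x' - 1
    replace obs = r(N) in `x'
    replace exp = `N' * poissonp(mean, `x' - 1) in `x'
}

* the last bin is open-ended: add the remaining tail so that exp sums to N
su exp, meanonly
replace exp = exp + (`N' - r(sum)) in `maxp1'

* nfit(1): one parameter (the mean) was estimated from the data
chitest obs exp, nfit(1) sep(0)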

observed frequencies from obs; expected frequencies from exp

         Pearson chi2(9) =   7.5502   Pr =  0.580
likelihood-ratio chi2(9) =   7.7716   Pr =  0.557

  +---------------------------------------------------+
  | observed   expected   notes   obs - exp   Pearson |
  |---------------------------------------------------|
  |       55     48.364               6.636     0.954 |
  |      144    146.494              -2.494    -0.206 |
  |      219    221.866              -2.866    -0.192 |
  |      213    224.011             -11.011    -0.736 |
  |      167    169.632              -2.632    -0.202 |
  |      121    102.763              18.237     1.799 |
  |       45     51.878              -6.878    -0.955 |
  |       26     22.448               3.552     0.750 |
  |        8      8.500              -0.500    -0.171 |
  |        1      2.861   *          -1.861    -1.100 |
  |        1      1.183   *          -0.183    -0.168 |
  +---------------------------------------------------+

*  1 <= expected < 5

In practice I would always look at a graph too.
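
One way to draw such a graph is to plot observed frequencies as bars and the fitted Poisson frequencies as connected points. This is a minimal sketch that rebuilds the sandbox data so that it runs on its own; the variable x and the scalar ybar are names introduced here for illustration, and the open-ended last bin is left unadjusted because only the overall shape matters for the plot.

```stata
* Observed versus expected Poisson frequencies, same sandbox data as above
clear
set obs 1000
set seed 2803
gen y = rpoisson(3)

su y, meanonly
scalar ybar = r(mean)
local N = r(N)
local maxp1 = r(max) + 1

* x holds the distinct values 0, 1, ..., max in the first maxp1 rows
gen x = _n - 1 in 1/`maxp1'
gen obs = .
gen exp = .

qui forval i = 1/`maxp1' {
    count if y == `i' - 1
    replace obs = r(N) in `i'
    replace exp = `N' * poissonp(scalar(ybar), `i' - 1) in `i'
}

twoway (bar obs x, barwidth(0.8)) (connected exp x), ///
    ytitle("frequency") xtitle("y")                  ///
    legend(order(1 "observed" 2 "expected"))
```

Systematic departures, such as too many zeros or a longer right tail than the fitted curve, tend to be easier to see here than in a single P-value.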

There is no magic sample size that is enough. A virtue (?) of the chi-square test, at least as implemented above, is that it squawks whenever your expected frequencies are small.

Gaurav Dhamija

#15

31 May 2019, 09:56

Dear Nick,

Thank you for the detailed description.