  • Fitting Poisson Distribution

    Dear all,
    I have a variable: the number of people who are malnourished in a PSU. I have approximately 2000 independently surveyed PSUs. The data look as follows:
    X
    20
    30
    50
    70
    .
    .
    .
    .
    80
    My question is: how should I check whether this variable follows a Poisson distribution? I know I can compare the mean and the variance and expect them to be the same. Is there a test that can be used to argue that this variable follows a Poisson distribution?

    Thank you.

  • #2
    Gaurav:
    see -estat gof- under -help poisson postestimation-.
    As you're dealing with survey data, see also -help svy_estimation-.
    Kind regards,
    Carlo
    (StataNow 18.5)


    • #3
      Dear Carlo,
      Thank you for your response.
      I can use -estat gof- after I use the following command -poisson dependent independent-
      However, in my case, there is no independent variable and I just want to check whether X follows Poisson distribution or not.


      • #4
        Gaurav:
        you are correct.
        The following toy example suggests that the variable is overdispersed relative to a Poisson distribution, as its variance/mean ratio is well above 1:
        Code:
        . use http://www.stata-press.com/data/r15/airline.dta
        
        . sum injuries
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
            injuries |          9    7.111111    5.487359          1         19
        
        . di r(sd)^2/r(mean)
        4.234375
        Kind regards,
        Carlo
        (StataNow 18.5)


        • #5
          Dear Carlo,
          If I am not wrong, this is the index of dispersion. Since it is not equal to 1, it implies that the mean and the variance are not the same. Can we claim this statistically?
          I mean: can we say that the index of dispersion is statistically different from 1?
          If not, is there a test we can use to make that claim?


          • #6
            Gaurav:
            you are correct again.
            The example provided in #4 shows that the variable is overdispersed relative to a Poisson distribution (as is most frequently the case with empirical data), since the variance/mean ratio well exceeds 1.
            Kind regards,
            Carlo
            (StataNow 18.5)


            • #7
              Dear Carlo,
              But I have to claim this statistically.
              Say I want to test the following hypotheses:
              H0: ID = 1
              H1: ID != 1
              where ID is the variance/mean ratio. Is there a test that can be used to decide whether H0 is rejected?


              • #8
                Gaurav:
                a dispersion index well above 1 is evidence against the null, although it is not by itself a formal test.
                However, you can elaborate on this with the following toy example, which applies -poisson- to a constant-only model:
                Code:
                . use "http://www.stata-press.com/data/r15/airline.dta", clear
                
                . poisson injuries
                
                Iteration 0:   log likelihood = -31.909771 
                Iteration 1:   log likelihood = -31.909771 
                
                Poisson regression                              Number of obs     =          9
                                                                LR chi2(0)        =       0.00
                                                                Prob > chi2       =          .
                Log likelihood = -31.909771                     Pseudo R2         =     0.0000
                
                ------------------------------------------------------------------------------
                    injuries |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                       _cons |   1.961659       .125    15.69   0.000     1.716663    2.206654
                ------------------------------------------------------------------------------
                
                . estat gof
                
                         Deviance goodness-of-fit =  31.85932
                         Prob > chi2(8)           =    0.0001
                
                         Pearson goodness-of-fit  =    33.875
                         Prob > chi2(8)           =    0.0000
                Kind regards,
                Carlo
                (StataNow 18.5)
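
                A note on what the Pearson statistic above measures: in a constant-only Poisson model every fitted value is the sample mean, so the Pearson goodness-of-fit statistic reduces to (n - 1) * variance/mean, the classical index-of-dispersion test, referred to a chi-square distribution with n - 1 degrees of freedom. A minimal sketch of that calculation by hand, assuming the same airline dataset as above; the scalar names N_obs and D_stat are illustrative only:
                Code:
                use "http://www.stata-press.com/data/r15/airline.dta", clear
                summarize injuries
                * index-of-dispersion statistic: (n - 1) * variance / mean
                scalar N_obs  = r(N)
                scalar D_stat = (r(N) - 1) * r(Var) / r(mean)
                * upper-tail chi-square P-value on n - 1 degrees of freedom
                display "D = " scalar(D_stat) "   P = " chi2tail(scalar(N_obs) - 1, scalar(D_stat))
                With 9 observations and a variance/mean ratio of 4.234375, D is 8 * 4.234375 = 33.875, the Pearson value reported by -estat gof-. Note that the upper-tail P-value is sensitive only to overdispersion; a two-sided version of H0: ID = 1 would need the lower tail as well.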


                • #9
                  Dear Carlo,

                  Okay.
                  But I had to claim this statistically. I think this result would be sufficient.
                  Thank you so much.


                  • #10
                    The main test in the poisson results of #8 is confirming that the log of the mean really isn't zero, that is, that the mean isn't 1. That is not itself assessing fit to a Poisson.

                    What precisely are the goodness of fit measures measuring?


                    • #11
                      Dear Carlo and Nick,

                      In that case what should be the correct way to check (statistically) whether a particular variable follows Poisson distribution or not?

                      The variance of the variable 'injuries' is 30.11.
                      Should I test the following hypotheses?
                      H0 : mean = 30.11
                      H1 : mean != 30.11

                      If the test rejects the null hypothesis, that would imply that the mean is significantly different from the variance, and hence that the variable does not follow a Poisson distribution.

                      In addition, is there a minimum number of observations required to check the fit of a Poisson distribution?
                      Last edited by Gaurav Dhamija; 31 May 2019, 00:30.


                      • #12
                        Gaurav:
                        see Nick Cox https://www.stata.com/statalist/arch.../msg00267.html
                        Kind regards,
                        Carlo
                        (StataNow 18.5)


                        • #13
                          Dear Carlo,

                          Thanks for sharing this link. But 'chitest' is used to check whether a variable follows a uniform distribution. How can we use it to check for a Poisson distribution?

                          It would be great if you could shed some light on my second query as well:
                          Is there a minimum number of observations required to check the fit of a Poisson distribution?


                          • #14
                            Not so. The help for chitest gives as its first code example

                            Code:
                            chitest count Poisson, nfit(1)
                            which was surely intended as a hint. Testing uniformity is merely the default.

                            So, you need to do a little work to set it up. The twists here are that you must be careful what you count, including values that don't occur! You must also ensure that the sum of expected frequencies is correct, as the largest bin you use will always be open-ended.

                            Here the sandbox dataset is a Poisson sample with mean 3. I then forget that I know that, use the observed mean and count how many times the variable is 0, 1, ..., up to the maximum observed. As said, we then adjust the last expected frequency.

                            Code:
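                            clear
                            set obs 1000
                            set seed 2803
                            * sandbox data: a Poisson sample with mean 3
                            gen y = rpoisson(3)
                            
                            * keep the observed mean, the sample size and (maximum + 1) = number of cells
                            su y, meanonly
                            
                            scalar mean = r(mean)
                            local N = r(N)
                            local maxp1 = r(max) + 1
                            
                            gen obs = .
                            gen exp = .
                            
                            * row x holds the observed and expected frequencies for the value x - 1
                            qui forval x = 1/`maxp1' {
                                count if y == `x' - 1
                                replace obs = r(N) in `x'
                                replace exp = `N' * poissonp(mean, `x' - 1) in `x'
                            }
                            
                            * make the last cell open-ended so that the expected frequencies sum to N
                            su exp, meanonly
                            replace exp = exp + (`N' - r(sum)) in `maxp1'
                            
                            * nfit(1): one parameter (the mean) was estimated from the data
                            chitest obs exp, nfit(1) sep(0)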
                            
                            
                            
                            
                            observed frequencies from obs; expected frequencies from exp
                            
                                     Pearson chi2(9) =   7.5502   Pr =  0.580
                            likelihood-ratio chi2(9) =   7.7716   Pr =  0.557
                            
                              +---------------------------------------------------+
                              | observed   expected   notes   obs - exp   Pearson |
                              |---------------------------------------------------|
                              |       55     48.364               6.636     0.954 |
                              |      144    146.494              -2.494    -0.206 |
                              |      219    221.866              -2.866    -0.192 |
                              |      213    224.011             -11.011    -0.736 |
                              |      167    169.632              -2.632    -0.202 |
                              |      121    102.763              18.237     1.799 |
                              |       45     51.878              -6.878    -0.955 |
                              |       26     22.448               3.552     0.750 |
                              |        8      8.500              -0.500    -0.171 |
                              |        1      2.861   *          -1.861    -1.100 |
                              |        1      1.183   *          -0.183    -0.168 |
                              +---------------------------------------------------+
                            
                            *  1 <= expected < 5
                            In practice I would always look at a graph too.
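
                            One way to draw such a graph, as a minimal sketch: it assumes the variables obs and exp created by the code above are still in memory; the variable name value and the graph options are illustrative choices, not part of the original example.
                            Code:
                            * value of y represented by each row that holds a frequency (row 1 is y == 0)
                            gen value = _n - 1 if obs < .
                            * bars for observed frequencies, connected points for Poisson expectations
                            twoway (bar obs value) (connected exp value), ytitle("Frequency") xtitle("Value of y") legend(order(1 "Observed" 2 "Expected"))
                            Bear in mind that the last expected frequency plotted includes the whole upper tail, because that cell was made open-ended.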

                            There is no magic sample size that is enough. A virtue (?) of the chi-square test, at least as implemented above, is that it squawks whenever your expected frequencies are small.


                            • #15
                              Dear Nick,

                              Thank you for the detailed description.
