Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Unexpected goodness of fit results, Poisson regresion

    Hello,

    I am studying mortality rates, and have modeled rates and some explanatory variables with a Poisson regression.
    My dataset contains 500.000 individual records with the vital status, exposure time, and explanatory variables; I have st_setted and st_splitted them.
    Here are the variables:

    _d = death status
    curage= 5y agegroup at start of interval
    gewest= region
    educat4 = educational level in 4 categories
    empl4= employment status in 4 categories
    newcomf= comfort level of the house (4 cat)
    owner= being owner or tenant (2 cat)
    isol= entourage (3 categories)
    y= time of exposure

    I have run a Poisson regression, then looked at the goodness of fit of my model.

    Code:
      poisson _d curage i.gewest ib4.educat4 i.empl4 ib4.newcomf2 if sex==1, e(y) irr
    estat gof
    My problem is that the results of the goodness of fit tests (the deviance and the Pearson chi square) look strange :
    they both give extreme results , 0 and 1; moreover, they give opposite results.
    This is unexpected , so I suppose I made something wrong ; please could you help to find my mistake ?

    HTML Code:
    Deviance goodness-of-fit = 61023.65
    Prob > chi2(443788) = 1.0000
    Pearson goodness-of-fit = 3062899
    Prob > chi2(443788) = 0.0000

    Thanks, Françoise

  • #2
    Francoise:
    I would look at the standard errors first, searching for some "weird" values.
    As an aside, it is better to post the whole Stata output via CODE delimiters (no HTML code), so that other listers can look at the big picture and reply consistently. Thanks.
    Kind regards,
    Carlo
    (StataNow 18.5)

    Comment


    • #3
      Let me expand on Carlo's suggestion. From your Stata Results window, you should copy everything starting with the line including the poisson command and its output, any intervening commands you might have run and their output, and ending with the estat gof command and its output. These results should be pasted into a code block (as you did in your first example above) and not into an HTML block. Sometimes the forum editor is confusing to use, but you can also create your code block by hand by typing [code] and [/code] before and after your material by hand.

      For example, the following:

      [code]
      . sysuse auto, clear
      (1978 Automobile Data)

      . describe make price

      storage display value
      variable name type format label variable label
      -----------------------------------------------------------------
      make str18 %-18s Make and Model
      price int %8.0gc Price
      [/code]

      will be presented in the post as the following, which preserves spacing and is clearly readable.
      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . describe make price
      
                    storage   display    value
      variable name   type    format     label      variable label
      -----------------------------------------------------------------
      make            str18   %-18s                 Make and Model
      price           int     %8.0gc                Price
      Last edited by William Lisowski; 22 Mar 2016, 08:02.

      Comment


      • #4
        Hi again,
        so I'll try to do as recommended. I run 2 models, the first one being very basic and modeling only the death rates in fct of the agegroup, the second one is the full model with all the variables (without interaction). I think there is no problem of empty cells with huge standard error. My sample is quite big (473.000 observations).

        So my problem is = why are the goodness of fit test so extreme ? Why are they going in opposite direction ? I suppose those results are wrong ? Or can they be correct ?
        thanks in advance for your suggestions....

        Code:
          
        .POISSONS REGRESSIONS
        
        . *MALES
        
        *Basic model= just keep age group at entry(curage)
        
        . poisson _d i.curage if sex==1&educat4!=.&empl4 !=.&newcomf2!=.&owner!=.&isol!=.,e(y) irr
        
        Iteration 0:   log likelihood = -38569.398  
        Iteration 1:   log likelihood = -37938.495  
        Iteration 2:   log likelihood = -37932.809  
        Iteration 3:   log likelihood = -37932.805  
        
        Poisson regression                              Number of obs     =    443,798
                                                        LR chi2(8)        =    6159.31
                                                        Prob > chi2       =     0.0000
        Log likelihood = -37932.805                     Pseudo R2         =     0.0751
        
        ------------------------------------------------------------------------------
                  _d |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              curage |
                 30  |   1.177219   .1880098     1.02   0.307     .8608248    1.609904
                 35  |   1.466694   .2296834     2.45   0.014     1.079051    1.993596
                 40  |   2.221993   .3402212     5.21   0.000     1.645928    2.999678
                 45  |   3.929681   .5880222     9.15   0.000     2.930804    5.268995
                 50  |   5.502652   .8201806    11.44   0.000     4.108648    7.369623
                 55  |    8.57218   1.264484    14.57   0.000     6.419931    11.44596
                 60  |   13.47368   1.971928    17.77   0.000     10.11369    17.94994
                 65  |   22.39002   3.272162    21.27   0.000     16.81345     29.8162
                     |
               _cons |   .0011189   .0001615   -47.08   0.000     .0008432    .0014847
               ln(y) |          1  (exposure)
        ------------------------------------------------------------------------------
        
        
        
        . estat gof 
        
                 Deviance goodness-of-fit =  61919.61
                 Prob > chi2(443789)      =    1.0000
        
                 Pearson goodness-of-fit  =   3128773
                 Prob > chi2(443789)      =    0.0000
        
        . 
        
        *FULL MODEL(Put all the variables; but no interaction yet)
        
        . poisson _d  i.curage i.gewest  ib4.educat4 i.empl4 ib4.newcomf2 ib1.owner i.isol if sex==1,e(y) irr
        
        Iteration 0:   log likelihood = -38196.444  
        Iteration 1:   log likelihood = -37068.603  
        Iteration 2:   log likelihood = -37062.497  
        Iteration 3:   log likelihood = -37062.493  
        
        Poisson regression                              Number of obs     =    443,798
                                                        LR chi2(22)       =    7899.93
                                                        Prob > chi2       =     0.0000
        Log likelihood = -37062.493                     Pseudo R2         =     0.0963
        
        ----------------------------------------------------------------------------------
                      _d |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -----------------+----------------------------------------------------------------
                  curage |
                     30  |   1.195605   .1909627     1.12   0.263     .8742455    1.635092
                     35  |   1.538264   .2410159     2.75   0.006     1.131526     2.09121
                     40  |   2.364945   .3625385     5.61   0.000     1.751195    3.193798
                     45  |   4.144675   .6214087     9.48   0.000     3.089373    5.560459
                     50  |   5.635693    .842374    11.57   0.000     4.204528    7.554009
                     55  |   7.888711   1.169545    13.93   0.000     5.899438    10.54876
                     60  |   10.63228   1.573766    15.97   0.000     7.954875    14.21082
                     65  |   15.91713   2.368933    18.59   0.000     11.88998    21.30828
                         |
                  gewest |
                      2  |   1.095599   .0511996     1.95   0.051     .9997077    1.200687
                      3  |   1.339904   .0341849    11.47   0.000      1.27455    1.408608
                         |
                 educat4 |
                Primary  |   1.430178   .0654803     7.81   0.000     1.307429    1.564452
               LowerSec  |    1.32926   .0660559     5.73   0.000     1.205898    1.465242
          upper/postSec  |   1.280785   .0631451     5.02   0.000     1.162814    1.410724
                         |
                   empl4 |
             unemployed  |   1.450908   .0734096     7.36   0.000     1.313932    1.602164
        stopped working  |    1.58336   .0540017    13.47   0.000     1.480979    1.692819
           non-working   |   2.890198   .1404271    21.84   0.000     2.627665    3.178962
                         |
                newcomf2 |
           insufficient  |   1.493224    .056431    10.61   0.000     1.386618    1.608025
                    low  |   1.270889   .0389984     7.81   0.000     1.196707     1.34967
                average  |   1.291292   .0491621     6.71   0.000     1.198443    1.391334
                         |
                   owner |
                 tenant  |   1.345984   .0371505    10.77   0.000     1.275105    1.420803
                         |
                    isol |
               isolated  |   1.408863   .0439322    10.99   0.000     1.325336    1.497654
                   else  |   1.586157     .12932     5.66   0.000     1.351909    1.860994
                         |
                   _cons |   .0005049   .0000757   -50.60   0.000     .0003763    .0006775
                   ln(y) |          1  (exposure)
        ----------------------------------------------------------------------------------
        
        
        
        . estat gof 
        
                 Deviance goodness-of-fit =  60178.99
                 Prob > chi2(443775)      =    1.0000
        
                 Pearson goodness-of-fit  =   2884654
                 Prob > chi2(443775)      =    0.0000

        Comment


        • #5
          The formulae for the 2 GoF test statistics is given in the Manuals, page 1891: "poisson postestimation — Postestimation tools for poisson" Also look at page 1882 on the Manuals under "poisson — Poisson regression", Methods and Formulae section. That will provide an explanation of how the statistics are calculated and hence how they differ. Once you've worked out which elements of the formulae differ, you should be able to do some detective work in your sample e.g. are there particular observations that are driving this.

          Comment


          • #6
            The 10-year-old FAQ at http://www.stata.com/support/faqs/st...-squared-test/ suggests that your Pearson goodness-of-fit suffers from your data being ungrouped, so that you have 443,798 observations where for each of them the expected count will be <1 (since there is only one exposure per observation). The result will be an artificially significant Pearson goodness-of-fit. This is, I think, just another example of the sensitivity of the general Pearson chi-square goodness-of-fit test to the grouping of the data into cells.

            And the UCLA discussion at http://www.ats.ucla.edu/stat/stata/dae/poissonreg.htm presents an example where like yours the Pearson goodness-of-fit has a much lower p-value than the (deviance) goodness of fit (although in this case neither is significant). Their discussion ignores the Pearson goodness-of-fit in favor of the (deviance) goodness-of-fit.

            Comment


            • #7
              Thank you William, I think it is the more plausible explanation.
              I suppose I should rather look at the BIC to compare models.

              Comment


              • #8
                What leads you to look at BIC? William's helpful answer didn't suggest that. What's wrong with the deviance goodness of fit (otherwise known as a likelihood ratio test statistic) in your context?

                Comment


                • #9
                  Dear Francoise,

                  On a different note, I wonder whether you need these statistics at all. That is, do you really need your data to be Poisson? If you just want to estimate the conditional mean and do not need to compute the probability of given events, it is perfectly fine to estimate the model by Poisson regression even if the data are not Poisson.

                  All the best,

                  Joao

                  Comment


                  • #10
                    @ Joao, thanks for this reflection.
                    Actually, I wanted to look if my model was more or less acceptable.
                    Therefore, I followed the example they give in the Stata manual: they use the Doll and Hill study, perform in a first step a Poisson regression without interaction, and observe the GOF test (Pearson and deviance) has a very small p value (0.02); they conclude their model is bad; afterwards they add an interaction test that improves the model (p value =0.7);
                    I think that my problem comes, as William suggested, from the fact that my data are individual data and not grouped data, what inflates the Pearson Chisquare, that becomes uninterpretable


                    @Stephen: the deviance goodness of fit has a probability of 1; how should I interpret this ? as a very very very large imoprovement from the nul model ? or as a

                    Comment


                    • #11
                      Sorry the message has gone away ....

                      @ Joao, thanks for this reflection.
                      Actually, I wanted to look if my model was more or less acceptable.
                      Therefore, I followed the example they gave in the Stata manual: they used the Doll and Hill study, performed in a first step a Poisson regression without interaction, and observed the GOF test (Pearson and deviance) had a very small p value (0.02); they concluded their model was bad; afterwards they added an interaction term that improved the model (p value =0.7);
                      I think that my problem comes, as William suggested, from the fact that my data are individual data and not grouped data, what inflates the Pearson Chisquare, that becomes uninterpretable


                      @Stephen: the deviance goodness of fit has a probability of 1; how should I interpret this ? as a huge improvement from the null model ? or as a mistake I did ?
                      thanks

                      Comment


                      • #12
                        the LRT statistic for restricted model versus full model is 2*(37933 - 37062) = 1542, to compare against chi-squared critical value with (I think) 14 degrees of freedom. (This is also the difference between -61920 and -60179 shown in the estat gof output.)
                        Code:
                        . di invchi2(14, .95)
                        23.684791
                        
                        . di invchi2(14, .99)
                        29.141238
                        So you would reject the null hypothesis (i.e. restrictions on the full model encapsulated in the smaller model). If you want to look at the "gof" for each model separately, then I think interpretation of the numbers pumped out also needs to think about what the restricted model is, in each case. (I haven't checked in this case -- no time -- but it is usually a model with a constant term and no other covariate. )
                        I suspect that the "Prob > chi2(443775) = 1" is something related to your sample size and the degrees of freedom in the implicit "test" being undertaken. Methods and Formulae should help here.

                        Comment


                        • #13
                          From the Stata manual section on Postestimation tools for poisson, in the Description for estat paragraph, we have

                          estat gof performs a goodness-of-fit test of the model. Both the deviance statistic and the Pearson statistic are reported. If the tests are significant, the Poisson regression model is inappropriate.
                          And in Example 1 below that, which you cited, they do not "conclude that their model was bad", they instead say

                          The deviance goodness-of-fit test tells us that, given the model, we can reject the hypothesis that these data are Poisson distributed.
                          They then go on to create an expanded model for which the deviance goodness-of-fit test does not reject the Poisson distribution of the data.

                          With that in mind, when you write

                          the deviance goodness of fit has a probability of 1; how should I interpret this ? as a huge improvement from the null model ?
                          you are misinterpreting the meaning of the test. If you review the Methods and formulas section of the Stata manual, you will see that the deviance goodness-of-fit is a likelihood ratio test comparing a fully-saturated model to your model, not your model to the null model.

                          With all that in mind, then, I'd expect that, with a probability of 1 for the deviance goodness-of-fit, they would say something like "The deviance goodness-of-fit test tells us that, given the model, we cannot reject the hypothesis that these data are Poisson distributed." Or more colloquially, as they wrote for the expanded model,

                          The goodness-of-fit is now small; we are no longer running roughshod over the data.

                          Comment


                          • #14
                            Dear Françoise,

                            I am sorry for continuing to try to shift the focus of this discussion, but if you just want to check the adequacy of your model and you do not need the data to be Poisson, why don't you just do a RESET test? The goodness-of-fit tests you are considering check whether the data really follows a Poisson distribution, which does not appear to be what you want to do.

                            All the best,

                            Joao

                            Comment

                            Working...
                            X