Hello all,
I’m using Stata 12.1 to analyse data on the number of cases of cancer in the population during three-year periods by quintiles of a socioeconomic indicator (SIMD). My underlying data consist of strata containing counts of cases and population estimates by sex, age group, SIMD quintile and time period.
I am using the poisson command in Stata, adjusting for age group and stratifying by year group and sex.
For example,
Code:
xi: poisson numcases i.simd_quin i.age_broad if yeargroup==0 & sex==0, exposure(population) irr
xi: poisson numcases i.simd_quin i.age_broad if yeargroup==0 & sex==1, exposure(population) irr
Very few of the strata contain zero cases – I have run ZIP models with the same covariates, finding no evidence of excess zeros from the Vuong test.
I have also run negative binomial models with the same covariates to check for overdispersion – the LR test results suggest this is not a problem.
However, when I rerun the model using the glm poisson command instead of the standard poisson command, predict the deviance residuals and plot them on a pnorm plot, they appear to be very non-normal. Similarly, when I plot the residuals against the linear predictor, it looks like the assumption of constant variance of residuals is also violated.
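For reference, the diagnostic steps described above look roughly like the sketch below, shown for the first stratum only. The names dres and xbhat are just illustrative variable names, and the /// line continuation assumes the commands are run from a do-file.

```stata
* Sketch only: refit one stratum with glm so that deviance residuals can be predicted
glm numcases i.simd_quin i.age_broad if yeargroup==0 & sex==0, ///
    family(poisson) link(log) exposure(population) eform

* dres and xbhat are illustrative names, restricted to the estimation sample
predict dres if e(sample), deviance
predict xbhat if e(sample), xb

* Normal probability plot of the deviance residuals
pnorm dres

* Deviance residuals against the linear predictor
scatter dres xbhat, yline(0)
```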
Although I have only included two covariates in the model (SIMD quintile and age group), so it might appear under-specified, this is because I have stratified for the others I think a priori are likely to be relevant (sex and time period).
I’m therefore unsure where the problems with non-normality and heteroscedasticity are coming from and how to solve them. My reading suggests that calculating robust standard errors in the Poisson model might help. However, I am unsure whether it is possible to do this and then re-check the normality/constant variance assumptions.
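To be concrete about what I mean by robust standard errors, I am assuming it would be something along these lines for the first stratum (a sketch, not something I have settled on; the /// continuation again assumes a do-file):

```stata
* Sketch only: same Poisson model for one stratum, with robust (sandwich) standard errors
* vce(robust) changes the standard errors only; the coefficient estimates are unchanged
poisson numcases i.simd_quin i.age_broad if yeargroup==0 & sex==0, ///
    exposure(population) irr vce(robust)
```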
Any suggestions would be very gratefully received – and if anything in the above description is unclear, I’m more than happy to clarify.
Many thanks
Emily