I’ve recently started using Stata to carry out Poisson regression on aggregate data, and I have a question about the robust method for estimating standard errors. After running a preliminary model, I collapsed the dataset further across a small number of variables that were not in the model and reran the same model. Although the coefficients stayed the same, the robust standard errors changed. If I instead use the default maximum likelihood method (OIM), the standard errors are identical before and after collapsing the data further. I also noted that the OIM method gives exactly the same standard errors whether based on individual or aggregate data, whereas that does not appear to be the case for the robust standard errors.
For example, say we have an individual-level dataset with person-years recorded on 1000 people, along with whether or not an event occurred (Y) and three two-level factors (A, B, and C).
use "individual", clear
collapse (sum) pyr Y , by(A)
xi: poisson Y i.A, exposure(pyr) irr vce(robust)
provides one set of point estimates and standard errors, whereas:
use "individual", clear
collapse (sum) pyr Y , by(A B C)
xi: poisson Y i.A, exposure(pyr) irr vce(robust)
provides different standard errors.
I'm unclear what the robust method is doing here, and whether it is appropriate to use with aggregate data, given that the results appear to differ depending on which variables I collapse by. I’m aware that the justification for using robust standard errors relates to whether the errors might be heteroskedastic and/or correlated, and that alternative models should sometimes be considered (e.g. negative binomial, quasi-Poisson, random effects). My question is more out of curiosity about this particular situation, where you might have data aggregated across different factors/variables. I’m inclined to stick with the maximum likelihood method for the dataset I am currently working on, but would still be very grateful for any advice or insights you may have on the above.
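To show what I mean outside Stata, here is a minimal sketch in plain Python with hypothetical numbers. It uses a single binary factor A with offset log(pyr), so the saturated Poisson MLE has a closed form: the fitted rate in each A level is total events over total person-years. It then computes the OIM standard error from the information matrix B = Σ xᵢxᵢ′μᵢ and the robust (sandwich) standard error B⁻¹MB⁻¹ with meat M = Σ xᵢxᵢ′(yᵢ − μᵢ)², once on cell-level data and once after collapsing by A alone:

```python
# Sketch of OIM vs robust/sandwich SEs under aggregation (hypothetical data).
# Model: Poisson, log link, offset log(t), single binary factor A (saturated).
import math

def inv2(m):
    # Inverse of a 2x2 matrix.
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def mat2(p, q):
    # Product of two 2x2 matrices.
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def fit_poisson_by_A(records):
    """records: list of (a, t, y) = (factor level, person-years, events).
    Returns (beta1, se_oim, se_robust) for the A=1 vs A=0 log rate ratio."""
    T = {0: 0.0, 1: 0.0}
    Y = {0: 0.0, 1: 0.0}
    for a, t, y in records:
        T[a] += t
        Y[a] += y
    lam = {a: Y[a] / T[a] for a in (0, 1)}   # fitted rates (saturated MLE)
    beta1 = math.log(lam[1] / lam[0])        # log rate ratio
    # Information B = sum_i x_i x_i' mu_i; within each A level the fitted
    # mu_i sum to Y[a], so B depends only on the A-level totals.
    B = [[Y[0] + Y[1], Y[1]], [Y[1], Y[1]]]
    # Sandwich meat M = sum_i x_i x_i' (y_i - mu_i)^2; this DOES depend on
    # the level of aggregation, because residuals are summed before squaring.
    S = {0: 0.0, 1: 0.0}
    for a, t, y in records:
        r = y - t * lam[a]
        S[a] += r * r
    M = [[S[0] + S[1], S[1]], [S[1], S[1]]]
    Binv = inv2(B)
    V = mat2(mat2(Binv, M), Binv)            # sandwich B^-1 M B^-1
    return beta1, math.sqrt(Binv[1][1]), math.sqrt(V[1][1])

# Hypothetical cells indexed by (A, B): same A-level totals, but the events
# are spread unevenly within each A level.
cells = [(0, 100.0, 4), (0, 150.0, 11), (1, 120.0, 9), (1, 130.0, 6)]
collapsed = [(0, 250.0, 15), (1, 250.0, 15)]  # like: collapse ... , by(A)

b_c, oim_c, rob_c = fit_poisson_by_A(cells)
b_a, oim_a, rob_a = fit_poisson_by_A(collapsed)
print(b_c, oim_c, rob_c)
print(b_a, oim_a, rob_a)
```

Running this, the coefficient and the OIM standard error are identical for both datasets, while the robust standard error differs; in the extreme case of one record per A level the residuals are exactly zero and the robust variance degenerates toward zero, since the sandwich meat only "sees" the within-cell squared residuals that remain after aggregation.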
Thank you in advance
Steve Vander Hoorn