Hi everyone and thanks in advance for reading.
I have school-level panel data following over 50,000 schools for two academic years. My outcomes of interest have a lot of zeros. For instance, one outcome is the number of students arrested at school during an academic year. Most schools report zero, a few report positive counts, and even fewer report large counts. For that outcome specifically, about 90% of schools report zero arrests and the rest report positive numbers. My questions are the following:
1) Which regression model is best to use? I'm currently considering xtpoisson or xtnbreg. Is there another model I should be considering?
2) How should I define the dependent variables? Should I keep them as raw counts, convert them to rates per 100 students, or apply a transformation such as the inverse hyperbolic sine?
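To make the options concrete, here is a rough sketch of what I have in mind. It runs on simulated toy data, not my real data, and every variable name (school_id, year, arrests, enrollment, treat) is a placeholder I made up for illustration. It builds the three candidate versions of the outcome and then fits the two models I mentioned; I am not claiming this is the right specification.

```stata
* Simulated toy panel: 500 hypothetical schools observed for 2 years
clear
set seed 12345
set obs 500
generate school_id = _n
expand 2
bysort school_id: generate year = _n
generate enrollment = 200 + floor(800*runiform())    // placeholder enrollment
generate treat = runiform() < 0.5                    // placeholder regressor
generate arrests = rpoisson(0.0005*enrollment*exp(0.3*treat))

* Three candidate versions of the outcome
generate rate100 = 100*arrests/enrollment            // arrests per 100 students
generate ihs_arrests = asinh(arrests)                // inverse hyperbolic sine
* (raw counts are just the variable arrests itself)

* Declare the panel structure and fit the two count models on raw counts
xtset school_id year
xtpoisson arrests treat, fe vce(robust)
xtnbreg arrests treat, fe
```

If modeling a rate is preferable, my understanding is that one alternative to dividing by enrollment is to keep the raw count and add exposure(enrollment) to xtpoisson, but I am not sure that is the better choice here.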
Thanks!