Hi everyone,
I have a panel dataset (cross-sectional time series) with years nested within universities, which are nested within US states. My dependent variable (y) is an annual count at the university level, and my key independent variable (x) is a binary state-level variable; x varies across states but does not change over time within a state. My goal is to understand whether universities in states with x = 1 have higher or lower y than those in states with x = 0, and I'm not trying to generalize the findings outside of the US.
Searching the forum, I came across a few possible ways to model this relationship, but I'm not sure which is best. I wanted to use GEE to account for clustering within universities, with year fixed effects to handle time, but this does not address the fact that universities are nested within states. I prefer GEE to two-way fixed effects because several of my controls at both levels (and my key IV) do not vary over time and would therefore drop out of a fixed-effects model.
Would adding a dummy indicator for state address my issue? For example:
xtset university year
xtpoisson y x controls i.year i.state, pa vce(robust) exposure(population)
These models have trouble converging, even when I exclude the controls.
Or do I not need the state fixed effects? I could decompose the time-varying state-level variables (z) into between-state and within-state components, and leave x as is, since it does not vary over time:
bysort state: egen z_mean = mean(z)
gen z_dev = z - z_mean
xtset university year
xtpoisson y x z_mean z_dev i.year, pa vce(robust) exposure(population)
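To make this second specification concrete, here is roughly the mean model I have in mind, written in my own notation (u = university, s = state, t = year; the Greek letters are just labels I am introducing for the coefficients, and the subscripts on population simply mean "the exposure value for that observation"):

\[
\log E[y_{ust}] = \alpha + \log(\mathrm{population}_{ust}) + \beta x_s + \gamma_B \bar{z}_s + \gamma_W (z_{st} - \bar{z}_s) + \tau_t
\]

Here \bar{z}_s corresponds to z_mean, z_{st} - \bar{z}_s corresponds to z_dev, \tau_t stands for the year indicators, and the exposure enters as an offset with its coefficient fixed at 1.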
Another option is to use a multilevel mixed-effects Poisson regression with a random intercept for state:
xtset university year
xtmepoisson y x controls i.year, exposure(population) || state:
But I am not sure if treating state as a random effect is appropriate.
Any advice on how best to proceed would be appreciated!
Best,
Emma