Offset in GEE models and collinearity issues

Fernando Lima

Join Date: Oct 2024

Posts: 4
#1


31 Oct 2024, 21:02

Hi everyone,

I am working on a paper on the mental health (MH) hospitalisations of mothers of children placed in foster care, covering the 2 years before and the 2 years after placement. The aim is to assess whether mothers are more likely to be hospitalised for MH after child removal, and whether this difference is larger than in the other groups. I am comparing three groups: children placed in foster care (care group, N=1150), children never placed and never in contact with child protection services (no_contact group, N=8500), and children who had contact with child protection services but were not placed (contact group, N=4150). Groups were matched, and the age at placement of the care group children was used to assign dummy placement dates to the comparison groups. The outcome variable is binary (mhdiag), coded 1 if the mother had an MH hospitalisation and 0 otherwise, and 'before_after' is coded 0 if the hospitalisation was before placement and 1 if it was after placement.

I am using a GEE model with a modified Poisson approach to estimate RRs instead of ORs. I would like to include an offset to account for the population at risk in each group. However, because my group variable is highly correlated with the log of the population variable (lpop), my RRs go from 7 in the model without the offset to 55 in the model with it. I also tried including lpop as a covariate, but Stata drops it from the model because of collinearity.

Model:
xtgee mhdiag ib3.group##i.before_after , family(poisson) link(log) corr(independent) offset(lpop) vce(robust) eform

My questions are:
a) Am I using the offset correctly?
b) Can I use the offset option with a binary outcome, or is it only for counts?
c) Is there any other way to account for the differences in population size between groups?
d) Does the exposure variable ‘group’ already account for the population difference, so that there is no need to include an offset?

Thanks for your help!

Last edited by Fernando Lima; 31 Oct 2024, 21:07.
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

01 Nov 2024, 08:40

In this situation there is no role for the -offset()- (or -exposure()-) option. Those are used when the equation, as applied to an individual observation, is attempting to estimate a rate, with the numerator given in the outcome variable and the denominator given in the -exposure()- option (or its log given in the -offset()- option). You don't have that here. Each of your observations corresponds to just one mother. The "rate" you are estimating has a denominator that is reflected in the number of observations in the group. That is, any given woman either has (1) or does not have (0) a psychiatric hospitalization. She does not have 1 psychiatric hospitalization per # of women in her group.
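A minimal sketch of the denominator point, written in Python rather than Stata and using made-up 0/1 outcomes (the group labels follow the thread; the data are hypothetical). With one record per mother, each group's risk is the proportion of mothers with a hospitalization, so the denominator is simply the number of mothers in that group. With group as the only regressor, a log-link Poisson model's estimated RR reproduces this same ratio of proportions.

```python
# Hypothetical 0/1 outcomes, one per mother: 1 = had an MH hospitalisation, 0 = did not.
outcomes = {
    "care": [1, 0, 1, 1, 0],
    "no_contact": [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
}


def risk(values):
    """Proportion of mothers with the event; the denominator is the group size itself."""
    return sum(values) / len(values)


rr = risk(outcomes["care"]) / risk(outcomes["no_contact"])
print(f"risk ratio = {rr:.1f}")
```

Here 3 of 5 versus 1 of 10 mothers gives a risk ratio of 6.0. No separate population variable is needed, because each group's size already enters as the denominator of its risk.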

The typical use of -offset()- or -exposure()- is something like a study of potholes in roads. The outcome variable would be the number of potholes in the road, and the -exposure()- option would contain the length of the road in km. That way the model estimates potholes per km of road.
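The pothole example can be sketched as a tiny intercept-only Poisson fit. This is Python, not Stata's implementation, and the road data are invented for illustration. The log of road length enters the linear predictor with its coefficient fixed at 1, which is what an exposure/offset term does, so the fitted exp(b0) comes out as total potholes divided by total km.

```python
import math

# Hypothetical data: (number of potholes, road length in km) for each road.
roads = [(12, 4.0), (3, 1.5), (20, 10.0), (7, 2.5)]


def fit_rate_with_exposure(data, iterations=100):
    """Fit log E[potholes_i] = b0 + ln(km_i) by Newton-Raphson.

    ln(km_i) is an offset (coefficient fixed at 1), mimicking an exposure(km)
    option, so exp(b0) is interpreted as potholes per km.
    """
    b0 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0) * km for _, km in data]          # expected counts
        score = sum(y - m for (y, _), m in zip(data, mu))   # d loglik / d b0
        information = sum(mu)                               # -d2 loglik / d b0^2
        step = score / information
        b0 += step
        if abs(step) < 1e-12:
            break
    return math.exp(b0)


rate = fit_rate_with_exposure(roads)
print(f"{rate:.4f} potholes per km")  # equals 42 potholes / 18 km
```

Setting the score to zero gives exp(b0) = sum of potholes / sum of km directly; the iteration is shown only to make clear that the exposure enters the model itself, not the outcome.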

I would not go so far as to say that these options are used only for count outcomes, but it would be a very unusual dichotomous outcome where they would be appropriate. It's hard to think of a realistic example.
Fernando Lima

#3

05 Nov 2024, 20:04

Thank you very much Clyde!
Fernando Lima

#4

06 Nov 2024, 22:30

Sorry to bother you again.

I was able to get the number of mental health hospitalisations each mother had in the 2 years before and the 2 years after child removal, so my mhdiag outcome variable is now a count variable. The model I am running is the same as before; the only difference is that the outcome is a count instead of binary. I realised that I can also run the model with a count outcome to assess service use.

xtgee mhdiag_count ib3.group##i.before_after , family(poisson) link(log) corr(independent) vce(robust) eform

However, the IRR is again extremely high when I add the option offset(lpop) to the model. For the care group vs the no_contact group, the IRR is 11.29 in the model without the offset and 83.41 in the same model with the offset.
I am not sure what is going on. Is the correlation between my group variable and lpop, which is included in the offset, causing this issue? Am I defining the population variable correctly? My lpop variable takes three different values, the natural log of the population in each group: ln(1150) for those in the care group, ln(8500) for those in the no_contact group and ln(4150) for those in the contact group. It is therefore almost perfectly correlated with the group variable, which takes three values (1, 2, 3) over exactly the same observations.

Thanks!
Clyde Schechter

#5

07 Nov 2024, 09:16

There is no role for an exposure (nor offset) in your situation, even with a count variable. You have one observation per mother, and the outcome variable is the number of mental health hospitalizations during a fixed four-year period for that one mother. It is not the number of such events for some population of mothers; it is just for that mother. So there is no population involved. (Or, strictly speaking, the population variable is actually a constant 1 and can be ignored.)
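Why the offset inflated the IRR can be checked with a little arithmetic. In a saturated log-link model (group##before_after, with independent working correlation so the point estimates equal those of an ordinary Poisson GLM), adding offset(lpop) leaves the fitted cell means unchanged. Because lpop is constant within each group, it is simply absorbed by the group coefficients, so the care vs no_contact IRR gets multiplied by 8500/1150, about 7.39. The Python sketch below (not Stata) uses the group sizes reported in the thread; the function name is illustrative only.

```python
import math

# Group sizes reported in the thread; lpop = ln(group size) for every mother in that group.
GROUP_SIZE = {"care": 1150, "no_contact": 8500, "contact": 4150}


def irr_after_adding_group_offset(irr_without_offset, group, reference):
    """Group IRR that a saturated log-link model reports once offset(lpop) is added.

    The fitted cell means do not change, so each group's coefficient absorbs
    -ln(size of its group); the reference group's shift goes into the intercept.
    Net effect on the IRR: multiply by size(reference) / size(group).
    """
    shift = math.log(GROUP_SIZE[reference]) - math.log(GROUP_SIZE[group])
    return math.exp(math.log(irr_without_offset) + shift)


adjusted = irr_after_adding_group_offset(11.29, "care", "no_contact")
print(f"care vs no_contact IRR with offset: {adjusted:.2f}")
```

This prints about 83.45, matching the reported 83.41 up to rounding of the 11.29 input. Because lpop is the same before and after placement within a group, it cancels out of the before_after and group#before_after terms; only the group main effects move, and they move purely because the offset turns each group's mean into a mean per group member.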
Fernando Lima

#6

07 Nov 2024, 18:15

Very clear now. Thanks so much for your help.

Last edited by Fernando Lima; 07 Nov 2024, 18:17.