xtlogit or xtgee

John Larsson

Join Date: May 2014

Posts: 33
#1

03 Jan 2015, 11:28

Hello,
I have a model with a dichotomous outcome representing a transportation preference, with an N of around 700. Each observation belongs to a district, and I have some grounds to expect that preferences are more similar within districts, of which there are 15 in the population. I want to make sure I am generating the appropriate standard errors. There seem to be a couple of options. One is a fixed-effects model, as in:

xtset district

xtlogit Tpreference i.Predictor, fe

Alternatively, I could use generalized estimating equations, as in:

xtgee Tpreference i.Predictor, link(logit) family(binomial) vce(robust)

with the appropriate working correlation structure specified in the corr() option.

Can anyone advise on what is the appropriate model, or any other considerations pertinent to the choice of models?

Much Appreciated,
John L.
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

03 Jan 2015, 11:35

Generalized estimating equations estimate the population-averaged effect of the predictor; xtlogit estimates the unit-level effect. These are different in non-linear models such as logit. So which you should use depends on which effect you are interested in.

See http://www.stata.com/support/faqs/st...tion-averaged/ for a fuller explanation. (That explanation focuses on random effects rather than fixed effects, but the ideas are the same.)
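
A minimal sketch of the two approaches side by side, run as a do-file. The data are simulated stand-ins (700 respondents in 15 districts), and every numeric value in the simulation is an assumption made for illustration; only the variable names Tpreference, Predictor and district come from this thread.

```stata
* Simulated stand-in data; all parameter values are hypothetical
clear
set seed 12345
set obs 700
generate district = ceil(15 * runiform())
generate byte Predictor = runiform() < 0.5
generate u = rnormal(0, 1.5)                  // district-level random intercept
bysort district: replace u = u[1]             // one shared draw per district
generate byte Tpreference = runiform() < invlogit(-0.5 + 1.0 * Predictor + u)

xtset district

* Unit-level (district-specific) effect: conditional fixed-effects logit
xtlogit Tpreference i.Predictor, fe
estimates store fe

* Population-averaged effect: GEE with an exchangeable working correlation
xtgee Tpreference i.Predictor, family(binomial) link(logit) corr(exchangeable) vce(robust)
estimates store pa

* Put the two sets of coefficients and standard errors next to each other
estimates table fe pa, b(%9.3f) se(%9.3f)
```

The two coefficients on Predictor answer different questions, so a gap between them is expected rather than a sign that one model is wrong.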
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

03 Jan 2015, 15:35

Clyde has already covered the most important point.

I just wish to add some information regarding the coefficients as well as standard errors. Perhaps it might be of some interest.

In general, for binary outcomes, when we compare a mixed model with one random intercept to a GEE with an exchangeable correlation structure, the coefficients from the mixed model tend to be larger (in absolute value) than the ones from GEE. The standard errors also tend to be larger in the mixed model.

However, if the outcome variable is continuous, the coefficients and standard errors tend to be practically the same.
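
The size of the gap for binary outcomes can be sketched with a standard approximation for the random-intercept logit model, usually attributed to Zeger, Liang and Albert (1988). It assumes normally distributed random intercepts with variance \sigma_u^2; the subscripts SS (subject-specific, i.e. the mixed model) and PA (population-averaged, i.e. GEE) are notation introduced here, not taken from the posts above.

```latex
\beta_{\mathrm{PA}} \approx \frac{\beta_{\mathrm{SS}}}{\sqrt{1 + c^{2}\sigma_{u}^{2}}},
\qquad c = \frac{16\sqrt{3}}{15\pi} \approx 0.588, \quad c^{2} \approx 0.346
```

For example, with \sigma_u^2 = 1 and \beta_{SS} = 1, the approximation gives \beta_{PA} \approx 1/\sqrt{1.346} \approx 0.86. With an identity link (continuous outcome) the random intercept averages out of the mean without rescaling the slope, which is why the two sets of coefficients then practically coincide.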

Best regards,

Marcos
John Larsson

Join Date: May 2014

Posts: 33
#5

04 Jan 2015, 13:29

Hello,
Thanks Clyde and Marcos for the clarification. I have a follow-up question which perhaps should be another topic. But if anyone has suggestions I'd appreciate it.
I can see a secular trend in my outcome variable, with the probability of endorsement increasing at later times. I would also like to model this trend using either xtlogit or xtgee.
However, in my case, the panel and time variables would not uniquely identify an observation, since the data consist of responses by different people within the same district at the same times.
Is there a way around this, or is there any other apparent way of modeling this time component?
Best regards,
John L.
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#6

04 Jan 2015, 14:03

It doesn't matter that the panel and time variables will not identify observations uniquely for this purpose. Just include the time variable in the model (or appropriate variables derived from time if the secular trend isn't linear in the log odds of outcome).

I should add, another issue in modeling secular trend in this setting is whether there is a single secular trend across all districts, or if it varies by district. That is, you need to think about whether an adequate model of the secular trend also requires an interaction with the district effect.
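
A sketch of how the trend might enter the model, again on simulated stand-in data and meant to be run as a do-file. The variable name time, the six waves, and all parameter values are assumptions for illustration; the thread does not say how time is recorded.

```stata
* Simulated stand-in data; the variable name time is hypothetical
clear
set seed 2015
set obs 700
generate district = ceil(15 * runiform())
generate time = ceil(6 * runiform())          // six hypothetical survey waves
generate byte Predictor = runiform() < 0.5
generate u = rnormal(0, 0.5)
bysort district: replace u = u[1]             // one shared draw per district
generate byte Tpreference = runiform() < invlogit(-1 + 0.5 * Predictor + 0.2 * time + u)

xtset district                                // panel variable only

* One linear trend (in the log odds) shared by all districts
xtlogit Tpreference i.Predictor c.time, fe

* A curved trend: add a quadratic term in time
xtlogit Tpreference i.Predictor c.time##c.time, fe

* District-specific linear trends: one slope per district
xtlogit Tpreference i.Predictor i.district#c.time, fe
testparm i.district#c.time, equal             // test whether the slopes are equal
```

The last model corresponds to the interaction with the district effect mentioned above; with only 15 districts and about 700 observations, the district-specific slopes may be estimated imprecisely.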

Last edited by Clyde Schechter; 04 Jan 2015, 14:38.
John Larsson

Join Date: May 2014

Posts: 33
#7

05 Jan 2015, 10:04

Hello Clyde,
Just to clarify. Are you saying that I should just use the panel variable in the xtset command, and leave out the timevar? If I try to include the latter I get the message "repeated time values within panel." However, I don't see a way to, say, experiment with an AR1 working correlation structure if I cannot set timevar in the xtset command.
Regards,
John L.
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#8

05 Jan 2015, 10:51

Yes, that's what I had in mind--omit the timevar from the -xtset- command. If you have multiple observations for a single district and time period, then AR1 correlation structure has no meaning.

Perhaps I have misunderstood the nature of your data in the first place. You have indicated that you have multiple individual respondents within districts. Do the same individual respondents have a series of responses gathered over time? If so, you have a three-level data set that would probably not be adequately modeled with the two levels afforded by -xtgee- or -xtlogit-. In that case, you will need to take a look at -melogit-, though as far as I know, it does not accommodate autoregressive errors.
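
For the three-level case, a minimal sketch of what a -melogit- call could look like (Stata 13 or later, run as a do-file). The data are simulated, and the names person_id and time, the 200 respondents, the three responses each, and all parameter values are assumptions for illustration only.

```stata
* Simulated stand-in data: repeated responses within persons within districts
clear
set seed 2015
set obs 200                                   // hypothetical respondents
generate person_id = _n
generate district = ceil(15 * runiform())
generate u_person = rnormal(0, 0.7)           // person-level random intercept
expand 3                                      // three responses per person
bysort person_id: generate time = _n
generate u_district = rnormal(0, 0.5)
bysort district: replace u_district = u_district[1]   // one draw per district
generate byte Predictor = runiform() < 0.5
generate byte Tpreference = runiform() < ///
    invlogit(-0.5 + Predictor + 0.2 * time + u_district + u_person)

* Random intercepts for districts and for persons nested within districts
melogit Tpreference i.Predictor c.time || district: || person_id:
```

Levels are listed from the highest (district) to the lowest (person), which is the order -melogit- expects for nested random effects.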

If the individual respondents are different at different times (which is what I assumed in post #6), then the notion of autoregressive correlation is simply not applicable, and I would just omit the timevar from the -xtset- command and forget about autoregressive correlation.
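
A short sketch of that setup on simulated stand-in data, run as a do-file. The variable name time and all parameter values are hypothetical. The first -xtset- is wrapped in -capture noisily- because it is expected to fail with the "repeated time values within panel" message quoted in post #7.

```stata
* Simulated stand-in data; different respondents at each time within a district
clear
set seed 451
set obs 700
generate district = ceil(15 * runiform())
generate time = ceil(6 * runiform())          // hypothetical survey waves
generate byte Predictor = runiform() < 0.5
generate byte Tpreference = runiform() < invlogit(-1 + 0.5 * Predictor + 0.2 * time)

* Declaring a time variable fails when times repeat within a district
capture noisily xtset district time
display "xtset return code: " _rc

* Declare the panel variable only, then use a structure that needs no ordering
xtset district
xtgee Tpreference i.Predictor c.time, family(binomial) link(logit) ///
    corr(exchangeable) vce(robust)
```

An exchangeable working correlation treats all respondents in a district as equally correlated, so it does not need a time ordering, whereas corr(ar1) does.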
John Larsson

Join Date: May 2014

Posts: 33
#9

05 Jan 2015, 15:14

Hi Clyde,
There are multiple individuals within a district who may have given responses at different times. However, others within the district may also have given responses at the same times. Yes, I think a three-level model might have been more appropriate but perhaps not worth the effort.
Cheers,
John L.