"2416 observations completely determined. Standard errors questionable"

Chris Rooney

Join Date: Apr 2014

Posts: 167
#1

03 Aug 2014, 07:23

Hello all,

I am running a multinomial logit model, and when I include time dummies (Period 1 = _d1, Period 2 = _d2, etc.) I get the message "2416 observations completely determined". If I leave out the time dummies, the message does not appear. However, I'm using -mlogit- within a discrete-time survival analysis setting, and according to Willett and Singer (2003), I have to include these time dummies, so leaving them out is not an option.

My dependent variable is coded 0 = censored, 1 = graduated, 2 = academically withdrew from university. It is not obvious to me why this message appears: in every time period at least two of the three outcomes occur, and from year 3 onwards all three outcomes occur in every period. How can I fix this problem?
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#2

03 Aug 2014, 09:00

I would presume here that your -mlogit- command references other explanatory variables, say gender, ethnicity, etc. in addition to your time dummies. The problem you describe could occur if, at some time period, all the women of a certain ethnicity were still in school. Try running a model with just the time dummies, then successive models with some of your other predictors until you get the problem you describe.
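The step-by-step search suggested above could be sketched as follows. This is only an illustration under assumed names: `outcome` (coded 0/1/2), time dummies matching `_d*`, and the predictors `female`, `ethnicity` and `CumulativeGPA` are placeholders for whatever is actually in the person-period file.

```stata
* Hypothetical variable names throughout; substitute your own.
* Step 1: time dummies only. With a constant in the model, Stata
* omits one dummy because of collinearity.
mlogit outcome _d*, baseoutcome(0)

* Step 2: add the other predictors one at a time, cumulatively,
* and watch for the first model that reports
* "observations completely determined".
local rhs
foreach x in female i.ethnicity CumulativeGPA {
    local rhs `rhs' `x'
    display as text _n "Model with time dummies plus: `rhs'"
    mlogit outcome _d* `rhs', baseoutcome(0)
}
```

The predictor whose entry first triggers the note is the one to inspect, for example by tabulating it against the outcome within each period.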

Another comment, of a more substantive nature, concerning the use of -mlogit- in this competing events context:

I have done quite a bit of the same kind of analysis you are doing, i.e., treating student degree completion as a survival process, and trying to account for "withdrawn" and "graduated" as competing events by using a multinomial logit model on a person-period file. As I worked with this kind of model and thought about it, I became convinced that the IIA violation that will almost certainly obtain in a model like this makes this attractively simple approach inadequate for the competing events problem. The literature is rather thin on this issue, to my knowledge. If anyone else on the list here has some good suggestions, I'd be very interested, and perhaps a new thread would be in order at that point.

Regards, Mike
Comment
Chris Rooney

Join Date: Apr 2014

Posts: 167
#3

03 Aug 2014, 11:03

Thanks for the answer Mike Lacy.

I've done as you said, and it seems the "CumulativeGPA" variable is at fault. This is kind of weird, as it is a continuous variable, and I don't quite understand why it would cause the problem.

Cumulative GPA is basically the cumulative GPA of a university student by year. So if someone is at university for 3 years, their 2nd year cumulative GPA will be a simple average of first + second year GPA, and their 3rd year will be a simple average of 1st + 2nd + 3rd year.
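As a small worked example of that definition: yearly GPAs of 3.0, 3.4 and 2.6 give cumulative values of 3.0, 3.2 and 3.0. A minimal sketch of how such a variable might be built in a person-period file, assuming hypothetical variables `id`, `year` and `gpa` (the GPA earned in that year alone):

```stata
* Hypothetical names: id (student), year (1, 2, 3, ...), gpa (that year's GPA).
* The running sum of gpa is divided by the number of non-missing years so far,
* so a year with missing gpa does not pull the average toward zero.
bysort id (year): gen double cumgpa = sum(gpa) / sum(!missing(gpa))
```

This is an assumption about the construction, not the code actually used to create CumulativeGPA.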
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#4

03 Aug 2014, 12:25

I might suspect something funny regarding missing or 0.0 GPAs being linked to withdrawal. - Mike
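One way to look into this, sketched with the assumed names `outcome` and `CumulativeGPA`: a continuous predictor can still produce completely determined observations if, say, every student with a GPA of 0.0 (or below some cutoff) has the same outcome.

```stata
* Hypothetical variable names; substitute your own.
* How many person-period records have missing GPA? (mlogit drops these.)
count if missing(CumulativeGPA)

* Do records with a GPA of exactly zero all share one outcome?
tabulate outcome if CumulativeGPA == 0

* Compare the GPA distribution across outcomes to spot non-overlapping ranges.
tabstat CumulativeGPA, by(outcome) statistics(n min p5 p50 max)
```

If the withdrawal records sit entirely below the range seen for the other outcomes, that would be consistent with the note from -mlogit-.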
Comment
Andrew Lover

Join Date: Apr 2014

Posts: 182
#5

03 Aug 2014, 20:44

You might consider -stcrreg- to more comprehensively model your data (explicitly considering time-to-event as well as competing risks), although you might have similar problems with separation and quasi-separation.

Two good primers:

http://data.princeton.edu/pop509/justices.html
http://www.ctu.mrc.ac.uk/cascade/con...isks_guide.pdf

Last edited by Andrew Lover; 03 Aug 2014, 20:46. Reason: Added refs.

__________________________________________________ __
Assistant Professor, Department of Biostatistics and Epidemiology
School of Public Health and Health Sciences
University of Massachusetts- Amherst
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#6

04 Aug 2014, 14:02

Hi Andrew: The event in question, graduated/still in school/withdraw, is discrete, with events that can only occur two times a year, when the spring or fall academic terms end. The range of event times in such data in the U.S. context is generally 1, ..., 12, with considerable heaping of events at t = 2 and t = 8-10.

Does having a lot of ties and discrete event times not matter for the model used in -stcrreg-? I had assumed that it would be problematic there as it would be in a Cox model, but perhaps in modeling cumulative incidence rather than hazard, the issues are different.

Regards, Mike
Comment
Andrew Lover

Join Date: Apr 2014

Posts: 182
#7

04 Aug 2014, 14:38

Hi Mike, I haven't needed to analyze any data with many ties, but the code below suggests that it isn't a problem per se. However, there could very well still be issues with separation, esp. if there are many categorical vars.

Code:

webuse hypoxia, clear

* std example
stset dftime, failure(failtype==1)
stcrreg ifp tumsize pelnode, compete(failtype==2)

* creating ties
gen dftime2 = floor(dftime) + 0.5
hist dftime2, by(failtype) disc
stset dftime2, failure(failtype==1)
stcrreg ifp tumsize pelnode, compete(failtype==2)

Comment
Andrew Lover

Join Date: Apr 2014

Posts: 182
#8

04 Aug 2014, 19:17

(Slightly) better toy data with discrete fail times. The -st- manual doesn't have too much to say about ties (aside from unrelated info on pg 212). I am not fully convinced about the instantaneous events, but it's encouraging that the CIs don't get unstable. Likely best to check the Fine & Gray 1999 article.

Code:

webuse hypoxia, clear

* collapse failure times to two discrete values
gen dftime2 = .
replace dftime2 = 1 if dftime < 3.2
replace dftime2 = 8 if dftime >= 3.2

stset dftime2, failure(failtype==1)
stcrreg ifp tumsize pelnode, compete(failtype==2)

* add an arbitrary three-level grouping variable
sort dftime
gen grp2 = .
replace grp2 = 1 in 1/36
replace grp2 = 2 in 37/72
replace grp2 = 3 if grp2 == .
stcrreg ifp tumsize pelnode i.grp2, compete(failtype==2)

* cumulative incidence curves; stcompet is user-written (ssc install stcompet)
stcompet ci=ci, compet1(2)
twoway (line ci _t if failtype==1, c(J) sort) ///
       (line ci _t if failtype==2, c(J) sort)
exit

Comment
