  • "2416 observations completely determined. Standard errors questionable"

    Hello all,

    I am running a multinomial logit model, and when I include time dummies (Period 1 = _d1, Period 2 = _d2, etc.) I get the message "2416 observations completely determined". If I don't include the time dummies, this isn't a problem. However, I'm using -mlogit- within a discrete-time survival analysis setting, and according to Willett and Singer (2003), I have to include these time dummies, so leaving them out is not an option.

    My dependent variable is coded 0 = censored, 1 = graduated, 2 = academically withdrew from university. It is not obvious to me why this message appears, since at least two of these outcomes occur in every time period, and from year 3 onwards all three occur in every period. How can I resolve this problem?

  • #2
    I would presume here that your -mlogit- command references other explanatory variables, say gender, ethnicity, etc. in addition to your time dummies. The problem you describe could occur if, at some time period, all the women of a certain ethnicity were still in school. Try running a model with just the time dummies, then successive models with some of your other predictors until you get the problem you describe.
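
    A minimal sketch of that stepwise check, not a definitive recipe. The variable names `outcome`, `period`, `female`, and `ethnicity` are hypothetical placeholders; only the `_d` dummies are named in the original post.

    ```stata
    * Hypothetical variable names throughout; substitute your own.
    * outcome: 0 = censored, 1 = graduated, 2 = withdrew
    * period:  integer time index underlying the dummies _d1, _d2, ...

    * 1. Look for empty or near-empty period-by-outcome cells
    tabulate period outcome

    * 2. Time dummies only (Stata omits one dummy as collinear with the constant)
    mlogit outcome _d*, baseoutcome(0)

    * 3. Add the other predictors one at a time and note which addition
    *    first produces the "completely determined" message
    mlogit outcome _d* i.female, baseoutcome(0)
    mlogit outcome _d* i.female i.ethnicity, baseoutcome(0)

    * 4. For a suspect categorical predictor, inspect the cells within each period
    bysort period: tabulate ethnicity outcome
    ```

    A covariate pattern in which everyone shares the same outcome within a period is the kind of cell the last step is meant to expose.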


    Another comment, of a more substantive nature, concerning the use of -mlogit- in this competing events context:

    I have done quite a bit of the same kind of analysis you are doing, i.e., treating student degree completion as a survival process, and trying to account for "withdrawn" and "graduated" as competing events by using a multinomial logit model on a person-period file. As I worked with this kind of model and thought about it, I became convinced that the IIA violation that will almost certainly obtain in a model like this makes this attractively simple approach inadequate for the competing events problem. The literature is rather thin on this issue, to my knowledge. If anyone else on the list here has some good suggestions, I'd be very interested, and perhaps a new thread would be in order at that point.

    Regards, Mike



    • #3
      Thanks for the answer Mike Lacy.

      I've done as you said, and it seems the "CumulativeGPA" variable is at fault. This is kind of weird, as it is a continuous variable. I don't quite understand.

      Cumulative GPA is basically the cumulative GPA of a university student by year. So if someone is at university for 3 years, their 2nd year cumulative GPA will be a simple average of first + second year GPA, and their 3rd year will be a simple average of 1st + 2nd + 3rd year.
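
      For reference, a sketch of how a running average like this could be built in a person-period file. The names `id`, `year`, `gpa`, and `cum_gpa` are hypothetical, and the code assumes one row per student-year with no missing `gpa` values.

      ```stata
      * Hypothetical names: id (student), year (1, 2, 3, ...), gpa (that year's GPA)
      * Assumes one row per student-year and no missing gpa values.
      * sum() is a running sum within each student; _n is the row number within id.
      bysort id (year): generate double cum_gpa = sum(gpa) / _n
      ```

      Note that `sum()` treats a missing `gpa` as zero, so a year with no recorded GPA would pull the running average down rather than produce a missing value.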



      • #4
        I might suspect something funny regarding missing or 0.0 GPAs being linked to withdrawal. - Mike
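
        One way to check this, sketched with a hypothetical outcome variable named `outcome` (coded 0/1/2 as in the first post) alongside CumulativeGPA:

        ```stata
        * Hypothetical name: outcome (0 = censored, 1 = graduated, 2 = withdrew)
        * Flag zero and missing GPAs, then cross-tabulate against the outcome
        generate byte gpa_flag = 0
        replace gpa_flag = 1 if CumulativeGPA == 0
        replace gpa_flag = 2 if missing(CumulativeGPA)
        label define gpa_flag 0 "positive" 1 "zero" 2 "missing"
        label values gpa_flag gpa_flag
        tabulate gpa_flag outcome, row

        * Range of GPA within each outcome; little or no overlap points to separation
        tabstat CumulativeGPA, by(outcome) statistics(n min p50 max)
        ```

        Rows with a missing GPA are dropped from the estimation sample anyway, so the zero-GPA rows are the ones most worth inspecting.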



        • #5
          You might consider -stcrreg- to more comprehensively model your data (explicitly considering time-to-event as well as competing risks), although you might have similar problems with separation and quasi-separation.

          Two good primers:

          http://data.princeton.edu/pop509/justices.html
          http://www.ctu.mrc.ac.uk/cascade/con...isks_guide.pdf
          Last edited by Andrew Lover; 03 Aug 2014, 21:46. Reason: Added refs.
          __________________________________________________ __
          Assistant Professor, Department of Biostatistics and Epidemiology
          School of Public Health and Health Sciences
          University of Massachusetts- Amherst



          • #6
            Hi Andrew: The event in question, graduated/still in school/withdraw, is discrete, with events that can only occur two times a year, when the spring or fall academic terms end. The range of event times in such data in the U.S. context is generally 1, ..., 12, with considerable heaping of events at t = 2 and t = 8-10.

            Do a lot of ties and discrete event times not matter for the model used in -stcrreg-? I had assumed that they would be problematic there as they would be in a Cox model, but perhaps in modeling cumulative incidence rather than hazard, the issues are different.


            Regards, Mike



            • #7
              Hi Mike, I haven't needed to analyze any data with many ties, but the code below suggests that it isn't a problem per se. However, there could very well still be issues with separation, especially if there are many categorical variables.

              Code:
              webuse hypoxia, clear
              
              * std example: failtype==1 is the event of interest, failtype==2 the competing event
              
              stset dftime, failure(failtype==1)
              stcrreg ifp tumsize pelnode, compete(failtype==2)
              
              * creating ties: collapse failure times onto a coarse grid
              
              gen dftime2 = floor(dftime) + 0.5
              hist dftime2, by(failtype) disc
              
              stset dftime2, failure(failtype==1)
              
              stcrreg ifp tumsize pelnode, compete(failtype==2)



              • #8
                (Slightly) better toy data with discrete fail times. The -st- manual doesn't have too much to say about ties (aside from unrelated info on pg 212). I am not fully convinced about the instantaneous events, but it's encouraging that the CIs don't get unstable. Likely best to check the Fine & Gray 1999 article.


                Code:
                webuse hypoxia, clear
                
                * force only two distinct failure times
                gen dftime2 = .
                
                replace dftime2 = 1 if dftime < 3.2
                replace dftime2 = 8 if dftime >= 3.2
                
                stset dftime2, failure(failtype==1)
                
                stcrreg ifp tumsize pelnode, compete(failtype==2)
                
                * add a three-level categorical covariate
                sort dftime
                gen grp2 = .
                
                replace grp2 = 1 in 1/36
                replace grp2 = 2 in 37/72
                replace grp2 = 3 if grp2 == .
                
                stcrreg ifp tumsize pelnode i.grp2, compete(failtype==2)
                
                * cumulative incidence; -stcompet- is user-written (ssc install stcompet)
                stcompet ci = ci, compet1(2)
                
                twoway (line ci _t if failtype==1, c(J) sort) ///
                       (line ci _t if failtype==2, c(J) sort)
                
                exit

