  • How to generate time distribution variable for duration models with discrete data

    Hi,

    I want to run a parametric duration model with discrete data, and I am struggling to work out how to generate the different distributions.

    From the Stata Manual I know that I have to generate a time variable representing the desired distribution and include
    this variable as an independent variable in my regression.

    The Manual mentions that the corresponding time variable for a Weibull distribution is log(time),
    but what about the other distributions: Gamma, Loglogistic, Lognormal, Gompertz and Exponential?

    Can anybody tell me how to compute these time variables or where to find this information?

    Thanks a lot.

  • #2
    The Stata Manual (ST) refers to continuous time models only. For models with discrete time / interval-censored / grouped survival time data, you need a different approach. For some free materials, see http://www.iser.essex.ac.uk/survival-analysis especially Lesson 6 and the corresponding section in the "Survival Analysis" manuscript.


    • #3
      Thank you Prof. Jenkins for sharing all that material, it is very useful.

      After reading it I am still a little lost. You mention in Ch.6 (and Ch.3, and the exercise) that log(time) can be used to create a Weibull distribution. Time dummies or interval dummies would give a semi-parametric approach similar to the Cox model or the piece-wise constant exponential model. And you mention a cubic specification (t^1, t^2, t^3).

      - Is the use of a cubic time variable similar to one of the distributions used in the continuous case? And what would be the approach for the other distributions?

      - Another question is how to decide whether I should use the logistic (logit) or the complementary log-log (cloglog) function. Do I only compare AIC and BIC?

      Thanks again


      • #4
        Q1. I think I wrote (or should have) about the discrete time analogue to a continuous time Weibull distribution, not to a Weibull distribution itself. The correspondence is not exact for reasons explained in my manuscript. [One can, in principle, fit a continuous time Weibull model to interval-censored data, but you can't use the "easy estimation" methods that I write about. See intcens on SSC.]

        Q2. There is no direct correspondence between a polynomial specification for "time" in the discrete case and in the continuous case (with the exception of the linear model).

        Q3. The models are not nested so, yes, criteria such as you cite could be used. Other criteria are relevant too, notably the interpretation of the estimated parameters (of which more in the materials cited). In practice, the models tend to have similar fit.
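A minimal sketch of the two hazard specifications being compared in Q3, in Python, may help make the comparison concrete. The function names are hypothetical, and `z` stands for the linear index (duration terms plus covariates). The AIC and BIC helpers use the standard formulas; which sample size to pass to BIC with person-period data (persons or person-periods) is a choice the thread does not settle, so `n` is left as an assumption for the user.

```python
import math


def logit_hazard(z):
    """Discrete-time hazard implied by a logit link: h = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def cloglog_hazard(z):
    """Discrete-time hazard implied by a complementary log-log link:
    h = 1 - exp(-exp(z))."""
    return 1.0 - math.exp(-math.exp(z))


def aic(loglik, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion; n is the sample size chosen by the
    user (an assumption here, see the lead-in)."""
    return k * math.log(n) - 2 * loglik


if __name__ == "__main__":
    # Compare the two hazards over a range of linear-index values.
    for z in (-5.0, -3.0, -1.0, 0.0):
        print(z, round(logit_hazard(z), 5), round(cloglog_hazard(z), 5))
```

When the hazard is small, both functions are close to exp(z), which is one reason the two models often fit similarly; the gap widens as the hazard grows.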

        Be sure that you understand the differences in the nature of "time" in discrete time (interval censored) and continuous time models. In the latter, survival time is a continuous variable; in the former, it really is discrete -- one does not observe the precise point within the interval where events occur (if they do). The "time" variable in the discrete time regression specifications that you cite above refers to "time" counted as an integer number of time intervals (since first at risk of the event), not to the exact time at risk since first at risk. Clearly, as the time-interval width gets smaller and smaller, a discrete time model will more and more closely approximate some continuous time model.
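To illustrate "time" counted as an integer number of intervals, here is a small Python sketch, under stated assumptions, that expands one record per spell into one record per interval at risk and builds the duration terms mentioned in this thread (log(t) and a cubic polynomial). The function name, the input layout and the column names are all hypothetical, chosen only for illustration.

```python
import math


def expand_person_period(spells):
    """Expand one record per spell into one record per interval at risk.

    spells: iterable of (person_id, n_intervals, event) tuples, where
    n_intervals is the integer number of intervals observed and event is
    1 if the event occurred in the last observed interval, 0 if censored.
    """
    rows = []
    for person_id, n_intervals, event in spells:
        for t in range(1, n_intervals + 1):
            rows.append({
                "id": person_id,
                "t": t,                 # interval counter since first at risk
                "log_t": math.log(t),   # term for the Weibull-like analogue
                "t2": t ** 2,           # polynomial terms
                "t3": t ** 3,
                # Binary outcome: 1 only in the interval where the event occurs.
                "y": 1 if (event == 1 and t == n_intervals) else 0,
            })
    return rows


if __name__ == "__main__":
    for row in expand_person_period([(1, 3, 1), (2, 2, 0)]):
        print(row)
```

The binary outcome `y` would then be regressed on one chosen duration specification (for example `log_t` alone, or `t`, `t2`, `t3`, or a set of interval dummies) together with the covariates, using a logit or cloglog link.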


        • #5
          Thank you very much, that helps a lot.


          • #6
            Hello Stephen Jenkins sir,

            I am trying to fit a discrete hazard model for neonatal deaths using DHS data. I am also unsure which transformation of time to use when fitting a logistic model for the baseline hazard. Should I use log of time or square or cube the time variable?

            Also by nested model, do you mean hierarchical/multilevel model? Which one should I go for (logistic or cloglog) if I want to fit a multilevel survival model to neonatal deaths? Will melogit work fine for the logistic model?


            • #7
              "Should I use log of time or square or cube the time variable?"
              Lifetable estimates of the hazard function should give you some idea of how the hazard varies with elapsed duration. Then choose some function of elapsed duration ("time") that mimics that.
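As a rough illustration of that advice, the following Python sketch computes a simple interval-by-interval hazard estimate: events in interval j divided by the number still at risk at the start of interval j. It is a simplified estimate with no actuarial adjustment for censoring within an interval, so it is not a reproduction of any particular Stata command. The function name and inputs are hypothetical.

```python
def lifetable_hazard(durations, events):
    """Simple discrete-time hazard estimate per interval.

    durations: integer number of intervals each subject was observed.
    events: 1 if the subject experienced the event in the last observed
            interval, 0 if censored.
    Returns a dict mapping interval j to (events in j) / (at risk in j).
    """
    hazards = {}
    for j in range(1, max(durations) + 1):
        # Subjects still under observation when interval j begins.
        at_risk = sum(1 for d in durations if d >= j)
        # Subjects whose event falls in interval j.
        failed = sum(1 for d, e in zip(durations, events) if d == j and e == 1)
        hazards[j] = failed / at_risk
    return hazards


if __name__ == "__main__":
    h = lifetable_hazard([1, 2, 2, 3, 3, 3], [1, 1, 0, 1, 0, 0])
    for j, value in h.items():
        print(j, round(value, 4))
```

Plotting these values against j gives a first impression of the shape: a hazard that declines steeply and then flattens might suggest log(t), while a hump shape might suggest a polynomial.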

              "Also by nested model, do you mean hierarchical/multilevel model?"
              No, the issue is whether one model is a special case of the other (whether by constraining the parameter(s) of one model you would end up with the other).

              I'm not going to comment on your question about "multilevel" modelling. I think you need to first back up and think what the model is that you are trying to fit. (What are the assumptions? Are they appropriate? What have others in your field used?).
