
  • Help using stset with imputed duration variable for discrete logistic hazard model

    Hi Stata List,

    I am working with panel data and am interested in time to event, measured in years. For some respondents, time of event (TOE) is missing due to the design of the survey. One additional variable (X2) also has missing values. I imputed both using:

    Code:
    mi set mlong
    
    mi register imputed TOE X2
    
    mi impute chained (pmm, knn(10) omit(X2)) TOE ///
    (pmm, knn(25) bootstrap omit(TOE X1)) X2 = X1 X3 X4 X5, add(20)
    This worked fine and the distributions of both imputed variables look similar to the original unimputed data.
    So then I wanted to estimate a discrete time logistic hazard model of EVENT. I run the following code to stset the data:

    Code:
    mi stset TOE, id(ID) failure(EVENT==1) origin(time 0)
    Where ID is the unique respondent identifier.

    When I try to stset the data, however, I get the following error:

    variable TOE registered as imputed
    Imputed and passive variables may not be used as the basis for mi stset.

    Imputing the time duration variable makes the time of event vary within ID, and it seems like stset can't handle that. This is an essentially identical setup and error documented here: https://www.statalist.org/forums/for...cox-regression

    The suggestion there is to use a loop to extract each of the imputed datasets, stset it and estimate the model, and then pool all the results at the end.

    Question 1: Is there any other way to do this than using a loop and pooling the results at the end?

    Question 2: If the above approach is necessary, can anyone recommend a standard approach for pooling the results?
    I have looked at this resource: https://www.stata.com/support/faqs/s...nd-chow-tests/
    I do not think this method is appropriate for my needs: I would have to separate the "groups" in order to stset the data and then run the regression after stset. Is there another way to do this that would make it possible to pool the results as explained in the FAQ?
    To be clear, I can get the regression results from the return list no problem; I just can't seem to figure out how to put it all together at the end. I can't just take the average of the beta coefficients and the standard errors across the imputed datasets, right?

    Question 3: Should I consider a different approach completely? Is imputing the duration variable not recommended at all? Someone I know recommended single imputation for the duration variable, but I am concerned that this would underestimate the standard errors.

    Thank you very much to anyone taking the time to read this and think about a solution.

    With best regards,
    John M. Towey

    Notes:
    1) Using Stata 16 on Windows Server 2012 R2; the data set is sensitive/confidential, and the remote server has no access to SSC or the web from within it.
    2) I first posted my question as a response to the original post on the same error and waited more than a month before posting this question.
    3) I have looked through the text suggested in an answer to the original post: https://www.stata.com/bookstore/surv...-introduction/ I found it informative but without the answer to the particular problem I have (I accessed it online through my library after posting the follow-up question to the original post).
    4) I have looked through other potentially relevant posts here on Stata List and consulted other Stata users I know, none of whom had a solution to this problem.
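
On Question 2, the usual way to combine estimates from multiply imputed datasets is Rubin's rules, which is also what mi estimate applies when it can be used. A sketch of the formulas follows; the notation is an assumption for illustration and not from the original post. Here M is the number of imputations (20 with add(20) above), \hat\beta_m is a coefficient estimated on imputed dataset m, and SE_m is its standard error. The degrees-of-freedom line is the classic large-sample version.

```latex
\begin{align*}
\bar\beta &= \frac{1}{M}\sum_{m=1}^{M}\hat\beta_m
  && \text{pooled coefficient} \\
W &= \frac{1}{M}\sum_{m=1}^{M}\mathrm{SE}_m^{2}
  && \text{within-imputation variance} \\
B &= \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat\beta_m-\bar\beta\bigr)^{2}
  && \text{between-imputation variance} \\
T &= W+\Bigl(1+\frac{1}{M}\Bigr)B
  && \text{total variance} \\
\mathrm{SE}_{\text{pooled}} &= \sqrt{T},
\qquad
\nu = (M-1)\Bigl[1+\frac{W}{(1+1/M)\,B}\Bigr]^{2}
  && \text{standard error and degrees of freedom}
\end{align*}
```

So the coefficients themselves are simply averaged, but the standard errors are not: the pooled variance is the average squared standard error plus an inflated between-imputation variance, which is what carries the uncertainty due to the missing data.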

  • #2
    Hey, did you have any luck with this? Or how did you move forward?

    I'm having a similar problem as my failure variable is passive.



    • #3
      I'm having the same issue. Is there any alternative solution that doesn't involve looping and pooling the results?
