
  • Seeking Advice on Handling Missing Data in Survival Analysis Dataset

    Dear List Members,

    I am currently working with a time-to-event dataset for a survival analysis on farmed goats. I wanted to address a particular aspect of the dataset to ensure accurate analysis.

    In the dataset, we have recorded the date of birth (dob) for each individual goat. However, for goats that either died or were sold (censored) before the end of the study, the only other date recorded is the date of death (dod). No “dod” is available for goats that are still alive.

    To calculate the survival time, I used the following approach:

    gen start_date = date(dob, "MDY")
    gen end_date = date(dod, "MDY")
    gen time = end_date - start_date

    In the dataset, dead goats are coded as "1," while goats that were sold (censored) or are still alive are coded as "0." This coding indicates whether an event (death) has occurred or not, respectively. Values for the variables "dod," "start_date," and "end_date" for all the living goats on the farm are currently empty in the dataset’s editor.

    Considering this setup, I would greatly appreciate your expertise and insights regarding the appropriateness of my approach. If there are any recommendations or suggestions for proceeding with the analysis to ensure accuracy and reliability, your input would be invaluable. I am using Stata IC v.14.

    Thank you for your time and assistance.

    Best regards,

    Aminu

    Dr. Aminu Shittu
    Department of Theriogenology and Animal Production
    Faculty of Veterinary Medicine
    Usmanu Danfodiyo University Sokoto
    Sokoto State
    Nigeria.

  • #2
    As you have no date of death for the censored goats, your time variable will have a missing value for those goats. As a result, they will be excluded from the analysis altogether. This is less than ideal. Ideally, you would have information on the date they were sold. Then you would set your time variable equal to date_sold - dob. This, combined with specifying your 1/0 variable representing dead/sold in the -failure()- option of your -stset- command, would properly treat these goats as censored observations.
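
    A minimal sketch of that setup, on toy data. The variable names date_sold and dead are assumptions for illustration; the original post does not name the sale-date or event variables, and the dates below are made up.

```stata
* Toy data: date_sold and dead are hypothetical variable names
clear
input str10 dob str10 dod str10 date_sold byte dead
"01/15/2020" "03/01/2021" ""           1
"02/10/2020" ""           "08/20/2021" 0
end

gen start_date = date(dob, "MDY")
* End of follow-up: date of death for deaths, date of sale for sold goats
gen end_date = cond(dead == 1, date(dod, "MDY"), date(date_sold, "MDY"))
format start_date end_date %td
gen time = end_date - start_date

* dead == 1 marks the event; dead == 0 is treated as censored
stset time, failure(dead)
```

    With end_date filled in for the sold goat, its time is no longer missing, so -stset- keeps it as a censored observation instead of excluding it.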


    • #3
      Hi Clyde.

      Thanks for the initial thoughts.

      In the description of the dataset above, I mentioned that the date of death (dod) is not recorded for any of the goats that are still alive; however, their event status, like that of the sold (censored) goats, was recorded as zero (0), while dead goats were recorded as one (1). Is it appropriate to assign today's date, for example, to all the goats that are alive, or is there a way of retaining them instead of dropping them from the analysis?

      Aminu.


      • #4
        Is it appropriate to assign today's date, for example, to all the goats alive or could there be a way of retaining them instead of dropping them out of the analysis?
        Maybe. Do you know for a fact that all of the goats that were sold are really still alive? Maybe some of them died after being sold, and you don't know about that. The time value for a censored goat (one whose death you do not know the date of, for whatever reason) should be calculated based on the last date at which they were known to be alive.

        From your description in #1, I would take that to be the date of sale: I assume that nobody is buying dead goats. If, in fact, you know that the sold goats are all still alive today, then, today would be the last date at which they are known to be alive, and your proposal in #3 would work. But if it is possible that, unbeknownst to you, some of the goats died subsequent to sale, your proposal would overstate the survival of those now-dead goats and would be incorrect.
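
        A rough sketch of the "last date known alive" rule, again on made-up data. The names date_sold and dead, and the last_seen date of 31 December 2021, are assumptions for illustration only: sold goats are censored at the sale date, and goats still on the farm are censored at the date they were last confirmed alive.

```stata
* Toy data: one dead goat, one sold goat, one goat still on the farm
clear
input str10 dob str10 dod str10 date_sold byte dead
"01/15/2020" "03/01/2021" ""           1
"02/10/2020" ""           "08/20/2021" 0
"05/05/2020" ""           ""           0
end

* Hypothetical date on which the remaining goats were last confirmed alive
local last_seen = mdy(12, 31, 2021)

gen start_date = date(dob, "MDY")
gen end_date = date(dod, "MDY")
* Sold goats: censor at the date of sale, not at today's date
replace end_date = date(date_sold, "MDY") if dead == 0 & date_sold != ""
* Goats still on the farm: censor at the last date they were seen alive
replace end_date = `last_seen' if dead == 0 & missing(end_date)
format start_date end_date %td

* Analysis time runs from birth to end_date; dead == 0 is censored
stset end_date, origin(time start_date) failure(dead)
```

        Replacing the sale date with today's date for the sold goat would only be defensible if that goat is known to be alive today.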
