  • How to choose which duplicates to drop/keep?

    Hi there,

    I'm working on data from 2012-2018 concerning HIV-positive pregnant women in treatment and loss to follow-up (LTFU). Women who have been in treatment more than once because they had more children have an ID that occurs two, three or four times. The information attached to the duplicate IDs includes, among other things, the start and end dates of treatment (some end dates are missing). I need to keep the "latest" ID and date variables for those who occur more than once, in order to trace their latest contact with the clinic (among those whose end date is missing) and register whether or not they are LTFU. Each woman is only supposed to occur once during the study period. If I simply run "duplicates drop idp, force", I will have to use the start date of their first treatment and their latest contact with the clinic from later visits to calculate their follow-up time, which will then be too long.

    Any thoughts? (I'm using Stata 14 on a Mac)

    Thank you!

    Best regards, Laura

    [Attached screenshot: Skærmbillede 2019-04-30 kl. 08.26.08.png]


  • #2
    This does not sound like a case for duplicates at all. Perhaps you should work with maximum and minimum dates. For example, commands of the form

    Code:
    * latest value of y within each patient
    egen max_y = max(y), by(idp)
    * earliest value of y within each patient
    egen min_y = min(y), by(idp)
    will calculate the last and first dates for each patient. If you wish, you can then do analyses conditional on an observation being the first date, or the last date.

    Code:
    .... if y == max_y
    Most drastic of all would be to reduce the dataset to one observation per patient.

    Code:
    keep if y == max_y
    Here naturally y is generic as other than idp you don't give any variable names.
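
As a minimal sketch of how the one-observation-per-patient step might look for the data described in the question: the variable names start_date and end_date and the six example rows are hypothetical, since only idp is named in the thread. Sorting within idp and keeping the last row is used here in place of the max comparison, on the assumption that a woman never has two episodes with the same start date.

```stata
* Sketch only: start_date, end_date and the example rows are made up
clear
input idp str10 s_start str10 s_end
1 "01jan2012" "01oct2012"
1 "01mar2015" ""
2 "15jun2013" "15feb2014"
3 "01feb2014" "01nov2014"
3 "01may2016" "01jan2017"
3 "01aug2018" ""
end

* Convert the string dates to Stata daily dates (empty strings become missing)
gen start_date = date(s_start, "DMY")
gen end_date   = date(s_end, "DMY")
format start_date end_date %td
drop s_start s_end

* Within each woman, sort episodes by start date and keep only the latest one
bysort idp (start_date): keep if _n == _N

* Stop with an error if any idp still occurs more than once
isid idp
list, noobs
```

Each woman then keeps the start date of her most recent treatment episode, so follow-up time is counted from that episode rather than from her first one.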
