
  • selecting first case meeting certain criteria across duplicate ids

    Hi,

    My dataset contains hospitalization records, and each participant has multiple hospitalizations. I want to select the first hospitalization for each participant that meets certain diagnostic criteria. The diagnostic criteria variable (diag) is coded 1-5. I am using a time variable ("caseday", the number of days between baseline assessment and hospitalization) to determine which hospitalization counts as the "first". I want the first hospitalization with diag equal to 1, 2, or 3. If a participant has no hospitalizations coded 1, 2, or 3, then I want their first hospitalization with diag equal to 4 or 5.

    I'm using the following to select their first hosp that is 1|2|3.
    duplicates tag id, generate(duplicate)
    gen diag_include=1 if diag<4
    egen dup_n = min(caseday) if diag_include==1 & duplicate!=0, by(id)
    gen include=1 if dup_n==caseday

    However, I'm getting stuck on how to pull in the cases where there was NO 1|2|3.

    Any advice appreciated. Thanks

  • #2
    On the assumption that the variable diag is never missing (an assumption verified by the code in the first line -- do not proceed if this command produces an error message), you can do this as follows:
    Code:
    assert !missing(diag)
    gen byte priority_dx = diag <= 3
    gsort id -priority_dx caseday
    by id: gen byte select = (_n == 1)
    This code is untested because no example data was provided. In the future, when asking for help with code, please use the -dataex- command and show example data. Although sometimes, as here, it is possible to give an answer that has a reasonable probability of being correct, this is usually not the case. Moreover, such answers are necessarily based on experience-based guesses or intuitions about the nature of your data. When those guesses are wrong, both you and the person trying to help you have wasted your time, because you end up with useless code. To avoid this, a -dataex- based example provides all of the information needed to develop and test a solution.

    If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
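
    As an illustration only, the sort-based selection above can be tried on a small made-up dataset. The values below are hypothetical and are not the original poster's data; the variable names id, diag, and caseday follow the thread. The sketch assumes no ties on caseday within a participant, since with ties the row chosen among the tied records is arbitrary.

```stata
* Hypothetical toy data: three participants, no missing diag, no ties on caseday
clear
input id diag caseday
1 4 10
1 2 30
1 1 55
2 5 12
2 4 40
3 3 20
3 1 8
end

* Stop here if diag is ever missing; the logic below assumes it is not
assert !missing(diag)

* 1 if the record meets the preferred criteria (diag 1, 2, or 3), else 0
gen byte priority_dx = diag <= 3

* Within id: preferred records first, then earliest caseday first
gsort id -priority_dx caseday

* Flag the first record in each participant's sorted block
by id: gen byte select = (_n == 1)

list id diag caseday priority_dx select, sepby(id)
```

    In this toy data, participant 1 has an earlier hospitalization with diag 4 (day 10), but the flagged record is the diag 2 hospitalization on day 30, because records with diag 1-3 sort ahead of the rest. Participant 2 has no diag 1-3 records, so the earliest record overall (diag 5, day 12) is flagged. Running -keep if select- afterwards would leave one row per participant.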



    • #3
      Thanks so much,
      This code worked perfectly. In the future, I'll provide example data to optimize the assistance. Thanks for this guidance as well.
