Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Moving from encounter-level to daily-level dataset w counts of incident and prevalent episodes for each date

    Hello,

    I need guidance about how to move from an encounter-level dataset to a daily-level dataset with both incidence (new episodes) and prevalence (all current episodes) totals for each date. I am a relative beginner in Stata but will do my best to explain the situation and follow the FAQ.

    I have an encounter-level dataset with 121,640 observations and 131 variables including an episode start date (admdt) and episode end date (dcdt). All included episodes have a start date within a 12-month period. There is also a binary variable “died” which corresponds to whether the encounter ended in death. The “dataex” command gives me “input statement exceeds linesize limit” so I dropped all but a select few necessary variables relevant to my question below and here it is:

    input float(died admdt dcdt)
    0 22137 22138
    0 22075 22088
    0 22106 22109
    0 22298 22301
    0 22220 22223
    1 22070 22085
    0 22337 22338
    1 22154 22163
    0 22175 22176
    0 22248 22249
    0 22025 22028
    0 22064 22071
    0 22159 22159
    0 22228 22231
    0 22250 22257
    0 22300 22303
    0 22327 22330

    My first aim was to create a daily-level dataset with 365 observations corresponding to each unique “admdt” and variables corresponding to 1) the total number of encounters which started on that date, 2) the total number of encounters which started on that date and ended in death (to calculate daily mortality), 3) a few other categories based on other variables. I succeeded in doing this much by first tallying all new episodes and deaths for each admit date and then dropping all duplicates, so I was left with a single observation for each date.

    The successful code:

    quietly summarize admdt
    local datemax=r(max)
    local datemin=r(min)
    local daterange=`datemax'-`datemin'
    display "`daterange'"

    gen countall=0
    forvalues d = `datemin'(1)`datemax'{
    quietly tab admdt if admdt==`d'
    replace countall=r(N) if admdt==`d'
    }

    gen count1=0
    forvalues d = `datemin'(1)`datemax'{
    quietly tab admdt if admdt==`d' & def1==1
    replace count1=r(N) if admdt==`d'
    }

    **etc additional variables**

    bysort admdt: egen diedall=total(died)
    bysort admdt: egen died1=total(died) if def1==1
    egen max_died1 = max(died1), by(admdt)
    drop died1
    rename max_died1 died1
    replace died1=0 if died1==.

    keep admdt count1 countall diedall died1
    sort admdt count*
    quietly bys admdt count*: gen duplicate= cond(_N==1,0,_n)
    tab duplicate
    drop if duplicate>1

    Here I was left with successful daily-level dataset with 365 observations and variable “countall” with number of incident encounters (admissions), “diedall” with number of admissions ending with death, etc. Great.



    MY QUESTION:

    I want to make an alternative daily-level dataset which has tallies of all PREVALENT episodes, rather than incident ones. In other words, my first dataset is for new hospital admissions. I want to create a second dataset with all daily hospitalizations—or, how many encounters in this initial dataset overlap with each given date.


    So, I go back to my initial encounter-level dataset with 122,000 observations.

    My first instinct was to create 365 dummy variables corresponding to each date, then set each date variable=1 if the datevar>=admdt & datevar<=dcdt, and then use total(admdtvar) to tally up all encounters for each date.

    I used dummieslab to create the 365 variables from admdt, but I realized I can’t use the variablename as a date in my stata commands. It also is clear this will get bulky very very quickly.

    Wondering if any of you pros have suggestions for how to approach this task. I hope I’ve provide enough, but not too much, information.

    Thanks so much,
    Clark

  • #2
    Your code for creating the incidence/mortality file is much more complicated than it needs to be. The following code will create both of the files you want. At the end of the code the frame called incidence_mortality will include the incidence and mortality file, and the frame called prevalence wil contain the prevalent admissions file. You can save them, or export them, or whatever from there.

    Code:
    //  COPY THE DATA TO ANOTHER FRAME WHERE WE WILL CREATE THE PREVALENT ADMISSIONS FILE
    frame copy default prevalence
    frame rename default incidence_mortality
    
    //  INCIDENCE AND MORTALITY SUMMARY FILE
    assert inlist(died, 0, 1)
    collapse (sum) mortality = died (count) incident_admits = died, by(admdt)
    format admdt %td
    tsset admdt
    tsfill
    replace mortality = 0 if missing(mortality)
    replace incident_admits = 0 if missing(incident_admits)
    
    //  NOW CREATE THE PREVALENT ADMISSIONS FILE
    frame change prevalence
    drop died
    gen long obs_no = _n
    reshape long @dt, i(obs_no) j(event) string
    gen int effect = cond(event == "adm", 1, -1)
    gsort dt -effect
    gen prevalent_admits = sum(effect)
    by dt: keep if _n == _N
    drop event effect obs_no
    format dt %td
    tsset dt
    tsfill
    replace prevalent_admits = L.prevalent_admits if missing(prevalent_admits)
    Notes:

    1. The prevalence results will inevitably be wrong in the earliest dates because you will not have taken into account admissions that began prior to the first date in your data set which may still be ongoing.
    2. Code for incidence mortality file assumes and verifies that the variable died is always 0 or 1, and is never missing.
    3. Because the example data is very sparse, there are many dates where the incident admissions are 0, and the prevalent admissions are just carry-forwards from the last date for which there was something happening. It is extremely unlikely that the real data set will present these problems.

    Comment


    • #3
      Sorry, just realized that the code in #2 wrong assigns the deaths to the date of admission, rather than to dcdt. The following segment of code should replace the block of code creating the incidence and mortality summary file in #2:
      Code:
      //  INCIDENCE AND MORTALITY SUMMARY FILE
      assert inlist(died, 0, 1)
      preserve
      collapse (count) incident_admits = died, by(admdt)
      format admdt %td
      rename admdt date
      tempfile admits
      save `admits'
      restore
      collapse (sum) mortality = died, by(dcdt)
      rename dcdt date
      format date %td
      merge 1:1 date using `admits', nogenerate
      tsset date
      tsfill
      replace mortality = 0 if missing(mortality)
      replace incident_admits = 0 if missing(incident_admits)

      Comment

      Working...
      X