Moving from encounter-level to daily-level dataset w counts of incident and prevalent episodes for each date

Clark Sherman

Join Date: Mar 2022

Posts: 1
#1

Moving from encounter-level to daily-level dataset w counts of incident and prevalent episodes for each date

25 Mar 2022, 10:56

Hello,

I need guidance about how to move from an encounter-level dataset to a daily-level dataset with both incidence (new episodes) and prevalence (all current episodes) totals for each date. I am a relative beginner in Stata but will do my best to explain the situation and follow the FAQ.

I have an encounter-level dataset with 121,640 observations and 131 variables including an episode start date (admdt) and episode end date (dcdt). All included episodes have a start date within a 12-month period. There is also a binary variable “died” which corresponds to whether the encounter ended in death. The “dataex” command gives me “input statement exceeds linesize limit” so I dropped all but a select few necessary variables relevant to my question below and here it is:

input float(died admdt dcdt)
0 22137 22138
0 22075 22088
0 22106 22109
0 22298 22301
0 22220 22223
1 22070 22085
0 22337 22338
1 22154 22163
0 22175 22176
0 22248 22249
0 22025 22028
0 22064 22071
0 22159 22159
0 22228 22231
0 22250 22257
0 22300 22303
0 22327 22330

My first aim was to create a daily-level dataset with 365 observations corresponding to each unique “admdt” and variables corresponding to 1) the total number of encounters which started on that date, 2) the total number of encounters which started on that date and ended in death (to calculate daily mortality), 3) a few other categories based on other variables. I succeeded in doing this much by first tallying all new episodes and deaths for each admit date and then dropping all duplicates, so I was left with a single observation for each date.

The successful code:

quietly summarize admdt
local datemax=r(max)
local datemin=r(min)
local daterange=`datemax'-`datemin'
display "`daterange'"

gen countall=0
forvalues d = `datemin'(1)`datemax'{
quietly tab admdt if admdt==`d'
replace countall=r(N) if admdt==`d'
}

gen count1=0
forvalues d = `datemin'(1)`datemax'{
quietly tab admdt if admdt==`d' & def1==1
replace count1=r(N) if admdt==`d'
}

**etc additional variables**

bysort admdt: egen diedall=total(died)
bysort admdt: egen died1=total(died) if def1==1
egen max_died1 = max(died1), by(admdt)
drop died1
rename max_died1 died1
replace died1=0 if died1==.

keep admdt count1 countall diedall died1
sort admdt count*
quietly bys admdt count*: gen duplicate= cond(_N==1,0,_n)
tab duplicate
drop if duplicate>1

Here I was left with successful daily-level dataset with 365 observations and variable “countall” with number of incident encounters (admissions), “diedall” with number of admissions ending with death, etc. Great.

MY QUESTION:

I want to make an alternative daily-level dataset which has tallies of all PREVALENT episodes, rather than incident ones. In other words, my first dataset is for new hospital admissions. I want to create a second dataset with all daily hospitalizations—or, how many encounters in this initial dataset overlap with each given date.

So, I go back to my initial encounter-level dataset with 122,000 observations.

My first instinct was to create 365 dummy variables corresponding to each date, then set each date variable=1 if the datevar>=admdt & datevar<=dcdt, and then use total(admdtvar) to tally up all encounters for each date.

I used dummieslab to create the 365 variables from admdt, but I realized I can’t use the variablename as a date in my stata commands. It also is clear this will get bulky very very quickly.

Wondering if any of you pros have suggestions for how to approach this task. I hope I’ve provide enough, but not too much, information.

Thanks so much,
Clark
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#2

25 Mar 2022, 11:51

Your code for creating the incidence/mortality file is much more complicated than it needs to be. The following code will create both of the files you want. At the end of the code the frame called incidence_mortality will include the incidence and mortality file, and the frame called prevalence wil contain the prevalent admissions file. You can save them, or export them, or whatever from there.

Code:

// COPY THE DATA TO ANOTHER FRAME WHERE WE WILL CREATE THE PREVALENT ADMISSIONS FILE frame copy default prevalence frame rename default incidence_mortality // INCIDENCE AND MORTALITY SUMMARY FILE assert inlist(died, 0, 1) collapse (sum) mortality = died (count) incident_admits = died, by(admdt) format admdt %td tsset admdt tsfill replace mortality = 0 if missing(mortality) replace incident_admits = 0 if missing(incident_admits) // NOW CREATE THE PREVALENT ADMISSIONS FILE frame change prevalence drop died gen long obs_no = _n reshape long @dt, i(obs_no) j(event) string gen int effect = cond(event == "adm", 1, -1) gsort dt -effect gen prevalent_admits = sum(effect) by dt: keep if _n == _N drop event effect obs_no format dt %td tsset dt tsfill replace prevalent_admits = L.prevalent_admits if missing(prevalent_admits)

Notes:

1. The prevalence results will inevitably be wrong in the earliest dates because you will not have taken into account admissions that began prior to the first date in your data set which may still be ongoing.
2. Code for incidence mortality file assumes and verifies that the variable died is always 0 or 1, and is never missing.
3. Because the example data is very sparse, there are many dates where the incident admissions are 0, and the prevalent admissions are just carry-forwards from the last date for which there was something happening. It is extremely unlikely that the real data set will present these problems.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30101

25 Mar 2022, 11:59

Sorry, just realized that the code in #2 wrong assigns the deaths to the date of admission, rather than to dcdt. The following segment of code should replace the block of code creating the incidence and mortality summary file in #2:

Code:

//  INCIDENCE AND MORTALITY SUMMARY FILE
assert inlist(died, 0, 1)
preserve
collapse (count) incident_admits = died, by(admdt)
format admdt %td
rename admdt date
tempfile admits
save `admits'
restore
collapse (sum) mortality = died, by(dcdt)
rename dcdt date
format date %td
merge 1:1 date using `admits', nogenerate
tsset date
tsfill
replace mortality = 0 if missing(mortality)
replace incident_admits = 0 if missing(incident_admits)

Announcement

Moving from encounter-level to daily-level dataset w counts of incident and prevalent episodes for each date

Comment

Comment