Hello,
I need guidance about how to move from an encounter-level dataset to a daily-level dataset with both incidence (new episodes) and prevalence (all current episodes) totals for each date. I am a relative beginner in Stata but will do my best to explain the situation and follow the FAQ.
I have an encounter-level dataset with 121,640 observations and 131 variables including an episode start date (admdt) and episode end date (dcdt). All included episodes have a start date within a 12-month period. There is also a binary variable “died” which corresponds to whether the encounter ended in death. The “dataex” command gives me “input statement exceeds linesize limit” so I dropped all but a select few necessary variables relevant to my question below and here it is:
input float(died admdt dcdt)
0 22137 22138
0 22075 22088
0 22106 22109
0 22298 22301
0 22220 22223
1 22070 22085
0 22337 22338
1 22154 22163
0 22175 22176
0 22248 22249
0 22025 22028
0 22064 22071
0 22159 22159
0 22228 22231
0 22250 22257
0 22300 22303
0 22327 22330
My first aim was to create a daily-level dataset with 365 observations corresponding to each unique “admdt” and variables corresponding to 1) the total number of encounters which started on that date, 2) the total number of encounters which started on that date and ended in death (to calculate daily mortality), 3) a few other categories based on other variables. I succeeded in doing this much by first tallying all new episodes and deaths for each admit date and then dropping all duplicates, so I was left with a single observation for each date.
The successful code:
quietly summarize admdt
local datemax=r(max)
local datemin=r(min)
local daterange=`datemax'-`datemin'
display "`daterange'"
gen countall=0
forvalues d = `datemin'(1)`datemax'{
quietly tab admdt if admdt==`d'
replace countall=r(N) if admdt==`d'
}
gen count1=0
forvalues d = `datemin'(1)`datemax'{
quietly tab admdt if admdt==`d' & def1==1
replace count1=r(N) if admdt==`d'
}
**etc additional variables**
bysort admdt: egen diedall=total(died)
bysort admdt: egen died1=total(died) if def1==1
egen max_died1 = max(died1), by(admdt)
drop died1
rename max_died1 died1
replace died1=0 if died1==.
keep admdt count1 countall diedall died1
sort admdt count*
quietly bys admdt count*: gen duplicate= cond(_N==1,0,_n)
tab duplicate
drop if duplicate>1
Here I was left with successful daily-level dataset with 365 observations and variable “countall” with number of incident encounters (admissions), “diedall” with number of admissions ending with death, etc. Great.
MY QUESTION:
I want to make an alternative daily-level dataset which has tallies of all PREVALENT episodes, rather than incident ones. In other words, my first dataset is for new hospital admissions. I want to create a second dataset with all daily hospitalizations—or, how many encounters in this initial dataset overlap with each given date.
So, I go back to my initial encounter-level dataset with 122,000 observations.
My first instinct was to create 365 dummy variables corresponding to each date, then set each date variable=1 if the datevar>=admdt & datevar<=dcdt, and then use total(admdtvar) to tally up all encounters for each date.
I used dummieslab to create the 365 variables from admdt, but I realized I can’t use the variablename as a date in my stata commands. It also is clear this will get bulky very very quickly.
Wondering if any of you pros have suggestions for how to approach this task. I hope I’ve provide enough, but not too much, information.
Thanks so much,
Clark
I need guidance about how to move from an encounter-level dataset to a daily-level dataset with both incidence (new episodes) and prevalence (all current episodes) totals for each date. I am a relative beginner in Stata but will do my best to explain the situation and follow the FAQ.
I have an encounter-level dataset with 121,640 observations and 131 variables including an episode start date (admdt) and episode end date (dcdt). All included episodes have a start date within a 12-month period. There is also a binary variable “died” which corresponds to whether the encounter ended in death. The “dataex” command gives me “input statement exceeds linesize limit” so I dropped all but a select few necessary variables relevant to my question below and here it is:
input float(died admdt dcdt)
0 22137 22138
0 22075 22088
0 22106 22109
0 22298 22301
0 22220 22223
1 22070 22085
0 22337 22338
1 22154 22163
0 22175 22176
0 22248 22249
0 22025 22028
0 22064 22071
0 22159 22159
0 22228 22231
0 22250 22257
0 22300 22303
0 22327 22330
My first aim was to create a daily-level dataset with 365 observations corresponding to each unique “admdt” and variables corresponding to 1) the total number of encounters which started on that date, 2) the total number of encounters which started on that date and ended in death (to calculate daily mortality), 3) a few other categories based on other variables. I succeeded in doing this much by first tallying all new episodes and deaths for each admit date and then dropping all duplicates, so I was left with a single observation for each date.
The successful code:
quietly summarize admdt
local datemax=r(max)
local datemin=r(min)
local daterange=`datemax'-`datemin'
display "`daterange'"
gen countall=0
forvalues d = `datemin'(1)`datemax'{
quietly tab admdt if admdt==`d'
replace countall=r(N) if admdt==`d'
}
gen count1=0
forvalues d = `datemin'(1)`datemax'{
quietly tab admdt if admdt==`d' & def1==1
replace count1=r(N) if admdt==`d'
}
**etc additional variables**
bysort admdt: egen diedall=total(died)
bysort admdt: egen died1=total(died) if def1==1
egen max_died1 = max(died1), by(admdt)
drop died1
rename max_died1 died1
replace died1=0 if died1==.
keep admdt count1 countall diedall died1
sort admdt count*
quietly bys admdt count*: gen duplicate= cond(_N==1,0,_n)
tab duplicate
drop if duplicate>1
Here I was left with successful daily-level dataset with 365 observations and variable “countall” with number of incident encounters (admissions), “diedall” with number of admissions ending with death, etc. Great.
MY QUESTION:
I want to make an alternative daily-level dataset which has tallies of all PREVALENT episodes, rather than incident ones. In other words, my first dataset is for new hospital admissions. I want to create a second dataset with all daily hospitalizations—or, how many encounters in this initial dataset overlap with each given date.
So, I go back to my initial encounter-level dataset with 122,000 observations.
My first instinct was to create 365 dummy variables corresponding to each date, then set each date variable=1 if the datevar>=admdt & datevar<=dcdt, and then use total(admdtvar) to tally up all encounters for each date.
I used dummieslab to create the 365 variables from admdt, but I realized I can’t use the variablename as a date in my stata commands. It also is clear this will get bulky very very quickly.
Wondering if any of you pros have suggestions for how to approach this task. I hope I’ve provide enough, but not too much, information.
Thanks so much,
Clark
Comment