
  • Combining multiple ICD-10 codes into one encounter

    Hello,

    I have an encounter-level dataset with participant IDs (pid), visit dates (appt_date), and one or more ICD-10 codes (current_icd10_list). Unfortunately, some visits are counted as multiple encounters with a different ICD-10 code for each encounter (see pid=6 in my dataex example), rather than, appropriately, as one encounter with a list of ICD-10 codes (as for pid=40). Can you help me figure out how to combine encounters that occur for the same pid on the same date into one encounter?

    Thank you,

    Sarah



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float pid str30 current_icd10_list str8 appt_date
     1 "NA"           "10/21/19"
     2 "NA"           "9/4/19"  
     3 "E03.9"        "9/3/19"  
     4 "NA"           "9/3/19"  
     5 "M81.0"        "9/17/19" 
     6 "E03.9"        "9/12/19" 
     6 "M81.0"        "9/12/19" 
     6 "E55.9"        "9/12/19" 
     6 "E29.1"        "9/12/19" 
     7 "Z86.39"       "9/12/19" 
     8 "NA"           "9/6/19"  
     9 "NA"           "9/9/19"  
    10 "E29.1"        "9/5/19"  
    11 "E05.00"       "9/3/19"  
    12 "M81.0"        "9/3/19"  
    13 "E03.9"        "9/5/19"  
    14 "E27.40"       "9/12/19" 
    15 "NA"           "9/11/19" 
    16 "E89.0"        "9/9/19"  
    16 "E05.00"       "9/9/19"  
    17 "E11.9"        "9/9/19"  
    18 "E03.9"        "9/3/19"  
    19 "E11.9"        "9/12/19" 
    20 "NA"           "10/1/19" 
    21 "NA"           "9/4/19"  
    22 "NA"           "9/10/19" 
    23 "NA"           "9/10/19" 
    24 "NA"           "9/9/19"  
    25 "NA"           "9/10/19" 
    26 "NA"           "9/13/19" 
    27 "M81.0"        "9/9/19"  
    27 "E83.42"       "9/9/19"  
    28 "NA"           "9/16/19" 
    29 "NA"           "9/23/19" 
    30 "NA"           "9/17/19" 
    31 "R97.20"       "9/12/19" 
    31 "E29.1"        "9/12/19" 
    32 "NA"           "9/11/19" 
    33 "R74.8"        "9/9/19"  
    33 "E05.00"       "9/9/19"  
    33 "D64.9"        "9/9/19"  
    33 "B18.2"        "9/9/19"  
    33 "N18.30"       "9/9/19"  
    33 "N18.30"       "9/9/19"  
    33 "I10"          "9/9/19"  
    33 "E05.00"       "9/9/19"  
    33 "I42.8"        "9/9/19"  
    34 "E10.8"        "9/16/19" 
    35 "NA"           "9/11/19" 
    36 "E27.1"        "9/11/19" 
    37 "NA"           "9/11/19" 
    38 "M81.0"        "9/10/19" 
    38 "M06.9"        "9/10/19" 
    38 "E83.42"       "9/10/19" 
    39 "NA"           "9/23/19" 
    40 "E11.9, Z79.4" "9/16/19" 
    41 "NA"           "9/17/19" 
    42 "M81.0"        "9/17/19" 
    43 "E87.1"        "9/11/19" 
    43 "M19.90"       "9/11/19" 
    43 "M81.0"        "9/11/19" 
    44 "NA"           "9/16/19" 
    45 "M81.0"        "9/25/19" 
    45 "E03.9"        "9/25/19" 
    46 "M81.0"        "9/11/19" 
    47 "S72.446S"     "9/11/19" 
    47 "M81.0"        "9/11/19" 
    47 "Q79.60"       "9/11/19" 
    47 "E55.9"        "9/11/19" 
    48 "NA"           "9/13/19" 
    49 "E78.5"        "9/17/19" 
    49 "E06.3"        "9/17/19" 
    49 "E11.9"        "9/17/19" 
    50 "NA"           "9/9/19"  
    51 "NA"           "9/10/19" 
    52 "NA"           "9/19/19" 
    53 "NA"           "9/16/19" 
    54 "NA"           "9/18/19" 
    55 "N92.6"        "9/12/19" 
    55 "R53.83"       "9/12/19" 
    55 "E06.3"        "9/12/19" 
    56 "E03.9"        "9/12/19" 
    57 "M81.0"        "9/18/19" 
    58 "E03.9"        "9/17/19" 
    59 "NA"           "9/19/19" 
    60 "NA"           "9/18/19" 
    61 "NA"           "9/9/19"  
    62 "E89.0"        "9/16/19" 
    62 "Z85.850"      "9/16/19" 
    62 "E55.9"        "9/16/19" 
    63 "D35.2"        "9/24/19" 
    64 "NA"           "9/17/19" 
    65 "E55.9"        "9/23/19" 
    65 "E89.0"        "9/23/19" 
    65 "Z85.850"      "9/23/19" 
    66 "NA"           "9/17/19" 
    67 "E11.9"        "9/27/19" 
    68 "E55.9"        "9/16/19" 
    68 "E04.1"        "9/16/19" 
    68 "E03.9"        "9/16/19" 
    end

  • #2
    Code:
    by pid appt_date, sort: gen wanted = current_icd10_list[1] if _n == 1
    by pid appt_date: replace wanted = wanted[_n-1] + ", " + current_icd10_list ///
        if _n > 1 & !inlist(current_icd10_list, "", "NA")
    by pid appt_date: keep if _n == _N
    drop current_icd10_list
    Are you sure you really want to do that? I work with data containing lists of ICD codes often, and I always find myself doing the reverse: trying to split up a variable that has a bunch of ICD codes into separate observations with single codes. There is very little data analysis you can effectively do when the codes are all smashed into a single observation. Just sayin'.
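    For example, with one code per observation, common summaries become one-liners. A sketch only, assuming a long layout with a hypothetical single-code variable named dx:
    Code:
    tab dx, sort                     // frequency of each ICD-10 code
    egen visit = tag(pid appt_date)  // flag one row per pid/appt_date visit
    count if visit                   // number of distinct visits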



    • #3
      Thanks Clyde!

      Yes, I totally understand what you're saying. In this case, though, I'm trying to remove visits that don't have one of a very specific set of codes. For the ones that were smashed together I successfully used:
      split current_icd10_list, parse(,) destring gen(dx)
      to separate them into multiple variables dx1-4. But I just want one line per visit.

      Does that make sense?

      Sarah



      • #4
        Well, it makes sense, but that is not the best way to do it. This can be done better with the data in the original layout. Let's say you are looking to keep only visits with at least one E27 code, just to illustrate the approach:
        Code:
        by pid appt_date, sort: egen has_e27 = max(strpos(current_icd10_list, "E27"))
        keep if has_e27
        If the specific ICD-10 codes you are looking for do not lend themselves to the kind of simple string match with -strpos()- that I just showed, the -inlist()- function will do for picking out the codes. Also useful here is the -icd10 generate- command, which can create an indicator variable for a list of ICD-10 codes given in its -range()- option. Do read the help files for -inlist()- and -icd10- if you are not familiar with them.
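        To illustrate both, a sketch (not run on the example data; the codes in -inlist()- and -range()- are placeholders, and -icd10 generate- assumes one clean code per observation):
        Code:
        * exact matches with -inlist()-
        by pid appt_date, sort: egen has_code = ///
            max(inlist(current_icd10_list, "E27.1", "E27.40", "E27.49"))
        keep if has_code

        * or a whole range of codes with -icd10 generate-
        icd10 generate wanted_code = current_icd10_list, range(E27*)
        by pid appt_date, sort: egen keep_flag = max(wanted_code)
        keep if keep_flag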



        • #5
          Hi Clyde,

          I tried your approach but for some reason it didn't drop all of the visits that do not have one of the desired codes (which are E08.*, E09.*, E10.*, and E11.*). Specifically, I ran:
          by pid appt_date, sort: egen has_e08 = max(strpos(current_icd10_list, "E08"))
          by pid appt_date, sort: egen has_e09 = max(strpos(current_icd10_list, "E09"))
          by pid appt_date, sort: egen has_e10 = max(strpos(current_icd10_list, "E10"))
          by pid appt_date, sort: egen has_e11 = max(strpos(current_icd10_list, "E11"))
          drop if (has_e08==0 & has_e09==0 & has_e10==0 & has_e11==0)

          On the other hand, if I run:
          gen diabetes = 0
          foreach v of var dx* {
              foreach s in E10. E11. E08. E09. {
                  replace diabetes = 1 if strpos(`v', "`s'")
              }
          }

          AND THEN run your first response:

          by pid appt_date, sort: gen list2 = current_icd10_list[1] if _n == 1
          by pid appt_date: replace list2 = list2[_n-1] + ", " + current_icd10_list ///
          if _n > 1 & !inlist(current_icd10_list, "", "NA")
          by pid appt_date: keep if _n == _N

          This does seem to do the trick.



          • #6
            I should mention that I first ran: split current_icd10_list, parse(,) destring gen(dx)

            Does this make sense do you think or am I missing anything?



            • #7
              Clyde: I think there is a problem with your code. When I run "by pid appt_date: keep if _n==_N" it deletes all of the observations that were created for list2 (or "wanted"). Can you help?



              • #8
                When I wrote #4, I had forgotten that your starting data includes some comma-separated lists in the icd10 variable. So it did need the kind of prior preparation you note. Here's how I would do this:

                Code:
                split current_icd10_list, parse(",") gen(dx)
                gen long obs_no = _n
                reshape long dx, i(obs_no)
                drop if missing(dx) & _j > 1
                drop current_icd10_list _j
                
                
                icd10 generate dm_code = dx, range(E08* E09* E10* E11*)
                by pid appt_date, sort: egen byte keeper = max(dm_code)
                keep if keeper
                Added: Crossed with #7.

                When I run "by pid appt_date: keep if _n==_N" it deletes all of the observations that were created for list2 (or "wanted"). Can you help?
                Can you describe what is happening in more detail, and post a new data example that exhibits the problem? Bear in mind that the variable wanted is built up by adding one ICD-10 code at a time across successive observations; only the final observation for a given pid appt_date contains the full list of codes. So, to simplify the data set, that -keep- command gets rid of all the earlier observations, which contain no information that is not also in the final retained one.

                But as I ponder this, I can imagine that you have other variables not shown in your example, that may vary across observations within pid appt_date, and which you need to retain for other purposes. If that is the issue, my code destroys those. The way to resolve that is to eliminate the -by pid appt_date: keep if _n == _N- command and replace it with:
                Code:
                by pid appt_date: replace wanted = wanted[_N]
                That way the full list of ICD10 codes is copied to all of the original observations, and you will retain any information in other variables.

                All of that said, I still recommend just staying with the longer layout and using the code shown at the top of this post.
                Last edited by Clyde Schechter; 12 Feb 2021, 12:38.



                • #9
                  Hi Clyde,

                  I reran your updated code:
                  split current_icd10_list, parse(",") gen(dx)
                  gen long obs_no = _n
                  reshape long dx, i(obs_no)
                  drop if missing(dx) & _j > 1
                  drop current_icd10_list _j

                  icd10 generate dm_code = dx, range(E08* E09* E10* E11* E13*)
                  by pid appt_date, sort: egen byte keeper = max(dm_code)
                  keep if keeper

                  That has given me the dataset below (see the dataex example). But I feel like I'm back where I started: I still have numerous rows for a single appointment. To clarify, I want my dataset to have a unique row for each appt_date within each pid, with a variable for each dx. Like this:
                  pid  appt_date  dx1    dx2    dx3
                  1    2/5/20     E10.8  E11.9  E11.65
                  1    2/20/20    E10.8
                  1    5/20/20    E9.0
                  2    6/2/20     E11.65
                  2    8/9/20     E11.65


                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float pid str8 appt_date str9 dx byte dm_code
                   17 "9/9/19"   "E11.9"    1
                   19 "9/12/19"  "E11.9"    1
                   34 "9/16/19"  "E10.8"    1
                   40 "9/16/19"  "E11.9"    1
                   40 "9/16/19"  " Z79.4"   0
                   49 "9/17/19"  "E11.9"    1
                   49 "9/17/19"  "E78.5"    0
                   49 "9/17/19"  "E06.3"    0
                   67 "9/27/19"  "E11.9"    1
                   69 "9/19/19"  "E11.9"    1
                   77 "9/17/19"  "E11.65"   1
                   77 "9/17/19"  "I25.10"   0
                   77 "9/17/19"  " Z98.61"  0
                   77 "9/17/19"  "E78.5"    0
                   77 "9/17/19"  "E08.69"   1
                   81 "9/25/19"  "E11.9"    1
                   82 "9/27/19"  "E13.9"    1
                   87 "9/23/19"  "E11.9"    1
                   91 "9/24/19"  "E11.69"   1
                   91 "9/24/19"  " K86.9"   0
                  107 "9/23/19"  " Z79.4"   0
                  107 "9/23/19"  "E11.9"    1
                  108 "9/23/19"  " Z79.4"   0
                  108 "9/23/19"  "E11.9"    1
                  120 "10/7/19"  "E11.9"    1
                  142 "9/23/19"  "E10.9"    1
                  152 "10/8/19"  "E11.9"    1
                  193 "9/20/19"  "E10.42"   1
                  193 "9/20/19"  " E10.65"  1
                  196 "10/8/19"  "E10.9"    1
                  198 "9/10/19"  "E13.9"    1
                  214 "10/11/19" "E10.9"    1
                  216 "10/18/19" "E03.9"    0
                  216 "10/18/19" "M81.0"    0
                  216 "10/18/19" "E10.51"   1
                  234 "10/15/19" "E11.9"    1
                  238 "10/17/19" "E27.49"   0
                  238 "10/17/19" "E23.2"    0
                  238 "10/17/19" "D44.4"    0
                  238 "10/17/19" "E03.8"    0
                  238 "10/17/19" "E23.0"    0
                  238 "10/17/19" "E11.9"    1
                  245 "10/15/19" "C73"      0
                  245 "10/15/19" "E11.9"    1
                  254 "9/4/19"   "E11.9"    1
                  255 "9/3/19"   "E10.8"    1
                  256 "9/17/19"  "G40.909"  0
                  256 "9/17/19"  "E10.9"    1
                  257 "10/25/19" " R80.9"   0
                  257 "10/25/19" "E11.29"   1
                  259 "9/17/19"  "E78.00"   0
                  259 "9/17/19"  "E11.8"    1
                  259 "9/17/19"  " Z79.4"   0
                  261 "11/12/19" "K74.69"   0
                  261 "11/12/19" "E08.00"   1
                  261 "11/12/19" "Z86.19"   0
                  265 "10/22/19" "E11.9"    1
                  267 "9/19/19"  "E11.9"    1
                  274 "9/12/19"  "E11.9"    1
                  276 "9/20/19"  " Z79.4"   0
                  276 "9/20/19"  "E11.9"    1
                  278 "10/25/19" "E11.9"    1
                  280 "10/25/19" "E11.9"    1
                  281 "10/18/19" "E11.9"    1
                  284 "10/18/19" "E11.9"    1
                  294 "10/21/19" "E11.65"   1
                  294 "10/21/19" " Z79.4"   0
                  295 "9/12/19"  "E03.8"    0
                  295 "9/12/19"  "E11.9"    1
                  295 "9/12/19"  "E23.0"    0
                  295 "9/12/19"  "E29.1"    0
                  295 "9/12/19"  "Q99.9"    0
                  312 "9/17/19"  "E10.65"   1
                  318 "10/23/19" "E11.9"    1
                  318 "10/23/19" "E06.3"    0
                  322 "10/24/19" "E10.65"   1
                  322 "10/24/19" "H35.00"   0
                  324 "10/30/19" "E11.9"    1
                  332 "10/24/19" " Z79.4"   0
                  332 "10/24/19" "E11.9"    1
                  335 "10/24/19" "E78.00"   0
                  335 "10/24/19" "E11.9"    1
                  337 "9/13/19"  "E11.9"    1
                  374 "10/28/19" "E08.29"   1
                  383 "9/3/19"   "E11.65"   1
                  383 "9/3/19"   " Z79.4"   0
                  383 "9/3/19"   "IMO0001"  0
                  384 "9/24/19"  "E11.3513" 1
                  384 "9/24/19"  " Z79.4"   0
                  386 "9/17/19"  "E10.65"   1
                  390 "9/3/19"   " Z79.4"   0
                  390 "9/3/19"   "E11.21"   1
                  390 "9/3/19"   "N18.4"    0
                  390 "9/3/19"   "N18.4"    0
                  390 "9/3/19"   "D63.1"    0
                  393 "9/17/19"  "E11.10"   1
                  393 "9/17/19"  " Z79.4"   0
                  403 "11/5/19"  "E10.9"    1
                  405 "9/3/19"   "E11.69"   1
                  405 "9/3/19"   "E11.65"   1
                  end



                  • #10
                    To get to that layout:
                    Code:
                    duplicates drop
                    drop dm_code
                    by pid appt_date (dx), sort: gen _j = _n
                    reshape wide dx, i(pid appt_date) j(_j)
                    But I reiterate my point that you will probably regret doing this. It is likely that whatever additional analysis you plan after this point will be better with the long layout you showed with -dataex- in #9. But it's your decision.



                    • #11
                      Hi Clyde,

                      But this is still a long layout, correct? I'm not trying to condense to one line per person; I'm just trying to have one row per encounter, which in my mind is still a long dataset. I don't understand what the purpose of having the various ICD-10 codes for the same visit on different rows would be. Because I'm interested in the frequency of diabetes visits, I believe my dataset should be long, with multiple encounters for each patient ID (but having numerous rows within one visit, one per ICD-10 code, just seems like noise). Please convince me otherwise!

                      Sarah

