Listing the all ICD-10 codes along with the enrolid

Samruddhi Borate

Join Date: Jul 2024

Posts: 4
#1

Listing the all ICD-10 codes along with the enrolid

29 Jul 2024, 14:30

Hello,
I am working on claims data, and each patient is identified by a unique ID called enrolee ID (enrolid). There are 4 variables that I am interested in dx1, dx2, dx3 and dx4, these variables contain ICD-10 codes (disease diagnosis codes). As this is a claims dataset, there are changes in the diagnosis codes over the years (different dates/years may have different or additional diagnosis for the patient). How do I get a table that has a column for an enrolid and another column for all the dx1s reported in the dataset, and same for dx2, dx3 and dx4
For eg:
enrolid dx1 dx2 dx3 dx4

1 z21 L40 L409 N4230

1 z21 z113 R748

1 z21 J069

1 H35433 z113 R748

2 z21 L40 L409 N4230

I am trying to convert this dataset into a table that looks like this
enrolid dx1 dx2 dx3 dx4

1 z21, H35433 L40, z113, J069 L409, R748 N4230

and similarly for other enrolids

Guidance on this would be greatly appreciated!

Last edited by Samruddhi Borate; 29 Jul 2024, 14:39.
Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10195

29 Jul 2024, 16:47

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte enrolid str6 dx1 str4(dx2 dx3) str5 dx4
1 "z21"    "L40"  "L409" "N4230"
1 "z21"    "z113" "R748" ""    
1 "z21"    "J069" ""     ""    
1 "H35433" "z113" "R748" ""    
2 "z21"    "L40"  "L409" "N4230"
end

bys enrolid: gen long which=_n
reshape long dx, i(enrolid which)
drop if missing(dx)
duplicates drop enrolid _j dx, force
bys enrolid _j (which): replace which=_n
bys enrolid _j (which): gen dx_=dx[1]+ ", "
qui sum which
forval i= 2/`=r(max)'{
   by enrolid _j: replace dx_= dx_[_n-1] + dx[`i'] + ", " if _n==`i'
}
by enrolid _j (which): keep if _n==_N
replace dx_= ustrregexra(dx_, "(.*)\,\s+$", "$1")
drop which dx
reshape wide dx_, i(enrolid) j(_j)

Res.:

Code:


. l

     +--------------------------------------------------------------+
     | enrolid          dx_1              dx_2         dx_3    dx_4 |
     |--------------------------------------------------------------|
  1. |       1   z21, H35433   L40, z113, J069   L409, R748   N4230 |
  2. |       2           z21               L40         L409   N4230 |
     +--------------------------------------------------------------+

Last edited by Andrew Musau; 29 Jul 2024, 16:49.

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30101

29 Jul 2024, 17:55

Here's another approach, mostly the same as shown in #2, but it avoids the somewhat obscure aspects of regular expressions, and with a simpler structure.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte enrolid str7 dx1 str5(dx2 dx3 dx4)
1 "z21 "    "L40 "  "L409 " "N4230"
1 "z21 "    "z113 " "R748 " ""     
1 "z21 "    "J069 " ""      ""     
1 "H35433 " "z113 " "R748 " ""     
2 "z21 "    "L40 "  "L409 " "N4230"
end

gen `c(obs_t)' obs_no = _n

reshape long dx, i(obs_no)


drop if missing(dx)
by enrolid _j dx (obs_no), sort: keep if _n == 1

by enrolid _j (obs_no), sort: gen _dx = dx if _n == 1
by enrolid _j (obs_no): replace _dx = _dx[_n-1] + ", " + dx if _n > 1
by enrolid _j (obs_no): keep if _n == _N
drop dx obs_no
rename _dx dx
reshape wide dx, i(enrolid) j(_j)

Announcement

Listing the all ICD-10 codes along with the enrolid

Comment

Comment