Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Listing the all ICD-10 codes along with the enrolid

    Hello,
    I am working on claims data, and each patient is identified by a unique ID called enrolee ID (enrolid). There are 4 variables that I am interested in dx1, dx2, dx3 and dx4, these variables contain ICD-10 codes (disease diagnosis codes). As this is a claims dataset, there are changes in the diagnosis codes over the years (different dates/years may have different or additional diagnosis for the patient). How do I get a table that has a column for an enrolid and another column for all the dx1s reported in the dataset, and same for dx2, dx3 and dx4
    For eg:
    enrolid dx1 dx2 dx3 dx4
    1 z21 L40 L409 N4230
    1 z21 z113 R748
    1 z21 J069
    1 H35433 z113 R748
    2 z21 L40 L409 N4230
    I am trying to convert this dataset into a table that looks like this
    enrolid dx1 dx2 dx3 dx4
    1 z21, H35433 L40, z113, J069 L409, R748 N4230
    and similarly for other enrolids

    Guidance on this would be greatly appreciated!
    Last edited by Samruddhi Borate; 29 Jul 2024, 14:39.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte enrolid str6 dx1 str4(dx2 dx3) str5 dx4
    1 "z21"    "L40"  "L409" "N4230"
    1 "z21"    "z113" "R748" ""    
    1 "z21"    "J069" ""     ""    
    1 "H35433" "z113" "R748" ""    
    2 "z21"    "L40"  "L409" "N4230"
    end
    
    bys enrolid: gen long which=_n
    reshape long dx, i(enrolid which)
    drop if missing(dx)
    duplicates drop enrolid _j dx, force
    bys enrolid _j (which): replace which=_n
    bys enrolid _j (which): gen dx_=dx[1]+ ", "
    qui sum which
    forval i= 2/`=r(max)'{
       by enrolid _j: replace dx_= dx_[_n-1] + dx[`i'] + ", " if _n==`i'
    }
    by enrolid _j (which): keep if _n==_N
    replace dx_= ustrregexra(dx_, "(.*)\,\s+$", "$1")
    drop which dx
    reshape wide dx_, i(enrolid) j(_j)
    Res.:

    Code:
    
    . l
    
         +--------------------------------------------------------------+
         | enrolid          dx_1              dx_2         dx_3    dx_4 |
         |--------------------------------------------------------------|
      1. |       1   z21, H35433   L40, z113, J069   L409, R748   N4230 |
      2. |       2           z21               L40         L409   N4230 |
         +--------------------------------------------------------------+
    Last edited by Andrew Musau; 29 Jul 2024, 16:49.

    Comment


    • #3
      Here's another approach, mostly the same as shown in #2, but it avoids the somewhat obscure aspects of regular expressions, and with a simpler structure.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte enrolid str7 dx1 str5(dx2 dx3 dx4)
      1 "z21 "    "L40 "  "L409 " "N4230"
      1 "z21 "    "z113 " "R748 " ""     
      1 "z21 "    "J069 " ""      ""     
      1 "H35433 " "z113 " "R748 " ""     
      2 "z21 "    "L40 "  "L409 " "N4230"
      end
      
      gen `c(obs_t)' obs_no = _n
      
      reshape long dx, i(obs_no)
      
      
      drop if missing(dx)
      by enrolid _j dx (obs_no), sort: keep if _n == 1
      
      by enrolid _j (obs_no), sort: gen _dx = dx if _n == 1
      by enrolid _j (obs_no): replace _dx = _dx[_n-1] + ", " + dx if _n > 1
      by enrolid _j (obs_no): keep if _n == _N
      drop dx obs_no
      rename _dx dx
      reshape wide dx, i(enrolid) j(_j)

      Comment

      Working...
      X