
  • Combining multiple ICD-10 codes into one encounter

    Hello,

    I have an encounter-level dataset with participant IDs (pid), visit dates (appt_date), and one or more ICD-10 codes (current_icd10_list). Unfortunately, some visits are counted as multiple encounters with a different ICD-10 code for each encounter (see pid=6 in my dataex example), rather than, appropriately, as one encounter with a list of ICD-10 codes (as for pid=40). Can you help me figure out how to combine encounters that occur for the same pid on the same date into one encounter?

    Thank you,

    Sarah



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float pid str30 current_icd10_list str8 appt_date
     1 "NA"           "10/21/19"
     2 "NA"           "9/4/19"  
     3 "E03.9"        "9/3/19"  
     4 "NA"           "9/3/19"  
     5 "M81.0"        "9/17/19" 
     6 "E03.9"        "9/12/19" 
     6 "M81.0"        "9/12/19" 
     6 "E55.9"        "9/12/19" 
     6 "E29.1"        "9/12/19" 
     7 "Z86.39"       "9/12/19" 
     8 "NA"           "9/6/19"  
     9 "NA"           "9/9/19"  
    10 "E29.1"        "9/5/19"  
    11 "E05.00"       "9/3/19"  
    12 "M81.0"        "9/3/19"  
    13 "E03.9"        "9/5/19"  
    14 "E27.40"       "9/12/19" 
    15 "NA"           "9/11/19" 
    16 "E89.0"        "9/9/19"  
    16 "E05.00"       "9/9/19"  
    17 "E11.9"        "9/9/19"  
    18 "E03.9"        "9/3/19"  
    19 "E11.9"        "9/12/19" 
    20 "NA"           "10/1/19" 
    21 "NA"           "9/4/19"  
    22 "NA"           "9/10/19" 
    23 "NA"           "9/10/19" 
    24 "NA"           "9/9/19"  
    25 "NA"           "9/10/19" 
    26 "NA"           "9/13/19" 
    27 "M81.0"        "9/9/19"  
    27 "E83.42"       "9/9/19"  
    28 "NA"           "9/16/19" 
    29 "NA"           "9/23/19" 
    30 "NA"           "9/17/19" 
    31 "R97.20"       "9/12/19" 
    31 "E29.1"        "9/12/19" 
    32 "NA"           "9/11/19" 
    33 "R74.8"        "9/9/19"  
    33 "E05.00"       "9/9/19"  
    33 "D64.9"        "9/9/19"  
    33 "B18.2"        "9/9/19"  
    33 "N18.30"       "9/9/19"  
    33 "N18.30"       "9/9/19"  
    33 "I10"          "9/9/19"  
    33 "E05.00"       "9/9/19"  
    33 "I42.8"        "9/9/19"  
    34 "E10.8"        "9/16/19" 
    35 "NA"           "9/11/19" 
    36 "E27.1"        "9/11/19" 
    37 "NA"           "9/11/19" 
    38 "M81.0"        "9/10/19" 
    38 "M06.9"        "9/10/19" 
    38 "E83.42"       "9/10/19" 
    39 "NA"           "9/23/19" 
    40 "E11.9, Z79.4" "9/16/19" 
    41 "NA"           "9/17/19" 
    42 "M81.0"        "9/17/19" 
    43 "E87.1"        "9/11/19" 
    43 "M19.90"       "9/11/19" 
    43 "M81.0"        "9/11/19" 
    44 "NA"           "9/16/19" 
    45 "M81.0"        "9/25/19" 
    45 "E03.9"        "9/25/19" 
    46 "M81.0"        "9/11/19" 
    47 "S72.446S"     "9/11/19" 
    47 "M81.0"        "9/11/19" 
    47 "Q79.60"       "9/11/19" 
    47 "E55.9"        "9/11/19" 
    48 "NA"           "9/13/19" 
    49 "E78.5"        "9/17/19" 
    49 "E06.3"        "9/17/19" 
    49 "E11.9"        "9/17/19" 
    50 "NA"           "9/9/19"  
    51 "NA"           "9/10/19" 
    52 "NA"           "9/19/19" 
    53 "NA"           "9/16/19" 
    54 "NA"           "9/18/19" 
    55 "N92.6"        "9/12/19" 
    55 "R53.83"       "9/12/19" 
    55 "E06.3"        "9/12/19" 
    56 "E03.9"        "9/12/19" 
    57 "M81.0"        "9/18/19" 
    58 "E03.9"        "9/17/19" 
    59 "NA"           "9/19/19" 
    60 "NA"           "9/18/19" 
    61 "NA"           "9/9/19"  
    62 "E89.0"        "9/16/19" 
    62 "Z85.850"      "9/16/19" 
    62 "E55.9"        "9/16/19" 
    63 "D35.2"        "9/24/19" 
    64 "NA"           "9/17/19" 
    65 "E55.9"        "9/23/19" 
    65 "E89.0"        "9/23/19" 
    65 "Z85.850"      "9/23/19" 
    66 "NA"           "9/17/19" 
    67 "E11.9"        "9/27/19" 
    68 "E55.9"        "9/16/19" 
    68 "E04.1"        "9/16/19" 
    68 "E03.9"        "9/16/19" 
    end

  • #2
    Code:
    by pid appt_date, sort: gen wanted = current_icd10_list[1] if _n == 1
    by pid appt_date: replace wanted = wanted[_n-1] + ", " + current_icd10_list ///
        if _n > 1 & !inlist(current_icd10_list, "", "NA")
    by pid appt_date: keep if _n == _N
    drop current_icd10_list
    Are you sure you really want to do that? I work with data containing lists of ICD codes often, and I always find myself doing the reverse: trying to split up a variable that has a bunch of ICD codes into separate observations with single codes. There is very little data analysis you can effectively do when the codes are all smashed into a single observation. Just sayin'.
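    For example, with one code per observation, common summaries become one-liners. A sketch only, assuming a long layout with a hypothetical single-code variable named dx:
    Code:
    tab dx, sort                     // frequency of each ICD-10 code
    egen visit = tag(pid appt_date)  // flag one row per pid/appt_date visit
    count if visit                   // number of distinct visits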



    • #3
      Thanks Clyde!

      Yes, I totally understand what you're saying. In this case, though, I'm trying to remove visits that don't have one of a very specific set of codes. For the ones that were smashed together I successfully used:
      split current_icd10_list, parse(,) destring gen(dx)
      to separate them into multiple variables dx1-4. But I just want one line per visit.

      Does that make sense?

      Sarah



      • #4
        Well, it makes sense, but that is not the best way to do it. This can be done better with the data in the original layout. Let's say you are looking to keep only visits with at least one E27 code, just to illustrate the approach:
        Code:
        by pid appt_date, sort: egen has_e27 = max(strpos(current_icd10_list, "E27"))
        keep if has_e27
        If the specific ICD-10 codes you are looking for do not lend themselves to the kind of simple string match with -strpos()- that I just showed, the -inlist()- function will do for picking out the codes. Also useful here is the -icd10 generate- command, which can create an indicator variable for a list of ICD-10 codes given in its -range()- option. Do read the help files for -inlist()- and -icd10- if you are not familiar with them.
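        To illustrate both, a sketch (not run on the example data; the codes in -inlist()- and -range()- are placeholders, and -icd10 generate- assumes one clean code per observation):
        Code:
        * exact matches with -inlist()-
        by pid appt_date, sort: egen has_code = ///
            max(inlist(current_icd10_list, "E27.1", "E27.40", "E27.49"))
        keep if has_code

        * or a whole range of codes with -icd10 generate-
        icd10 generate wanted_code = current_icd10_list, range(E27*)
        by pid appt_date, sort: egen keep_flag = max(wanted_code)
        keep if keep_flag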



        • #5
          Hi Clyde,

          I tried your approach but for some reason it didn't drop all of the visits that do not have one of the desired codes (which are E08.*, E09.*, E10.*, and E11.*). Specifically, I ran:
          by pid appt_date, sort: egen has_e08 = max(strpos(current_icd10_list, "E08"))
          by pid appt_date, sort: egen has_e09 = max(strpos(current_icd10_list, "E09"))
          by pid appt_date, sort: egen has_e10 = max(strpos(current_icd10_list, "E10"))
          by pid appt_date, sort: egen has_e11 = max(strpos(current_icd10_list, "E11"))
          drop if (has_e08==0 & has_e09==0 & has_e10==0 & has_e11==0)

          On the other hand, if I run:
          gen diabetes = 0
          foreach v of var dx* {
              foreach s in E10. E11. E08. E09. {
                  replace diabetes = 1 if strpos(`v', "`s'")
              }
          }

          AND THEN run your first response:

          by pid appt_date, sort: gen list2 = current_icd10_list[1] if _n == 1
          by pid appt_date: replace list2 = list2[_n-1] + ", " + current_icd10_list ///
          if _n > 1 & !inlist(current_icd10_list, "", "NA")
          by pid appt_date: keep if _n == _N

          This does seem to do the trick.



          • #6
            I should mention that I first ran: split current_icd10_list, parse(,) destring gen(dx)

            Does this make sense do you think or am I missing anything?



            • #7
              Clyde: I think there is a problem with your code. When I run "by pid appt_date: keep if _n==_N" it deletes all of the observations that were created for list2 (or "wanted"). Can you help?



              • #8
                When I wrote #4, I had forgotten that your starting data includes some comma-separated lists in the icd10 variable. So it did need the kind of prior preparation you note. Here's how I would do this:

                Code:
                split current_icd10_list, parse(",") gen(dx)
                gen long obs_no = _n
                reshape long dx, i(obs_no)
                drop if missing(dx) & _j > 1
                drop current_icd10_list _j
                
                
                icd10 generate dm_code = dx, range(E08* E09* E10* E11*)
                by pid appt_date, sort: egen byte keeper = max(dm_code)
                keep if keeper
                Added: Crossed with #7.

                When I run "by pid appt_date: keep if _n==_N" it deletes all of the observations that were created for list2 (or "wanted"). Can you help?
                Can you describe what is happening in more detail, and post a new data example that exhibits the problem? Bear in mind that the variable wanted is built up by adding one ICD-10 code at a time across successive observations; only the final observation for a given pid appt_date contains the full list of codes. So, to simplify the data set, that -keep- command gets rid of all the earlier observations, which contain no information that is not also in the final retained one.

                But as I ponder this, I can imagine that you have other variables not shown in your example, that may vary across observations within pid appt_date, and which you need to retain for other purposes. If that is the issue, my code destroys those. The way to resolve that is to eliminate the -by pid appt_date: keep if _n == _N- command and replace it with:
                Code:
                by pid appt_date: replace wanted = wanted[_N]
                That way the full list of ICD10 codes is copied to all of the original observations, and you will retain any information in other variables.

                All of that said, I still recommend just staying with the longer layout and using the code shown at the top of this post.
                Last edited by Clyde Schechter; 12 Feb 2021, 12:38.



                • #9
                  Hi Clyde,

                  I reran your updated code:
                  split current_icd10_list, parse(",") gen(dx)
                  gen long obs_no = _n
                  reshape long dx, i(obs_no)
                  drop if missing(dx) & _j > 1
                  drop current_icd10_list _j

                  icd10 generate dm_code = dx, range(E08* E09* E10* E11* E13*)
                  by pid appt_date, sort: egen byte keeper = max(dm_code)
                  keep if keeper

                  That has given me the dataset below (see the dataex example). But I feel like I'm back where I started: I still have numerous rows for a single appointment. To clarify, I want my dataset to have a unique row for each appt_date within each pid, with a variable for each dx. Like this:
                  pid  appt_date  dx1    dx2    dx3
                  1    2/5/20     E10.8  E11.9  E11.65
                  1    2/20/20    E10.8
                  1    5/20/20    E9.0
                  2    6/2/20     E11.65
                  2    8/9/20     E11.65


                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float pid str8 appt_date str9 dx byte dm_code
                   17 "9/9/19"   "E11.9"    1
                   19 "9/12/19"  "E11.9"    1
                   34 "9/16/19"  "E10.8"    1
                   40 "9/16/19"  "E11.9"    1
                   40 "9/16/19"  " Z79.4"   0
                   49 "9/17/19"  "E11.9"    1
                   49 "9/17/19"  "E78.5"    0
                   49 "9/17/19"  "E06.3"    0
                   67 "9/27/19"  "E11.9"    1
                   69 "9/19/19"  "E11.9"    1
                   77 "9/17/19"  "E11.65"   1
                   77 "9/17/19"  "I25.10"   0
                   77 "9/17/19"  " Z98.61"  0
                   77 "9/17/19"  "E78.5"    0
                   77 "9/17/19"  "E08.69"   1
                   81 "9/25/19"  "E11.9"    1
                   82 "9/27/19"  "E13.9"    1
                   87 "9/23/19"  "E11.9"    1
                   91 "9/24/19"  "E11.69"   1
                   91 "9/24/19"  " K86.9"   0
                  107 "9/23/19"  " Z79.4"   0
                  107 "9/23/19"  "E11.9"    1
                  108 "9/23/19"  " Z79.4"   0
                  108 "9/23/19"  "E11.9"    1
                  120 "10/7/19"  "E11.9"    1
                  142 "9/23/19"  "E10.9"    1
                  152 "10/8/19"  "E11.9"    1
                  193 "9/20/19"  "E10.42"   1
                  193 "9/20/19"  " E10.65"  1
                  196 "10/8/19"  "E10.9"    1
                  198 "9/10/19"  "E13.9"    1
                  214 "10/11/19" "E10.9"    1
                  216 "10/18/19" "E03.9"    0
                  216 "10/18/19" "M81.0"    0
                  216 "10/18/19" "E10.51"   1
                  234 "10/15/19" "E11.9"    1
                  238 "10/17/19" "E27.49"   0
                  238 "10/17/19" "E23.2"    0
                  238 "10/17/19" "D44.4"    0
                  238 "10/17/19" "E03.8"    0
                  238 "10/17/19" "E23.0"    0
                  238 "10/17/19" "E11.9"    1
                  245 "10/15/19" "C73"      0
                  245 "10/15/19" "E11.9"    1
                  254 "9/4/19"   "E11.9"    1
                  255 "9/3/19"   "E10.8"    1
                  256 "9/17/19"  "G40.909"  0
                  256 "9/17/19"  "E10.9"    1
                  257 "10/25/19" " R80.9"   0
                  257 "10/25/19" "E11.29"   1
                  259 "9/17/19"  "E78.00"   0
                  259 "9/17/19"  "E11.8"    1
                  259 "9/17/19"  " Z79.4"   0
                  261 "11/12/19" "K74.69"   0
                  261 "11/12/19" "E08.00"   1
                  261 "11/12/19" "Z86.19"   0
                  265 "10/22/19" "E11.9"    1
                  267 "9/19/19"  "E11.9"    1
                  274 "9/12/19"  "E11.9"    1
                  276 "9/20/19"  " Z79.4"   0
                  276 "9/20/19"  "E11.9"    1
                  278 "10/25/19" "E11.9"    1
                  280 "10/25/19" "E11.9"    1
                  281 "10/18/19" "E11.9"    1
                  284 "10/18/19" "E11.9"    1
                  294 "10/21/19" "E11.65"   1
                  294 "10/21/19" " Z79.4"   0
                  295 "9/12/19"  "E03.8"    0
                  295 "9/12/19"  "E11.9"    1
                  295 "9/12/19"  "E23.0"    0
                  295 "9/12/19"  "E29.1"    0
                  295 "9/12/19"  "Q99.9"    0
                  312 "9/17/19"  "E10.65"   1
                  318 "10/23/19" "E11.9"    1
                  318 "10/23/19" "E06.3"    0
                  322 "10/24/19" "E10.65"   1
                  322 "10/24/19" "H35.00"   0
                  324 "10/30/19" "E11.9"    1
                  332 "10/24/19" " Z79.4"   0
                  332 "10/24/19" "E11.9"    1
                  335 "10/24/19" "E78.00"   0
                  335 "10/24/19" "E11.9"    1
                  337 "9/13/19"  "E11.9"    1
                  374 "10/28/19" "E08.29"   1
                  383 "9/3/19"   "E11.65"   1
                  383 "9/3/19"   " Z79.4"   0
                  383 "9/3/19"   "IMO0001"  0
                  384 "9/24/19"  "E11.3513" 1
                  384 "9/24/19"  " Z79.4"   0
                  386 "9/17/19"  "E10.65"   1
                  390 "9/3/19"   " Z79.4"   0
                  390 "9/3/19"   "E11.21"   1
                  390 "9/3/19"   "N18.4"    0
                  390 "9/3/19"   "N18.4"    0
                  390 "9/3/19"   "D63.1"    0
                  393 "9/17/19"  "E11.10"   1
                  393 "9/17/19"  " Z79.4"   0
                  403 "11/5/19"  "E10.9"    1
                  405 "9/3/19"   "E11.69"   1
                  405 "9/3/19"   "E11.65"   1
                  end



                  • #10
                    To get to that layout:
                    Code:
                    duplicates drop
                    drop dm_code
                    by pid appt_date (dx), sort: gen _j = _n
                    reshape wide dx, i(pid appt_date) j(_j)
                    But I reiterate my point that you will probably regret doing this. It is likely that whatever additional analysis you plan after this point will be better with the long layout you showed with -dataex- in #9. But it's your decision.



                    • #11
                      Hi Clyde,

                      But this is still a long layout, correct? I'm not trying to condense to one line per person; I'm just trying to have one row per encounter, which in my mind is still a long dataset. I don't understand what the purpose of having the various ICD-10 codes for the same visit on different rows would be. Because I'm interested in the frequency of diabetes visits, I believe my dataset should be long, with multiple encounters for each patient ID (but having numerous rows within one visit, one per ICD-10 code, just seems like noise). Please convince me otherwise!

                      Sarah

