  • ICD-10 Codes

    Hi everyone,

    I just started using Stata in the last month so am very inexperienced. However, I can run some basic codes and analyses. I have been given a database that uses ICD 10 codes and need to extract data regarding a specific set of codes. I have absolutely no idea how to do this. I have tried several codes but none have worked and I have either not gotten data that I had expected or have just gotten error messages. Can someone set me down a path to even begin to wrap my head around this? Thank you so much!

  • #2
    In order to get help, please read the FAQ section first and get familiar with the forum posting rules. You are expected to explain terms (terms familiar to you may not be familiar to others), provide a data example using -dataex-, and post your Stata code using code delimiters. Welcome to Statalist.
    Roman



    • #3
      Maybe you'll find the following example useful:
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str4 code
      "E109"
      "J449"
      "I259"
      end
      
      * Keep observations with code J449:
      keep if code == "J449"

      * Alternatively (instead of the line above), keep observations with code J449 or code I259:
      keep if code == "J449" | code == "I259"



      • #4
        Thank you, Oyvind! I think this is exactly what I need....I don't have a lot of experience with Stata but here is the situation (after looking more closely at the data). I have 25 fields named dx1, dx2, dx3,.....dx25. I have nearly 500,000 individuals and each of them has a series of ICD-10 codes associated with their visit. Still nailing down the exact ICD-10 codes I am going to use but there will likely be 5-7 different codes that I'd like to identify. I need a code that will allow me to do the following: keep if dx1 or dx2 or dx3, etc., etc. =="code 1"; keep if dx1 or dx2 or dx3, etc.etc. =="code 2"

        Is there a faster way for me to do this than listing out each code with all 25 fields included? Also, I want to make sure that observations are not deleted if they don't have the first code since it's possible they won't have the first code associated with their visit but could have one of the other codes.

        I hope this makes sense! Thank you!



        • #5
          Code:
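          * Flag visits where any of the dx variables holds one of the target codes
          gen wanted = 0
          foreach var of varlist dx* {
              replace wanted = 1 if inlist(`var', "code 1", "code 2", "code 3", "code 4")
          }
          * Keep only the flagged observations
          keep if wanted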
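
          A minimal sketch of how the loop above behaves on toy data, assuming three diagnosis fields instead of 25 and using J449 and I259 (borrowed from the example in #3) as stand-in target codes. It is meant to show that a visit is kept when any dx field matches, so a visit without a target code in dx1 is not dropped. One caveat worth checking in help inlist: as far as I know, inlist() accepts at most 10 string arguments in total, so a longer code list would need to be split across several inlist() calls.

          ```stata
          * Toy data: three diagnosis fields instead of 25 (illustrative values only)
          clear
          input str4 dx1 str4 dx2 str4 dx3
          "E109" "J449" ""
          "I259" ""     ""
          "E109" ""     ""
          "K219" "E109" "I259"
          end

          * Flag a visit if ANY dx field holds one of the target codes
          gen byte wanted = 0
          foreach var of varlist dx* {
              replace wanted = 1 if inlist(`var', "J449", "I259")
          }

          * Visits 1, 2 and 4 are flagged; visit 3 has neither code
          list, clean
          keep if wanted
          ```

          If the flag is needed for later analysis rather than for subsetting, the final keep can simply be left out.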



          • #6
            Originally posted by Melissa Eggen
            Thank you, Oyvind! I think this is exactly what I need....I don't have a lot of experience with Stata but here is the situation (after looking more closely at the data). I have 25 fields named dx1, dx2, dx3,.....dx25. I have nearly 500,000 individuals and each of them has a series of ICD-10 codes associated with their visit. Still nailing down the exact ICD-10 codes I am going to use but there will likely be 5-7 different codes that I'd like to identify. I need a code that will allow me to do the following: keep if dx1 or dx2 or dx3, etc., etc. =="code 1"; keep if dx1 or dx2 or dx3, etc.etc. =="code 2"

            Is there a faster way for me to do this than listing out each code with all 25 fields included? Also, I want to make sure that observations are not deleted if they don't have the first code since it's possible they won't have the first code associated with their visit but could have one of the other codes.

            I hope this makes sense! Thank you!
            In addition to the loop with a wildcard (i.e., dx* means all variables whose names start with dx), you might benefit from the icd10 suite of commands. They enable you to specify a range of codes you’re interested in. Frequently, people in this situation will be interested in a bunch of codes that are adjacent. For example, say you are interested in claims with any type of spacecraft accident. That’s the range V9540XA to V9549XS.
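
            A rough sketch of the range idea, under several assumptions: Stata 15 or later (where the icd10cm command is available), codes stored without the decimal point, and toy data with two diagnosis fields. The variable name spacecraft is made up for illustration, and the slash form of the range should be checked against help icd10cm for your version.

            ```stata
            * Toy data (illustrative values only)
            clear
            input str7 dx1 str7 dx2
            "V9543XA" "J449"
            "E109"    "V9540XA"
            "I259"    "E109"
            end

            * Flag a visit if any dx field falls in the range V9540XA to V9549XS
            gen byte spacecraft = 0
            foreach var of varlist dx* {
                tempvar hit
                * icd10cm generate creates a 0/1 indicator for codes in the range
                icd10cm generate `hit' = `var', range(V9540XA/V9549XS)
                replace spacecraft = 1 if `hit' == 1
                drop `hit'
            }
            list, clean
            ```

            The loop is needed because the indicator is built from one diagnosis variable at a time.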
            Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

            When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



            • #7
              regexm could be useful here too. For example:

              Code:
              gen event = 1 if regexm(icd10var, "^N1[0-6]")
              I often use the Stata ICD10 command to check whether any mistakes were made:

              Code:
              icd10 gen event = icd10var, range(N10 N11 N12 N13 N14 N15 N16)
              It is also worth cleaning the variable that holds the ICD-10 codes first. For example, I usually check the length of the string (3 or 4 characters) and the format (an ICD-10 code always starts with an uppercase letter).

              Code:
              confirm string variable icd10var
              assert length(icd10var) <= 4
              assert regexm(icd10var, "^[A-Z][0-9][0-9]")
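
              Putting the pieces of this post together on toy data, here is a sketch that cleans the codes, runs the checks, and then builds a 0/1 indicator with regexm across several diagnosis fields. The variable names dx1, dx2 and kidney are assumptions for illustration, as is the choice of upper() and trim() for cleaning; the N10 to N16 pattern is the one from the example above.

              ```stata
              * Toy data with one lowercase, space-padded code (illustrative values only)
              clear
              input str5 dx1 str5 dx2
              "n12 "  "E109"
              "I259"  "N390"
              "J449"  "N151"
              end

              * Clean each field, then check length and format (empty fields are allowed)
              foreach var of varlist dx* {
                  confirm string variable `var'
                  replace `var' = upper(trim(`var'))
                  assert length(`var') <= 4
                  assert regexm(`var', "^[A-Z][0-9][0-9]") if `var' != ""
              }

              * Set the indicator to 1 if any field starts with N10 through N16, else 0
              gen byte kidney = 0
              foreach var of varlist dx* {
                  replace kidney = 1 if regexm(`var', "^N1[0-6]")
              }
              list, clean
              ```

              Starting the indicator at 0 and replacing with 1 gives a complete 0/1 variable, whereas gen ... = 1 if ... leaves non-matching observations missing.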
