  • Generating a variable that allows for overlap of diagnosis code observations in MEPS

    I am using the Medical Expenditure Panel Survey for a project and analyzing the data in Stata. I am new to both and would like to make sure that I am coding my variables properly.

    I want to look at several ICD-10 codes separately and also assess how they overlap within patients in the cohort. However, when defining each disease as a new variable (for example, gen fatigue=1 if ICD10CDX=="R53"), I realized that I cannot simply generate a variable for each disease under study. If I create a dummy variable for one ICD-10 code, it excludes the possibility of overlap with a different code. Specifically, a single patient who has several diagnoses associated with their patient ID (dupersid) is not registered as having more than one. For example, if patient 1 has cancer and diabetes (and I have created a dummy variable for each using the relevant ICD10CDX codes), and I then run "tab cancer diabetes", I get 0 overlap, even though I can see in the data browser that the same patient has both diagnoses associated with their dupersid.

    I understand that this has to do with how the software reads the data and my commands; however, I have not been able to figure out how to generate these variables in a way that Stata will understand. I tried to research it and thought that collapsing by dupersid might help, but it did not solve my problem, as each ICD code is still mutually exclusive. In theory I could painstakingly compare each ICD code by hand, but there are thousands of observations.

    I am wondering if there is a process in Stata that will allow for overlap with ICD codes by the patient identifier (dupersid), thus allowing me to tabulate the overlap between diseases in the population.

    For example:
    Person 1 has disease A, B, C.
    Person 2 has disease B, C.
    Person 3 has disease A.
    Person 4 has disease A, C.
    Is there a way to generate a variable for A, B, and C so that I can tab A and C, and see that there are 3 people with A, 3 people with C, and 2 people with both?

    Thank you in advance for your help!

    **For additional context, the ICD code in MEPS is a single variable that contains all of the diagnosis codes. There is not a separate variable for each diagnosis code, so in order to look at them separately we have to define each variable as in the fatigue example above.

  • #2
    "For additional context, the ICD code in MEPS is a single variable that contains all of the diagnosis codes. There is not a separate variable for each diagnosis code, so in order to look at them separately we have to define each variable as in the fatigue example above."

    This is probably the source of the problem. This structure is called long form: each row is a diagnosis entry, not a patient, so you will not get the cross-tabulation you want without first reducing the data to patient level. Here is one way to show the idea. First compute a binary indicator for each code, then use collapse with either (sum) or (max) to aggregate the indicators to patient level. You should then be able to get the table you're envisioning:

    Code:
    clear
    input id str10 icd
    1 001
    1 003
    1 201
    1 576
    2 001
    2 009
    2 201
    2 299
    3 001
    3 003
    3 201
    3 576
    4 009
    4 008
    4 440
    4 401
    end
    
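    * Collect the distinct codes, then build one 0/1 indicator per code
    levelsof icd, local(disease)
    
    foreach x in `disease' {
        gen icd`x' = (icd == "`x'")
    }
    
    drop icd
    
    * Keep the per-patient maximum: 1 if the code appears on any row
    collapse (max) icd*, by(id)
    
    tab icd001 icd003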
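
To make the idea concrete, here is a minimal sketch using the toy example from post #1 (persons 1 to 4 with diseases A, B, and C). The variable names person and disease are made up for illustration and are not MEPS names. It is meant to show why the row-level tabulation has an empty overlap cell, and how collapsing with (max) should give the patient-level counts asked for in #1.

```stata
* Hypothetical toy data: one row per (person, disease), as described in #1
clear
input byte person str1 disease
1 "A"
1 "B"
1 "C"
2 "B"
2 "C"
3 "A"
4 "A"
4 "C"
end

* Row-level indicators. A logical expression evaluates to 0 or 1, so there
* are no missing values (unlike "gen x = 1 if ...").
gen byte A = (disease == "A")
gen byte B = (disease == "B")
gen byte C = (disease == "C")

* Each row holds exactly one disease, so no row has both A == 1 and C == 1
tab A C

* Patient-level indicators: 1 if the person has the disease on any row
collapse (max) A B C, by(person)

* Now the cell A == 1, C == 1 counts persons who have both
tab A C
```

Note that collapse replaces the data in memory, so it may be worth saving the long file first or wrapping the step in preserve and restore.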



    • #3
      In reply to Ken Chui (#2):
      Thank you for the explanation Ken!

      From the example code you provided, it seems that I should substitute the MEPS names as follows:
      icd --> icd10cdx
      id --> dupersid
      1, 2, 3, 4 --> numeric dupersid ID for all the patients in my cohort
      001, 003, etc. --> ICD-10 codes under study (e.g., R53)

      Please let me know if that is the correct interpretation. Also, since I have already destringed dupersid, do I need to include "str10" from your example?



      • #4
        In reply to Caterina Priz (#3):
        The first part, from "clear" to "end", was only there to create some fake data to illustrate the point (because your post in #1 did not supply any data). In order to get answers and code that will actually work for you, please read and follow the FAQ (http://www.statalist.org/forums/help) on how to provide sample data using -dataex-. Without seeing the variable formats and data structure, I cannot confirm anything stated in post #3.
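
For reference, a -dataex- call might look like the sketch below. The three rows are invented only so that the snippet runs on its own; with the real MEPS file in memory you would skip the input step. The variable names DUPERSID and ICD10CDX are assumed from post #1.

```stata
* Invented rows so the example is self-contained; not real MEPS records
clear
input double DUPERSID str4 ICD10CDX
1 "R53"
1 "E11"
2 "R53"
end

* If dataex is not already available on your installation:
* ssc install dataex

* List only the variables relevant to the question, for the first few rows
dataex DUPERSID ICD10CDX in 1/3
```

The output can be pasted directly into a post between code delimiters, so that others can recreate the data exactly.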



        • #5
          In reply to Ken Chui (#4):
          Thank you! Below is a dataex sample.
          I would like to have two types of dummy variables: one for a single ICD code (e.g., F32) and one for a group of similar codes (e.g., F32, F34, F39).

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double DUPERSID str4 ICD10CDX
          10001101 "R52"
          10001101 "F32"
          10001101 "F32"
          10001101 "R52"
          10001101 "G89"
          10001102 "F32"
          10001102 "F32"
          10001103 "G89"
          10001103 "G89"
          10001103 "G89"
          10001103 "F32"
          10001104 "F32"
          10001104 "R52"
          10001104 "F32"
          10001104 "F32"
          10001104 "F32"
          10001104 "R52"
          10002101 "R52"
          10002101 "R52"
          10002101 "R52"
          10002101 "F32"
          10002101 "F32"
          10002101 "F32"
          10002101 "F32"
          10002101 "F32"
          10002101 "G89"
          10004101 "G89"
          10005101 "F32"
          10005101 "R52"
          10005101 "L40"
          10005101 "F32"
          10005101 "F32"
          10005102 "G89"
          10005102 "F32"
          10005102 "R52"
          10005102 "F39"
          10005102 "F32"
          10005103 "R52"
          10005103 "F32"
          10006101 "F32"
          10006101 "F32"
          10006101 "R53"
          10006101 "F32"
          10006101 "R52"
          10006101 "G89"
          10006101 "F32"
          10006101 "F32"
          10006101 "F32"
          10006101 "F32"
          10006101 "F32"
          10006102 "F32"
          10006102 "F32"
          10006102 "R53"
          10006102 "F32"
          10006102 "R53"
          10006102 "F32"
          10006102 "G89"
          10006102 "R52"
          10006102 "F32"
          10008101 "R52"
          10008102 "L40"
          10008102 "F32"
          10008102 "G89"
          10008102 "R52"
          10008102 "F32"
          10008103 "F32"
          10008103 "F32"
          10008103 "F32"
          10008103 "F32"
          10008103 "F32"
          10008103 "F32"
          10008103 "F32"
          10008103 "F32"
          10008104 "G89"
          10008104 "R52"
          10008105 "F32"
          10008105 "R52"
          10008105 "F32"
          10008106 "F32"
          10008106 "F32"
          10008106 "G89"
          10008106 "R52"
          10008107 "F32"
          10008107 "G89"
          10009101 "G89"
          10009102 "F32"
          10009102 "F32"
          10009102 "R52"
          10009102 "F32"
          10009103 "G89"
          10009103 "G89"
          10009103 "R53"
          10009103 "F32"
          10010101 "F32"
          10010101 "G89"
          10010101 "F32"
          10010101 "F32"
          10010102 "R53"
          10010102 "F32"
          10014101 "F32"
          end
          Last edited by Caterina Priz; 13 Jan 2022, 14:45.



          • #6
            The first two lines compute binary variables that flag a single code and a group of codes, respectively. The final collapse command then brings the data to patient level, allowing the type of comorbidity tables you're looking for:

            Code:
            gen icdF32 = (ICD10CDX == "F32")
            gen icdGroupX = inlist(ICD10CDX, "F32", "F34", "F39")
            
            collapse (max) icdF32 icdGroupX, by(DUPERSID)
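
Putting the pieces together, a sketch of the full workflow on a handful of rows copied from the -dataex- sample in #5 could look like this. The extra indicator icdR52 and the count nF32 are added here purely for illustration. The (sum) statistic is included only to show how it differs from (max).

```stata
* A few rows copied from the dataex sample in #5
clear
input double DUPERSID str4 ICD10CDX
10001101 "R52"
10001101 "F32"
10001101 "F32"
10001101 "R52"
10001101 "G89"
10001102 "F32"
10001102 "F32"
10004101 "G89"
10005102 "G89"
10005102 "F32"
10005102 "R52"
10005102 "F39"
10005102 "F32"
end

* Row-level 0/1 flags for single codes and for a group of codes
gen byte icdF32 = (ICD10CDX == "F32")
gen byte icdR52 = (ICD10CDX == "R52")
gen byte icdGroupX = inlist(ICD10CDX, "F32", "F34", "F39")

* (max) answers "ever diagnosed?"; (sum) counts F32 records per patient
collapse (max) icdF32 icdR52 icdGroupX (sum) nF32 = icdF32, by(DUPERSID)

* Patient-level overlap between F32 and R52
tab icdF32 icdR52
```

Because collapse replaces the data in memory, it may be safer to save the long file first or to wrap this step in preserve and restore if the row-level records are needed afterwards.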



            • #7
              That worked, THANK YOU!
