Generating a variable that allows for overlap of diagnosis code observations in MEPS

Caterina Priz

Join Date: Jul 2021

Posts: 6
#1

13 Jan 2022, 11:01

I am using the Medical Expenditure Panel Survey for a project and analyzing the data in Stata. I am new to both and would like to make sure that I am coding my variables properly.

I want to look at several ICD-10 codes separately and also assess how they overlap within patients in the cohort. However, after defining each disease as a new variable (for example, gen fatigue=1 if ICD10CDX=="R53"), I realized that a separate dummy variable per disease does not capture overlap: a dummy for one ICD-10 code appears to exclude any overlap with a different code. Specifically, a patient who has several diagnoses under the same patient ID (dupersid) does not register as having more than one. For example, if patient 1 has cancer and diabetes, and I have created a dummy variable for each using the corresponding ICD10CDX codes, then "tab cancer diabetes" shows 0 overlap, even though I can see in the data browser that the same patient has both diagnoses under their dupersid.

I understand that this has to do with how the software reads the data and my commands, but I have not been able to figure out how to generate these variables in a way that Stata will understand. I looked into it and thought that collapsing by dupersid might help, but it did not solve my problem, as each ICD code is still mutually exclusive. In theory I could painstakingly compare each ICD code by hand to build the table, but there are thousands of observations.

I am wondering if there is a process in Stata that will allow for overlap with ICD codes by the patient identifier (dupersid), thus allowing me to tabulate the overlap between diseases in the population.

For example:
Person 1 has disease A, B, C.
Person 2 has disease B, C.
Person 3 has disease A.
Person 4 has disease A, C.
Is there a way to generate a variable for A, B, and C so that I can tab A and C, and see that there are 3 people with A, 3 people with C, and 2 people with both?

Thank you in advance for your help!

**For additional context, the ICD code in MEPS is a single variable that holds all of the diagnosis codes. There is not a separate variable for each diagnosis code, so in order to look at them separately we have to define each variable as in the fatigue example above.
Ken Chui

Join Date: Aug 2014

Posts: 1058
#2

13 Jan 2022, 11:15

"For additional context, the ICD code in MEPS is a single variable that holds all of the diagnosis codes. There is not a separate variable for each diagnosis code, so in order to look at them separately we have to define each variable as in the fatigue example above."

This is probably the source of the problem. This structure is called long form: each row is not a patient but a diagnosis entry, so you will not get the cross-tabulation you want without first aggregating the data to the patient level. Here is one way to illustrate the idea. First compute the binary indicators, then collapse them to the patient level using either sum or max; after that you should be able to get the table you are envisioning:

Code:

clear
input id str10 icd
1 001
1 003
1 201
1 576
2 001
2 009
2 201
2 299
3 001
3 003
3 201
3 576
4 009
4 008
4 440
4 401
end

levelsof icd, local(disease)
foreach x in `disease' {
    gen icd`x' = (icd == "`x'")
}
drop icd
collapse (max) icd*, by(id)
tab icd001 icd003
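To connect this back to the four-person example in post #1, here is a minimal sketch of the same idea. The variable names person, disease, hasA, hasB, and hasC are made up for illustration and are not MEPS variables. It shows why the row-level cross-tabulation reports no overlap and how taking the per-person maximum fixes that.

```stata
* Toy data from post #1: one row per (person, disease) pair (long form).
* All variable names here are hypothetical.
clear
input byte person str1 disease
1 "A"
1 "B"
1 "C"
2 "B"
2 "C"
3 "A"
4 "A"
4 "C"
end

* Row-level indicators: each row carries a single disease, so no row can be
* 1 on both hasA and hasC, and a cross-tab at this stage shows no overlap.
gen byte hasA = (disease == "A")
gen byte hasB = (disease == "B")
gen byte hasC = (disease == "C")
tab hasA hasC

* Take the per-person maximum so each indicator means "ever had the disease".
collapse (max) hasA hasB hasC, by(person)
tab hasA hasC
```

After the collapse there is one row per person, and the second table should show 3 people with A, 3 with C, and 2 with both, which matches the counts asked for in post #1.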
Caterina Priz

Join Date: Jul 2021

Posts: 6
#3

13 Jan 2022, 11:42

Thank you for the explanation Ken!

From the example code you provided, it seems that I should substitute the MEPS names as follows:
icd --> icd10cdx
id --> dupersid
1, 2, 3, 4 --> numeric dupersid ID for all the patients in my cohort
001, 003, etc. --> ICD-10 codes of study (e.g., R53)

Please let me know if that is the correct interpretation. Also, since I have already destringed dupersid, do I still need to include "str10" from your example?
Ken Chui

Join Date: Aug 2014

Posts: 1058
#4

13 Jan 2022, 12:58

The first part, from "clear" to "end", was only there to create fake data to make a point (because your post #1 did not supply any data). To get answers and code that would actually work for you, please read and follow the FAQ (http://www.statalist.org/forums/help) on how to provide sample data using -dataex-. Without seeing the variable formats and data structure, I cannot confirm anything stated in post #3.

Caterina Priz

Join Date: Jul 2021

Posts: 6
#5

13 Jan 2022, 13:42

Thank you! Below is a dataex sample.
I would like to have two types of dummy variables: one for a single ICD code (e.g., F32) and one for a group of related codes (e.g., F32, F34, F39).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double DUPERSID str4 ICD10CDX
10001101 "R52"
10001101 "F32"
10001101 "F32"
10001101 "R52"
10001101 "G89"
10001102 "F32"
10001102 "F32"
10001103 "G89"
10001103 "G89"
10001103 "G89"
10001103 "F32"
10001104 "F32"
10001104 "R52"
10001104 "F32"
10001104 "F32"
10001104 "F32"
10001104 "R52"
10002101 "R52"
10002101 "R52"
10002101 "R52"
10002101 "F32"
10002101 "F32"
10002101 "F32"
10002101 "F32"
10002101 "F32"
10002101 "G89"
10004101 "G89"
10005101 "F32"
10005101 "R52"
10005101 "L40"
10005101 "F32"
10005101 "F32"
10005102 "G89"
10005102 "F32"
10005102 "R52"
10005102 "F39"
10005102 "F32"
10005103 "R52"
10005103 "F32"
10006101 "F32"
10006101 "F32"
10006101 "R53"
10006101 "F32"
10006101 "R52"
10006101 "G89"
10006101 "F32"
10006101 "F32"
10006101 "F32"
10006101 "F32"
10006101 "F32"
10006102 "F32"
10006102 "F32"
10006102 "R53"
10006102 "F32"
10006102 "R53"
10006102 "F32"
10006102 "G89"
10006102 "R52"
10006102 "F32"
10008101 "R52"
10008102 "L40"
10008102 "F32"
10008102 "G89"
10008102 "R52"
10008102 "F32"
10008103 "F32"
10008103 "F32"
10008103 "F32"
10008103 "F32"
10008103 "F32"
10008103 "F32"
10008103 "F32"
10008103 "F32"
10008104 "G89"
10008104 "R52"
10008105 "F32"
10008105 "R52"
10008105 "F32"
10008106 "F32"
10008106 "F32"
10008106 "G89"
10008106 "R52"
10008107 "F32"
10008107 "G89"
10009101 "G89"
10009102 "F32"
10009102 "F32"
10009102 "R52"
10009102 "F32"
10009103 "G89"
10009103 "G89"
10009103 "R53"
10009103 "F32"
10010101 "F32"
10010101 "G89"
10010101 "F32"
10010101 "F32"
10010102 "R53"
10010102 "F32"
10014101 "F32"
end

Last edited by Caterina Priz; 13 Jan 2022, 13:45.

Ken Chui

Join Date: Aug 2014

Posts: 1058
#6

13 Jan 2022, 13:51

The first two lines show how to compute a binary indicator for a single code and for a group of codes. The final collapse command then brings the data to the patient level, allowing the type of comorbidity tables you are looking for:

Code:

gen icdF32 = (ICD10CDX == "F32")
gen icdGroupX = inlist(ICD10CDX, "F32", "F34", "F39")
collapse (max) icdF32 icdGroupX, by(DUPERSID)
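Note that collapse replaces the dataset in memory, so the diagnosis-level rows are gone afterwards. If those rows are still needed, one possible alternative (a different technique from the collapse above, sketched here under assumptions) is to build the person-level flags with egen and count each person once with a tag variable. The names everF32, everR52, and onePerPerson are hypothetical, and the rows are a small subset of the -dataex- sample posted above.

```stata
* A few rows copied from the -dataex- sample posted earlier in the thread.
clear
input double DUPERSID str4 ICD10CDX
10001101 "R52"
10001101 "F32"
10001101 "F32"
10001101 "R52"
10001101 "G89"
10001102 "F32"
10001102 "F32"
10001103 "G89"
10001103 "G89"
10001103 "G89"
10001103 "F32"
10004101 "G89"
10008101 "R52"
end

* Person-level "ever diagnosed" flags, repeated on every row of that person.
* (everF32 and everR52 are illustrative names, not MEPS variables.)
bysort DUPERSID: egen byte everF32 = max(ICD10CDX == "F32")
bysort DUPERSID: egen byte everR52 = max(ICD10CDX == "R52")

* Mark exactly one row per person so each person is counted once.
egen byte onePerPerson = tag(DUPERSID)

* Cross-tabulate at the person level without dropping any rows.
tab everF32 everR52 if onePerPerson
```

The trade-off is that every later tabulation has to be restricted with "if onePerPerson"; otherwise people with many diagnosis rows are counted several times.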
Caterina Priz

Join Date: Jul 2021

Posts: 6
#7

13 Jan 2022, 13:56

That worked, THANK YOU!