I am using the Medical Expenditure Panel Survey for a project and analyzing the data in Stata. I am new to both and would like to make sure that I am coding my variables properly.
I want to look at several ICD-10 codes separately and also assess how they overlap within patients in the cohort. However, when defining each disease as a new variable (for example, gen fatigue=1 if ICD10CDX=="R53"), I realized that I cannot simply generate a variable for each disease under study. If I create a dummy variable for one ICD-10 code, it excludes the possibility of overlap with a different code. Specifically, a single patient who has several diagnoses associated with their patient ID (dupersid) does not register as having more than one. For example, if patient 1 has cancer and diabetes (and I have created a dummy variable for each using the corresponding ICD10CDX codes), then when I run "tab cancer diabetes" I get 0 overlap, even though I can see in the data browser that the same patient has both diagnoses associated with their unique dupersid.
I understand that this has to do with how the software reads the data and my commands; however, I have not been able to figure out how to generate these variables in a way that Stata will understand. I tried to research it and thought that maybe collapsing by dupersid would help, but it did not solve my problem, as each ICD code is still mutually exclusive. In theory I could painstakingly compare each ICD code to create an array by hand, but there are thousands of observations.
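To show what I mean, here is a minimal reproduction with made-up data. It assumes one row per diagnosis, which is how the file looks in the data browser. The IDs and the codes C50 and E11 are placeholders for illustration, not values taken from my actual extract.

```stata
* Made-up data: one row per person-diagnosis (IDs and codes are placeholders)
clear
input str10 dupersid str3 ICD10CDX
"1001" "C50"
"1001" "E11"
"1002" "E11"
end

* One dummy per disease, defined the same way as the fatigue example
gen cancer   = 1 if ICD10CDX == "C50"
gen diabetes = 1 if ICD10CDX == "E11"

* Person 1001 has both diagnoses, but they sit on different rows,
* so no single row ever has both dummies equal to 1
list, sepby(dupersid)
count if cancer == 1 & diabetes == 1
```

Each row carries only one code, so the two dummies are never both 1 on the same row, which seems to be why the cross-tabulation shows no overlap.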
I am wondering if there is a process in Stata that will allow for overlap with ICD codes by the patient identifier (dupersid), thus allowing me to tabulate the overlap between diseases in the population.
For example:
Person 1 has disease A, B, C.
Person 2 has disease B, C.
Person 3 has disease A.
Person 4 has disease A, C.
Is there a way to generate a variable for A, B, and C so that I can tab A and C, and see that there are 3 people with A, 3 people with C, and 2 people with both?
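To make the toy example concrete, here is a sketch of the data layout and the result I am after. It assumes one row per person-condition, and the names id, dx, hasA, hasB, hasC and first are hypothetical (id and dx stand in for dupersid and ICD10CDX). The egen step is one approach that might work, spreading each flag across all of a person's rows; I have not verified it on the real MEPS file.

```stata
* Toy data: one row per person-condition (id and dx are hypothetical names)
clear
input byte id str1 dx
1 "A"
1 "B"
1 "C"
2 "B"
2 "C"
3 "A"
4 "A"
4 "C"
end

* (dx == "A") is 1 or 0 on every row; taking the maximum within id
* gives 1 on all of a person's rows if any one of their rows matches
foreach d in A B C {
    egen byte has`d' = max(dx == "`d'"), by(id)
}

* Tag one row per person so each person is counted only once
egen byte first = tag(id)
tabulate hasA hasC if first
```

With flags defined at the person level like this, the table should count people rather than rows: 3 with A, 3 with C, and 2 with both.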
Thank you in advance for your help!
**For additional context, the ICD code in MEPS is a single variable (ICD10CDX) that holds all of the diagnosis codes. There is not a separate variable for each diagnosis code, so in order to look at them separately we have to define a variable for each, as in the fatigue example above.