Hello,
I have little experience with Stata and am in need of some help. I am working with a dataset in long format of patients who have been followed for several years. Thus, for each patient there are 1 to ∞ visits, each coded with the ICD-10 codes for the diagnoses that were relevant at that particular visit. In other words, (for chronic diseases) the fact that a diagnosis has not been entered at one visit does not mean that the patient no longer has the disease, but rather that it was not the one the doctor was focusing on. There are millions of visits, and each visit has its own row with a total of 88 different variables (including up to 9 ICD-10 codes), some of which are specific to that particular visit and some of which are not.
The dataset looks something like this (I have removed some of the 88 variables and a lot of the millions of visits for ease of reading):
ID  year  Hospital  Unit  ICD10_1  ICD10_2  ICD10_3
1   1999  5         6     I10      L400
1   2000  3         2     J448     M059
1   2001  4         1     C401
1   2003  2         8     L400
2   1996  1         9     I251     I10      M059
2   2008  1         9     I48
3   1996  2         7     I10      E113     C711
4   2006  2         1     J448
4   2007  2         1     C20      N188
4   2010  7         5     M059
5   1999  4         5     M059     C401
6   2000  3         3     E113
6   2005  1         1     I10      C20
As a former SPSS user I am used to the wide format, and although I have "seen the light" and am now transitioning to Stata, I must say this long format is making me a bit nauseous. However, I am trying to keep a cool head and not reshape to wide, as I keep reading that the long format is better in Stata.
Now, onto my question: What I want to do is to set up a simple 2 x 2 table where I compare the frequency of ever having had one diagnosis/ICD-10 code (let's say I48, atrial fibrillation) across patients with or without a different diagnosis/ICD-10 code (let's say rectal cancer).
Thank you so much!
Oh, and I am using Stata 16.1.
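To make the goal concrete, here is a rough sketch of the kind of thing I imagine might work while staying in long format. It is untested on the real data and rests on several assumptions: that the diagnosis variables are strings named ICD10_1 to ICD10_9, that the patient identifier is ID, and that rectal cancer is coded C20. The variable names af_visit, rc_visit, ever_af, ever_rc and one_per_id are made up for illustration.

```stata
* Sketch only. Assumes string variables ICD10_1 ... ICD10_9 and an identifier ID.
* All new variable names below are hypothetical.

* Visit-level flags: 1 if any ICD10_* field on this row starts with the code
generate byte af_visit = 0
generate byte rc_visit = 0
foreach v of varlist ICD10_* {
    replace af_visit = 1 if substr(`v', 1, 3) == "I48"
    replace rc_visit = 1 if substr(`v', 1, 3) == "C20"
}

* Patient-level "ever" flags: maximum of the visit-level flag within each patient
bysort ID: egen byte ever_af = max(af_visit)
bysort ID: egen byte ever_rc = max(rc_visit)

* Tag exactly one row per patient so each patient is counted once
egen byte one_per_id = tag(ID)

label variable ever_af "Ever I48 (atrial fibrillation)"
label variable ever_rc "Ever C20 (rectal cancer)"

* 2 x 2 table at the patient level
tabulate ever_af ever_rc if one_per_id, row chi2
```

Matching on the first three characters is meant to catch subcodes (for example I480) as well as the bare code; if exact matches are wanted, the condition would compare the whole string instead.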