I use Stata 13.1 on a Mac and have obtained data from the National Patient Registry in Denmark, which means that the data come in multiple datasets. Each observation has a recordid that is unique to the hospital contact, i.e. if you were admitted to hospital, all the tests you were exposed to during the admission would belong to that one contact. Only one of the tables also includes a personid uniquely identifying each person in the dataset.
I believe the format of the data is long, i.e. each participant can be registered on several occasions, with each occasion constituting one observation.
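Whether an identifier is repeated within a file can be checked directly. The following is only a minimal sketch on made-up toy data; the variable name procedure and the values are assumptions for illustration and are not taken from the registry.

```stata
* Toy data (hypothetical): contact 101 has two rows, contact 102 has one
clear
input recordid str4 procedure
101 "P1"
101 "P2"
102 "P3"
end

* Tabulate how many observations share the same recordid
duplicates report recordid

* isid exits with an error if recordid does not uniquely identify the rows;
* capture traps that error so the return code can be inspected
capture isid recordid
if _rc == 0 {
    display "recordid uniquely identifies the observations"
}
else {
    display "recordid is repeated within this file"
}
```

Running the same two checks on each of the real files would show in which tables, if any, recordid is unique.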
Now, my dependent variables are the diagnoses and survival, and the independent variables are the diagnostic and surgical procedures that each individual was exposed to. So, in order to conduct the regression analyses, I really need the data to be in one file. But the only common variable is recordid, which is not unique, and the Stata Manual explicitly warns against m:m merges (the passage is quoted after the table below).
I understand why the manual warns against them. But it is not clear to me whether joinby has the same unattractive features (depending on the current sort order and potentially ruining my data). I have tried joinby (see below), but I am not sure whether it did anything different from what an m:m merge would do.
joinby recordid using "filename", unmatched(using)
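For illustration, here is a minimal sketch of how joinby behaves on a toy example. The variable names surgery and test and all values are made up, not taken from the registry, and unmatched(both) is used here only so that the _merge variable is created.

```stata
* Toy "using" data (hypothetical): two surgical rows for the same contact
clear
input recordid str4 surgery
1 "S1"
1 "S2"
end
tempfile sur
save `sur'

* Toy "master" data (hypothetical): two diagnostic rows for the same contact
clear
input recordid str4 test
1 "T1"
1 "T2"
end

* joinby forms every pairwise combination within recordid (2 x 2 = 4 rows)
joinby recordid using `sur', unmatched(both)
list, sepby(recordid)
count
```

As far as I understand the documentation, merge m:m on the same toy data would instead pair the rows within recordid by their current position (first with first, second with second), which is why its result depends on the sort order while the pairwise combinations from joinby do not.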
Finally, my question is whether I can use joinby as above, or whether I should reshape the data to wide to obtain unique ids and then merge 1:1. Or is there perhaps some other solution that I have not considered?
Table | Contents | Examples |
Adm | Administrative information | Date of admission, discharge, primary diagnosis, hospital department, and personid |
Sur | Surgical procedures | Date of surgery, surgical procedures, type of procedure, additional procedures, and hospital department |
Dit | Diagnostic procedures and treatment | Diagnostic tests, diagnostic procedures, type of procedure, additional procedures, and hospital department |
Vit | Vital status | Date of birth, vital status of the person |
Dia | Diagnosis for contact | Temporary (or permanent) diagnosis, type of diagnosis, additional diagnoses, and hospital department |
"First, if you think you need to perform an m:m merge, then we suspect you are wrong." (Stata Manual, on merge m:m)