Hi,
I have a matched dataset of ~150k patients, with each matched pair consisting of one case and one control. I used ccmatch; cases and controls were matched on patient_id, a string variable unique to each patient. Each matched pair has a unique value for the variable match.
Now I have appended a new file to this dataset. The new file contains many (but not all) of the patient_ids in the existing dataset, plus some additional patient_ids (whom I will want to discard). Since matching was not done on the new file, match is missing for all of the newly appended observations. I want to populate match for the newly appended observations whose patient_ids are in the existing dataset, i.e., for the matched cases and controls. After that, I will drop the excess patients who came from the new file (i.e., those not matched in the original dataset). They should be easy to identify: at that point, they should be the only patients with missing match values.
My question: how can I populate a unique value for one variable based on the unique value of another variable? Specifically, how can I populate missing match values, based on existing match and patient_id values? I am thinking this may start with a replacement of match if match==., based on patient_id, but am unsure exactly how to write this out.
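Here is a minimal sketch in Stata. It assumes that each patient_id belongs to at most one matched pair, so every original row for a patient carries the same match value, and that match is numeric. The toy data reuse the values from the example below (match 5565, patient_ids 12345 and 67890). patient_id 99999 is a hypothetical extra patient from the new file. The approach relies on Stata sorting numeric missing (.) after all non-missing values. Within each patient_id sorted by match, the first observation therefore holds the original match value if one exists.

```stata
* Toy data (hypothetical): the original pair 5565 plus rows appended from the new file
clear
input str5 patient_id match
"12345" 5565
"67890" 5565
"12345" .
"67890" .
"99999" .
end

* Within each patient_id, sort on match; missing (.) sorts after every
* non-missing value, so match[1] is the original match value when one exists
bysort patient_id (match): replace match = match[1] if missing(match)

* Patients who were never matched in the original data are still missing; drop them
drop if missing(match)

list, sepby(match)
```

The input block is only for illustration. On the real data, only the bysort/replace and drop lines are needed. If match_id also has to be carried over, run bysort patient_id (match): replace match_id = match_id[1] if missing(match) before filling match. Once match is filled, the rows are tied on match, and the sort no longer guarantees that the original row comes first. That line works whether match_id is numeric or string.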
An example, using match value 5565, is below. First is the case, then the control. In the newly-appended data, patient_ids 12345 and 67890 may be present, but match (and match_id) would be missing.
