Hello Stata experts,
I am trying to merge two datasets (pre- and post-surveys of an intervention). I understand that a one-to-one merge requires the key variable to uniquely identify each observation. However, I need to keep the duplicated values, because that variable is the primary identifier of each row.
The survey was not designed to a standardized academic specification, as it is for a small non-profit organization, so no other identifying variables were required. The same participant can attend multiple sessions of the intervention in a given period (e.g., a month), and the identifier (e.g., phone number) is the same for that participant each time. We do have a secondary identifier, which is the date of the session. Additionally, survey participation is completely voluntary, so some participants do not complete both the pre- and post-surveys. I am still trying to use those two variables to merge the two datasets, but I am not sure how to assign each observation a variable that can serve as a unique identifier. I know this isn't ideal, but I have to work with what I have.
The ideal combined dataset would be matched first on the primary identifier (e.g., phone number) and then on the secondary identifier (e.g., date of the session).
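To make the target concrete, here is a rough sketch of the kind of merge I have in mind. The file names (pre.dta, post.dta) and variable names (phone, session_date, seq) are placeholders I made up for illustration, and I am not sure this is the right approach. It assumes that phone number plus session date is close to unique, and it numbers any remaining repeats within each phone/date pair instead of relying on the overall row number.

```stata
* Hypothetical names: pre.dta, post.dta, phone, session_date, seq
use post, clear
* Check whether phone + session date already identifies each row
duplicates report phone session_date
* Number any repeats within phone + date; the order within a group is
* arbitrary unless another variable (e.g., a timestamp) fixes it
bysort phone session_date: gen seq = _n
tempfile post_seq
save `post_seq'

use pre, clear
duplicates report phone session_date
bysort phone session_date: gen seq = _n

* Rows completed in only one survey are kept and flagged in _merge.
* Variables with the same name in both files keep the pre (master)
* values, so survey items would need renaming before this step.
merge 1:1 phone session_date seq using `post_seq'
tabulate _merge
```

If phone number plus session date turns out to be unique in both files, the seq counter is always 1 and the merge reduces to matching on those two variables alone.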
I have tried the approaches described in the two threads linked below, but I am still not sure whether there is an easier way to give each observation a unique value without relying on the row number. Row numbers will differ between the two datasets because participants do not always complete both surveys:
https://www.statalist.org/forums/for...other-variable
https://www.statalist.org/forums/for...plicate-values
Would you be able to suggest a direction for me to resolve the issue?
Thank you so much,
Jessica