Hi everyone,
Thank you in advance for reading this. I searched this forum and elsewhere but could not find anyone with the exact problem I have encountered (please do point me to another thread if there is one I should look at). I am trying to find duplicates across columns and rows at the same time. Let me describe my data and give examples of what I want to find.
I have a variable called mesaid, and each mesaid has three individuals associated with it, identified by the variables f_p_cedula, f_v1_cedula and f_v2_cedula. These individuals should not appear on multiple mesaid's (multiple rows), but I discovered that this does happen a few times. So far I have only been able to catch duplicates that appear across rows but within the same column. For that I can just run the following:
duplicates report f_p_cedula if f_p_cedula!=.
duplicates report f_v1_cedula if f_v1_cedula!=.
duplicates report f_v2_cedula if f_v2_cedula!=.
This has allowed me to find cases where the same individual appears in the same column on more than one mesaid.
What I would still like to check, however, is whether an individual appears on multiple mesaid's but in different variables. For example, imagine the same individual is listed under f_p_cedula on one mesaid and under f_v1_cedula on another.
Here the same individual appears on multiple mesaid's, but in a different variable each time, so I cannot catch him by running the duplicates command on one variable at a time. Is there a straightforward way to do this?
The only thing I can think of is to stack these three variables (f_p_cedula, f_v1_cedula and f_v2_cedula) with an append into one variable (call it f_cedula) and then use the duplicates command on f_cedula. Yet that seems somewhat ad hoc to me. Is there a better way to do this? Or would you recommend I just implement my imagined solution?
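
For concreteness, the stacking idea could be sketched roughly as below, using reshape long instead of append. This is only a sketch under assumptions: that mesaid uniquely identifies rows, and that the three cedula variables are numeric (as the `!=.` conditions above suggest). The names cedula_p, cedula_v1, cedula_v2, cedula_, role and n_appear are made up for illustration.

```stata
* Sketch only: assumes mesaid uniquely identifies rows and the
* cedula variables are numeric. All new names are illustrative.
preserve
keep mesaid f_p_cedula f_v1_cedula f_v2_cedula

* Give the three variables a common stub so reshape can stack them
rename (f_p_cedula f_v1_cedula f_v2_cedula) (cedula_p cedula_v1 cedula_v2)
reshape long cedula_, i(mesaid) j(role) string

* Drop empty slots, then count how often each individual appears
drop if missing(cedula_)
bysort cedula_: gen n_appear = _N

* Show every individual who appears more than once, in any role
list mesaid role cedula_ if n_appear > 1, sepby(cedula_)
restore
```

Note that n_appear counts appearances, not distinct mesaid's, so an individual listed twice on the same mesaid would also be flagged.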
Best,
Raul