I have a dataset with two variables which look something like this.
I want to flag those observations which are not consistent. Eg. If prim_key 101000100040101 has a spouse_id 101000100040102, then prim_key 101000100040102 should have a spouse_id 101000100040101. If the prim_key 101000100040102 has any other spouse_id, then I want it flagged. The problem I have is that duplicates are also possible. So I could have the same spouse_id for two (or more) prim_keys, and similarly I could have two(or more) spouse_id for the same prim_key. This prevents me from making two separate files and using merge on them.
Even then, if a prim_key has a spouse_id, then that prim_key should show up as a spouse_id when there is a prim_key with the value of the original spouse_id. Or to put it more clearly, If prim_key 101000100040101 has a spouse_id 101000100040102, then prim_key 101000100040102 should have a spouse_id 101000100040101 if they occur only once. But if prim_key 101000100040101 occurs twice, with different spouse_ids (say A and B), and both A and B are also some values of the prim_key, then both A and B should have a spouse_id equal to 101000100040101
Is there any way to do what I want?
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str15(prim_key spouse_id) "101000100040101" "101000100040102" "101000100040102" "101000100040101" "101000100130101" "101000100130102" "101000100130102" "101000100130101" "101000100130109" "" "101000100250106" "" "101000100320101" "101000100320102" "101000100320102" "101000100320101" "101000100320109" "" "101000100370102" "101000100370101" "101000100370111" "" "101000100590101" "101000100590102" "101000100590102" "101000100590101" "101000100760101" "101000100760102" "101000100760102" "101000100760101" "101000101040101" "101000101040102" "101000101040102" "101000101040101" "101000101330101" "" "101000101720101" "" "101000101890107" "" "101000102490101" "101000102490102" "101000102490102" "101000102490101" "101000103350101" "101000103350102" "101000103350102" "101000103350101" "101000103550101" "" "101000103550102" "" "101000103690101" "101000103690102" "101000103690102" "101000103690101" "101000104030101" "101000104030102" "101000104030102" "101000104030101" "101000104240101" "101000104240102" "101000104240102" "101000104240101" "101000200050101" "101000200050102" "101000200050102" "101000200050101" "101000200160109" "" "101000200210101" "101000200210102" "101000200210102" "101000200210101" "101000200210108" "" "101000200270101" "" "101000200290108" "" "101000200290109" "" "101000200340101" "101000200340102" "101000200340102" "101000200340101" "101000200470101" "101000200470102" "101000200470102" "101000200470101" "101000200740101" "" "101000200860102" "101000200860101" "101000201190106" "" "101000201360101" "101000201360102" "101000201360102" "101000201360101" "101000201360103" "" "101000201560108" "" "101000201670110" "" "101000202130102" "101000202130101" "101000202440101" "101000202440102" "101000202440102" "101000202440101" "101000203010101" "101000203010102" "101000203010102" "101000203010101" "101000203460101" "101000203460102" "101000203460102" "101000203460101" "101000300030201" "101000300030202" "101000300030202" "101000300030201" "101000300070105" "" "101000300140106" "" "101000300220101" "" "101000300260101" "101000300260102" "101000300260102" "101000300260101" "101000300380205" "" "101000300550105" "" "101000300920101" "101000300920102" "101000300920102" "101000300920101" "101000301080101" "101000301080102" "101000301080102" "101000301080101" "101000301450201" "101000301450202" "101000301450202" "101000301450201" "101000301720106" "" "101000301840101" "101000301840102" "101000301840102" "101000301840101" "101000302140101" "101000302140102" "101000302140102" "101000302140101" "101000302250106" "" "101000302720104" "" "101000303240201" "" "101000400040101" "101000400040102" "101000400040102" "101000400040101" "101000400070107" "101000400070108" "101000400070108" "101000400070107" "101000400120101" "101000400120102" "101000400120102" "101000400120101" "101000400120109" "" "101000400140101" "101000400140102" "101000400140102" "101000400140101" "101000400230102" "101000400230101" "101000400230111" "" "101000400430201" "101000400430202" "101000400430202" "101000400430201" "101000400610101" "101000400610102" "101000400610102" "101000400610101" "101000400930101" "101000400930102" "101000400930102" "101000400930101" end
Even then, if a prim_key has a spouse_id, then that prim_key should show up as a spouse_id when there is a prim_key with the value of the original spouse_id. Or to put it more clearly, If prim_key 101000100040101 has a spouse_id 101000100040102, then prim_key 101000100040102 should have a spouse_id 101000100040101 if they occur only once. But if prim_key 101000100040101 occurs twice, with different spouse_ids (say A and B), and both A and B are also some values of the prim_key, then both A and B should have a spouse_id equal to 101000100040101
Is there any way to do what I want?
Comment