Apologies for the confusion. It seems my explanation in #13 of the "unattractiveness of the random solution" was not clear. What I mean is that such consecutive newid values depend on the order of the IDs in the original list and are thus not "stable".
Below is an illustration of a simple case in which dropping a single original ID changes the newid (under the random method) but does not affect hashed_id. That is why a hash() method (which could also create uniqueness) would have an "attractive" advantage over the random method.
Is there such a hash solution within Stata?
Code:
clear
input str4 ID
"7950"
"3226"
"6448"
"8660"
"9455"
"2096"
"2184"
"2442"
"3174"
"5045"
"1708"
"7167"
"8333"
"7696"
"5878"
end
mata:
mata set matastrict on
// Add a double variable hashed_<varname> holding hash1() of each string value
void function cvt(string scalar varname) {
    real scalar index
    index = st_addvar("double", "hashed_" + varname)
    st_varformat(index, "%10.0f")
    // st_sview() creates a view onto a string variable, so Input is declared string
    string matrix Input
    pragma unset Input
    st_sview(Input, ., varname)
    real scalar row
    for (row=1; row<=rows(Input); row++) {
        st_store(row, index, hash1(Input[row, 1]))
    }
}
end
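* First pass: hashed ID, random number, and order-based ID on the full list
mata: cvt("ID")
set seed 517794135
generate double randu = runiform()
generate str nid = string(_n, "%07.0f")
* Drop a single original ID (the 14th observation, "7696")
drop if _n == 14
* Keep the first-pass results under a _1 suffix
ren (hashed_ID randu nid) =_1
* Second pass: recompute everything on the reduced list
mata: cvt("ID")
set seed 517794135
generate double randu = runiform()
generate str nid = string(_n, "%07.0f")
* The hashed ID is unchanged by the drop ...
assert hashed_ID_1 == hashed_ID
* ... but the order-based ID is not
assert nid_1 == nid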
assertion is false
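A hedged follow-up sketch, not part of the run above: hash1() does not by itself guarantee distinct values for distinct IDs. Assuming hashed_ID from the code above is still in memory, a collision check could look like this. isid exits with an error if the variable does not uniquely identify the observations, and duplicates report summarizes any repeated values.

```stata
* sketch: verify that the hashed values uniquely identify observations
isid hashed_ID
* optional: summarize any duplicated hash values
duplicates report hashed_ID
```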