Hello,
I am trying to mark duplicate values of a variable, var1, using the following code:
by var1: gen dup = cond(_N==1,0,_n)
However, I do not want the assignment of dup=1, dup=2, ... within each group of var1 to be arbitrary.
So I thought of first sorting the data on another variable, var2 (in descending order), so that the numbering is deterministic. The complete code would look like:
gsort var1 -var2
by var1: gen dup = cond(_N==1,0,_n)
But when I do this, it appears that the second command reshuffles the observations in an arbitrary manner.
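One variant I could try is to put the within-group ordering into the by prefix itself, so that the sort and the numbering happen in a single step. This is only a sketch, not necessarily the recommended way: it assumes var2 is numeric, and negvar2 is a hypothetical helper variable that I create just to get descending order.

```stata
* negvar2 is a hypothetical helper variable; this assumes var2 is numeric
generate negvar2 = -var2

* Sort by var1 and, within var1, by ascending negvar2 (i.e. descending var2);
* variables in parentheses affect the sort order but do not define the by-groups
bysort var1 (negvar2): generate dup = cond(_N == 1, 0, _n)

* The helper is no longer needed once dup exists
drop negvar2
```

Even with this, observations that tie on both var1 and var2 would still be numbered in an arbitrary order among themselves, so var2 (or a further tie-breaking variable in the parentheses) would need to be unique within var1.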
How can I get around this?
Thank you.