Hello,
I need to create a data file that resembles the structure of my original data file but does not reveal any true information about the original data.
I can shuffle and mask variables with numerical values (e.g. using the runiform() function), but I have problems with string variables.
For example, there is one variable holding the country code, so observations have string values such as "DE", "AT", "US" and so on.
Now I want these values to be randomly reassigned to other observations in the dataset.
The ralpha() function from SSC does not help much in this case because it creates completely random values that are not real country codes. So I want the real country codes to be randomly assigned across business IDs, for example.
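To make the goal concrete, here is a minimal sketch of the kind of reassignment I mean. It assumes that explicit subscripting (country[perm]) is acceptable for string variables; the toy data and the names orig, u, perm and country_shuffled are invented for illustration and are not from my real file.

```stata
* Toy data, invented for illustration only
clear
input long id str2 country
1 "DE"
2 "AT"
3 "US"
4 "DE"
5 "FR"
end

set seed 12345                  // arbitrary seed, only for reproducibility
gen long orig = _n              // remember the current row order
gen double u = runiform()       // random sort key
sort u
gen long perm = _n              // random rank of each observation
sort orig                       // restore the original row order

* Each row takes the country code of the row whose number is in perm
gen country_shuffled = country[perm]
drop u orig perm
```

Because perm is a permutation of 1 to _N, every real country code is kept exactly as often as before; only its assignment to a business ID changes. The same subscripting idea might also cover the fixed shift, since something like country[_n-75] is valid for strings, whereas the L. operator is not.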
There are also other string variables that contain a mixture of digits and symbols, like *21351 or *D009128571 ...
Another possibility would be to shift the observations down by 20, 30 or 80 lines, which I did for numerical variables with the help of lag operators as follows:
Code:
* Randomly shuffle observations (one random number per ID; data must be sorted by ID)
gen helpvar = runiform() if ID != ID[_n-1]
bysort ID: egen helpvar2 = max(helpvar)
sort helpvar2

* Duplicate the first 75 observations to the end of the dataset
gen count = _n
expand 2 if count <= 75, gen(exptag)

* tsset the dataset
gen nrr = _n
tsset nrr

* Use the lag operator to shift the observations of a given varlist down by 75 lines (in this case)
foreach x of varlist var1 var2 var3 {
    gen hh_`x' = L75.`x'
    replace `x' = hh_`x'
}

* Drop the first 75 observations, which now contain no information in var1, var2 and var3
drop if exptag == 0 & count <= 75
drop exptag count nrr
Help would be much appreciated!