Dear all,
I have a large unbalanced panel dataset (about 3 million observations per month for about 6 years) and would like to anonymise two variables: a firm-specific id and a worker-specific id. To anonymise the current data, I could do something like the code below, which I found in some threads:
keep worker_id
sort worker_id
by worker_id: keep if _n==1
set seed 123456
gen double random = runiform()
sort random
bysort random : assert _N == 1
gen long new_worker_id=_n
sort worker_id
save mapping_pid, replace
I would then merge this mapping back onto the panel and keep only the new worker id, and do the same for the firm ids.
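For concreteness, the merge step might look like the sketch below. It assumes the full panel is saved as panel.dta and the result goes to panel_anon.dta; both file names are made up for illustration.

```stata
* Hypothetical file names: panel.dta (full panel), panel_anon.dta (output)
use panel, clear

* Attach the anonymised id; every worker must be found in the mapping,
* otherwise assert(match) stops with an error
merge m:1 worker_id using mapping_pid, keepusing(new_worker_id) assert(match) nogenerate

* Drop the original identifier so that only the anonymised one remains
drop worker_id
save panel_anon, replace
```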
However, the dataset is dynamic. New data will arrive monthly, and firms and workers may enter or exit over time. Preferably, for each new month of data, I would like to anonymise only the incoming data (the full panel will become too large and cumbersome to handle over time) while ensuring that the new anonymised ids are consistent with the old ones. Would something like the above code work?
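To make the question concrete, here is a minimal sketch of the monthly update I have in mind, not a tested solution. It assumes the incoming month is saved as month_new.dta (a hypothetical file name), that mapping_pid.dta already exists and holds one row per worker_id, and that new workers simply get ids above the current maximum.

```stata
* Hypothetical file name: month_new.dta (incoming month of data)
use month_new, clear
keep worker_id
duplicates drop

* Existing workers pick up their old id; new workers come out with _merge == 1
merge 1:1 worker_id using mapping_pid, keepusing(new_worker_id)
gen byte is_new = _merge == 1
drop _merge

* Highest id handed out so far (assumes the mapping is not empty)
summarize new_worker_id, meanonly
local maxid = r(max)

* Put the new workers in random order and number them after the old ones
set seed 654321
gen double random = runiform()
sort is_new random
by is_new: replace new_worker_id = `maxid' + _n if is_new

* The mapping must still be one-to-one before it is saved
isid new_worker_id
keep worker_id new_worker_id
sort worker_id
save mapping_pid, replace
```

The incoming month could then be merged with the updated mapping in the same way as the original panel. One consequence of this design is that a higher id indicates later entry into the data.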
Many thanks,
Pavlos