Dear Stata users
I am working with a dataset like below:
referal_CWid is person id; referal_officeid is office id; referal_month is a time (monthly) variable; nos_off denotes the number of branches of an office a person worked in a specific month and finally nos_js is the number of client of the person at a specific office in a specific month. As you can probably detect, in some months the person works at multiple office branches. I want to remove duplicates month to have unique id time panel. In order to remove duplicate months, (i) I want to keep the office with highest number of clients. (ii) But there are cases where number of clients (nos_js) is a tie. In this particular case, I want to keep the office of the previous month when nos_off==1.
I tried the following:
bys referal_CWid referal_month (nos_js): egen major_off=max(nos_js)
keep if nos_js==major_off
It only works if I do not have case (ii). But it has flaws in the presence of case(ii).
Any suggestions on this would be highly appreciated.
PS: The above dataset is at the Client level and hence there are some duplicate rows.
Zariab Hossain
Uppsala University
I am working with a dataset like below:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input int(referal_CWid referal_officeid) float referal_month double(nos_off nos_js) 4595 1209 569 1 1 4595 1209 570 1 2 4595 1209 570 1 2 4595 1209 576 1 1 4595 1209 577 1 1 4595 1209 578 1 1 4595 1202 590 1 1 4595 1202 594 1 2 4595 1202 594 1 2 4595 1227 598 1 1 4595 1227 599 1 1 4595 1227 604 1 1 4595 1227 605 1 1 4595 1227 606 1 1 4595 1227 611 1 1 4595 1227 613 1 1 4595 1227 618 2 1 4595 1229 618 2 1 4595 1228 619 3 1 4595 1229 619 3 1 4595 1227 619 3 1 4595 1227 621 1 1 4595 1227 623 1 1 4595 1227 624 1 1 4595 1227 625 1 1 4595 1227 626 1 1 4595 1227 642 1 1 4595 1227 650 1 1 end format %tm referal_month
I tried the following:
bys referal_CWid referal_month (nos_js): egen major_off=max(nos_js)
keep if nos_js==major_off
It only works if I do not have case (ii). But it has flaws in the presence of case(ii).
Any suggestions on this would be highly appreciated.
PS: The above dataset is at the Client level and hence there are some duplicate rows.
Zariab Hossain
Uppsala University
Comment