Dear Stata users
I have a dataset like the example shown below.
Here, referal_CWid is the person ID, referal_officeid is the office ID, and referal_year is the year. As the example shows, some people work at more than one office in a given year. I would like to create a person-year panel, and for that I need to keep only one observation per person per year.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int(referal_CWid referal_year referal_officeid) double(nos_off nos_ref) float(tenure mult_off_dummy multi_off_dummy)
 22 2013  118 1 46 3 0 0
 22 2014  118 1 19 3 0 0
 22 2015 1830 2  1 1 1 0
 22 2015  118 2  1 3 1 0
366 2006 1715 1  4 9 0 0
366 2007 1715 1 16 9 0 0
366 2008 1715 1 40 9 0 0
366 2009 1730 3 29 3 0 1
366 2009 1737 3  1 1 0 1
366 2009 1715 3 24 9 0 1
366 2010 1730 1 38 3 0 0
366 2011 1730 2  9 3 1 0
366 2011 1715 2  8 9 1 0
366 2012 1715 1  6 9 0 0
366 2013 1715 1  8 9 0 0
366 2014 1715 1  5 9 0 0
366 2015 1715 1  4 9 0 0
end
My rule for removal is: (i) I would like to create a variable called major_office, which is the office with the maximum nos_ref (number of referrals) for that person in a given year. (ii) In case of a tie on nos_ref, I would like to keep the office for which tenure is maximum.
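To make the rule concrete, here is a rough sketch of what I have in mind. I am not sure it is correct or the best approach. It assumes the example data above are in memory and that nos_ref and tenure have no missing values (Stata sorts missing values last, so a missing value would be treated as the maximum). If two offices tie on both nos_ref and tenure, this sketch picks one of them arbitrarily.

```stata
* Within each person-year, sort ascending by nos_ref and then tenure,
* so the preferred office (highest nos_ref, ties broken by highest
* tenure) ends up as the last observation of the group.
bysort referal_CWid referal_year (nos_ref tenure): ///
    gen long major_office = referal_officeid[_N]

* The data are still sorted from the previous command, so keep only
* the last observation of each person-year, i.e. the major office.
by referal_CWid referal_year: keep if _n == _N

* Confirm there is now exactly one observation per person-year,
* then declare the panel structure.
isid referal_CWid referal_year
xtset referal_CWid referal_year
```

With the example data this should leave one row per person-year, for instance office 118 for person 22 in 2015, where the two offices tie on nos_ref and 118 has the longer tenure.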
I would highly appreciate it if anyone could suggest how I can do this cleaning.
Regards,
Zariab Hossain
Uppsala University