Breaking tie

Zariab Hossain

Join Date: Oct 2020

Posts: 48
#1

Breaking tie

27 Mar 2024, 06:38

Dear Stata users

I have the following dataset example:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input int(referal_CWid referal_year referal_officeid) double(nos_off nos_ref) float(tenure mult_off_dummy multi_off_dummy) 22 2013 118 1 46 3 0 0 22 2014 118 1 19 3 0 0 22 2015 1830 2 1 1 1 0 22 2015 118 2 1 3 1 0 366 2006 1715 1 4 9 0 0 366 2007 1715 1 16 9 0 0 366 2008 1715 1 40 9 0 0 366 2009 1730 3 29 3 0 1 366 2009 1737 3 1 1 0 1 366 2009 1715 3 24 9 0 1 366 2010 1730 1 38 3 0 0 366 2011 1730 2 9 3 1 0 366 2011 1715 2 8 9 1 0 366 2012 1715 1 6 9 0 0 366 2013 1715 1 8 9 0 0 366 2014 1715 1 5 9 0 0 366 2015 1715 1 4 9 0 0 end

Here, referal_CWid is the person id, referal_officeid is office id and referal_year is year. It is also visible that few person works at different offices in a given year. Now, I would like to create a person year panel and for that I need to remove duplicates for a given year.
My rule of removal is: (i) I would like to create a variable called major_office which will be the one in a given year that has the maximum nos_ref (number of referral). (ii) In case of a tie, I would like to keep the office for which tenure is maximum.

Would highly appreciate if anyone can suggest how can I do this cleaning.

Regards,
Zariab Hossain
Uppsala University
Tags: None
Zariab Hossain

Join Date: Oct 2020

Posts: 48
#2

27 Mar 2024, 07:10

I created the following code which serves the purpose. Would be nice, if anyone can provide a suggestion

bys referal_CWid referal_year (nos_ref ): egen major_off=max(nos_ref)
replace major_off=. if nos_ref !=major_off
replace major_off=referal_officeid if !missing(major_off)

keep if !missing(major_off)
drop major_off

bys referal_CWid referal_year (tenure ): egen major_off=max(tenure)
replace major_off=. if tenure !=major_off
replace major_off=referal_officeid if !missing(major_off)

keep if !missing(major_off)
drop major_off

In this way, I can create office by year panel.

Thanks for any future suggestions.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29795
#3

27 Mar 2024, 10:13

I think this can be simplified to a one-liner:

Code:

by referral_CWid referral_year (nos_ref tenure), sort: keep if _n == _N

This will work correctly if, as is the case in the example data, referal_CWid, referal_year, and referal_officeid uniquely identify observations. But if the same referal_CWID and referal_officeid can have multiple occurrences in the data in the same year, then it would be slightly different:

Code:

by referal_CWid referal_year (nos_ref tenure), sort: /// keep if referal_officeid == referal_officeid[_N]

This will retain the same referal_officeid as the preceding code, but it will retain all of the observations of that officeid for the given referal_CWid and year.

Note: You may not have fully resolved the problem of tie breaking in any case. Is it possible that there will be a remaining tie even after taking number of referrals and tenure into account? If so, this code generalizes easily by just adding more variables after tenure inside the parentheses of the -by- prefix.
Comment

Announcement

Comment

Comment