Keeping and dropping certain duplicates based on election margins

Brooke Jones

Join Date: Oct 2021
Posts: 8

Keeping and dropping certain duplicates based on election margins

28 Mar 2022, 10:10

Hello Statalist,

I am working with election margins that look like:

Code:

state_fips    county_fips    pass    margin    Results    county
48    25    1    56.30397    Carried    Bee
48    27    1    56.87769    Carried    Bell
48    27    0    43.48619    Defeated    Bell
48    29    1    69.11197    Carried    Bexar
48    29    1    64.70588    Carried    Bexar
48    29    1    56.43153    Carried    Bexar
48    29    1    70.88123    Carried    Bexar
48    29    0    49.38272    Defeated    Bexar
48    29    1    62.37555    Carried    Bexar
48    29    1    64.13662    Carried    Bexar
48    29    0    44.25287    Defeated    Bexar
48    39    0    41.73387    Defeated    Brazoria
48    39    1    67.65539    Carried    Brazoria
48    61    1    56.24672    Carried    Cameron
48    61    0    24.82759    Defeated    Cameron
48    61    0    48.81772    Defeated    Cameron
48    61    0    34.93976    Defeated    Cameron
48    61    1    66.66666    Carried    Cameron
48    85    1    56.01775    Carried    Collin
48    85    1    74.06423    Carried    Collin
48    85    1    89.80264    Carried    Collin
48    85    1    77.875    Carried    Collin
48    89    1    56.01685    Carried    Colorado

I am looking to do a few things:

If a county has duplicate elections and at least one of those elections are passing (pass==1), I want to drop all duplicates that have failed (pass==0).
If a county has duplicate elections that all passed, I want to keep only the election with the closest margin (i.e. lowest margin, or that which is closest or equal to 50%).
If a county has duplicate elections that all failed, I want to keep only the election with the closest margin (i.e. highest margin, or that which is closest to 50%).

The end goal is to have only one observation for each county (i.e. no duplicates).

Any advice is much appreciated, thank you!

Tags: None

Ken Chui

Join Date: Aug 2014
Posts: 1058

28 Mar 2022, 13:02

There are probably more elegant way, but this should do the job:

Code:

egen success_rate = mean(pass), by(county_fips)
drop if pass == 0 & success_rate > 0 & success_rate < 1

gen margin_d       =  abs(margin - 50)
egen margin_min    =  min(margin_d), by(county_fips)
keep if margin_min == margin_d

drop success_rate margin_d margin_min

Comment

Brooke Jones

Join Date: Oct 2021

Posts: 8
#3

28 Mar 2022, 13:32

Thank you so much Ken!! That worked, except that the counties where the margin was 100 or 0 (because only one person voted) remained in the data, including the duplicates. I just removed those with --

Code:

drop if total_votes==1

Thanks again
Comment

Announcement

Keeping and dropping certain duplicates based on election margins

Comment

Comment