Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Keeping and dropping certain duplicates based on election margins

    Hello Statalist,

    I am working with election margins that look like:
    Code:
    state_fips    county_fips    pass    margin    Results    county
    48    25    1    56.30397    Carried    Bee
    48    27    1    56.87769    Carried    Bell
    48    27    0    43.48619    Defeated    Bell
    48    29    1    69.11197    Carried    Bexar
    48    29    1    64.70588    Carried    Bexar
    48    29    1    56.43153    Carried    Bexar
    48    29    1    70.88123    Carried    Bexar
    48    29    0    49.38272    Defeated    Bexar
    48    29    1    62.37555    Carried    Bexar
    48    29    1    64.13662    Carried    Bexar
    48    29    0    44.25287    Defeated    Bexar
    48    39    0    41.73387    Defeated    Brazoria
    48    39    1    67.65539    Carried    Brazoria
    48    61    1    56.24672    Carried    Cameron
    48    61    0    24.82759    Defeated    Cameron
    48    61    0    48.81772    Defeated    Cameron
    48    61    0    34.93976    Defeated    Cameron
    48    61    1    66.66666    Carried    Cameron
    48    85    1    56.01775    Carried    Collin
    48    85    1    74.06423    Carried    Collin
    48    85    1    89.80264    Carried    Collin
    48    85    1    77.875    Carried    Collin
    48    89    1    56.01685    Carried    Colorado
    I am looking to do a few things:
    1. If a county has duplicate elections and at least one of those elections are passing (pass==1), I want to drop all duplicates that have failed (pass==0).
    2. If a county has duplicate elections that all passed, I want to keep only the election with the closest margin (i.e. lowest margin, or that which is closest or equal to 50%).
    3. If a county has duplicate elections that all failed, I want to keep only the election with the closest margin (i.e. highest margin, or that which is closest to 50%).
    The end goal is to have only one observation for each county (i.e. no duplicates).

    Any advice is much appreciated, thank you!

  • #2
    There are probably more elegant way, but this should do the job:

    Code:
    egen success_rate = mean(pass), by(county_fips)
    drop if pass == 0 & success_rate > 0 & success_rate < 1
    
    gen margin_d       =  abs(margin - 50)
    egen margin_min    =  min(margin_d), by(county_fips)
    keep if margin_min == margin_d
    
    drop success_rate margin_d margin_min

    Comment


    • #3
      Thank you so much Ken!! That worked, except that the counties where the margin was 100 or 0 (because only one person voted) remained in the data, including the duplicates. I just removed those with --
      Code:
      drop if total_votes==1
      Thanks again

      Comment

      Working...
      X