
  • removing duplicates

    I have a dataset at the contract level, and a person can have multiple contracts per year. I want to keep the contract identified as the main job and remove all other contracts.
    I used the following criteria to identify the main job: proportion of employment (propcc_main), salary, and terms of employment (f_terms). The main job is the one with the highest proportion of employment; if that is equal, the one with the highest salary; and finally, if both are equal, the one with the better terms of employment (1 = full time, 2 = part time).

    I used the code below. I first sorted the data by these criteria so that the main contract would be the first observation for each individual in each year.
    Then I generated a duplicate counter and dropped every contract after the first.


    gsort staff_id year -propcc_main -salary f_terms
    quietly by staff_id year: gen dup=cond(_N==1,0,_n)

    count if dup>1 // 177,633 contracts beyond the first in their person-year, i.e. the extra contracts of individuals with multiple contracts

    drop if dup>1

    THE PROBLEM: When I re-run this, it gives me different samples. By different samples, I mean that it selects different contracts as the main job; the overall sample size stays the same. It would be great if someone could help me with this. I have been going over it again and again with no luck.

    Thanks a lot.
    Danula
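A likely cause, assuming some person-years contain contracts that are equal on all three criteria: Stata orders tied observations randomly when sorting, so when several contracts tie on propcc_main, salary and f_terms, which one lands first (and is kept) can change from run to run. Below is a minimal sketch of a reproducible version. It assumes a hypothetical variable contract_id that uniquely identifies each contract within a person-year (no such variable is named in the post), and the data are toy values for illustration only. It is meant to be run as a do-file.

```stata
* Toy data for illustration only; contract_id is a HYPOTHETICAL
* identifier assumed to be unique within each staff_id-year.
clear
input long staff_id int year byte contract_id float propcc_main long salary byte f_terms
1 2015 1 0.5 30000 2
1 2015 2 0.5 30000 1
1 2015 3 0.4 50000 1
2 2015 1 1.0 40000 1
3 2015 1 0.6 25000 1
3 2015 2 0.6 25000 1
end

* Check that the assumed identifier really is unique, so the sort
* below leaves no ties to be broken at random.
isid staff_id year contract_id

* Same ranking as the original code (highest proportion, then highest
* salary, then full time first), plus contract_id as a final tie-breaker.
gsort staff_id year -propcc_main -salary f_terms contract_id

* Diagnostic: contracts tied with the first contract on all three criteria.
* These are the cases where the original code could pick differently.
by staff_id year: gen byte tied_with_main = _n > 1 & propcc_main == propcc_main[1] & salary == salary[1] & f_terms == f_terms[1]
count if tied_with_main

* Keep only the first (main) contract in each person-year.
by staff_id year: keep if _n == 1
isid staff_id year
```

In this toy example, person 1 keeps contract 2 (same proportion and salary as contract 1, but full time), and person 3's two identical contracts are separated only by contract_id. If the data have no contract identifier, setting a fixed seed with `set sortseed` before sorting should make the choice repeatable, but the choice among fully tied contracts would still be arbitrary.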

  • #2
    To start, I recommend using the command -duplicates drop- instead. Then you can check whether that solves the problem.
    Best regards,

    Marcos



    • #3
      Thanks Marcos. I am looking at the -duplicates drop- command now. First, I tried typing -duplicates drop-; since the contracts are not exact duplicates of one another, it didn't drop any observations. Second, I tried -duplicates tag- to identify multiple records per individual in a given year; it just tagged all of an individual's contracts in that year with 1.

      I want to use the selection criteria to pick out the most relevant contract ("the main job"). Is there a way to incorporate these selection criteria into the -duplicates- command?

      Best Wishes
      Danula
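A minimal sketch of what -duplicates tag- records when given a varlist, using toy data for illustration only: the new variable counts the other observations with the same staff_id and year, so every contract of someone with two contracts gets 1, which matches what Danula describes. As far as I can tell, the -duplicates- commands identify these groups but have no option for ranking contracts within a group, so the main-job criteria still have to come from how the data are sorted.

```stata
* Toy data for illustration only: person 1 has two contracts in 2015,
* person 2 has one.
clear
input long staff_id int year float propcc_main
1 2015 0.5
1 2015 0.5
2 2015 1.0
end

* Table of how many person-years have 1, 2, ... contracts.
duplicates report staff_id year

* n_extra = number of OTHER contracts in the same person-year
* (0 for someone with a single contract that year).
duplicates tag staff_id year, generate(n_extra)
list staff_id year n_extra
```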



      • #4
        Danula Gamage Yes, you can apply selection criteria, because -duplicates drop- allows an "if" qualifier.
        Best regards,

        Marcos
