  • Dropping Observations for duplicate answers

    [CODE]
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str7 MCSID byte(GPNUM00 GACTRY00 GPNTCO00 GPNTLP00)
    "M10016V" 1 4 9 -1
    "M10068H" 1 4 3 -1
    "M10068H" 2 4 3 -1
    "M10083G" 2 4 16 -1
    "M10146E" 1 4 3 -1
    "M10146E" 2 4 3 -1
    "M10153D" 1 4 2 -1
    "M10153D" 2 4 2 -1
    "M10379V" 1 4 2 -1
    "M10492V" 2 4 -1 2
    "M10537R" 1 4 -1 2
    "M10541M" 2 4 2 -1
    "M10567X" 1 4 -1 2
    "M10603J" 2 4 16 -1
    "M10617Q" 1 4 17 -1
    "M10617Q" 2 4 14 -1
    "M10620K" 1 4 -1 3
    "M10683A" 1 4 16 -1
    "M10688F" 1 4 2 -1
    "M10762Y" 1 4 -1 2
    "M10816V" 1 4 7 -1
    "M10816V" 2 4 7 -1
    end
    [/CODE]

    The dataset above records joint income for two-parent families (GPNTCO00), as reported by different respondents (GPNUM00) within each family (MCSID). I only need one income answer per family. Some families have answered the question twice because different family members were asked; e.g., M10068H has the same answer recorded for GPNUM00 1 and 2, whereas M10016V has the income answer recorded only once, from GPNUM00 1. How do I drop the additional answer from each family, given that the GPNUM00 numbers are randomly assigned and the different family members have sometimes given different income answers? Ideally I would like to keep the higher income value for families that have recorded two responses. Thank you, J Rea

  • #2
    First, my understanding is that GPNTCO00 is the income variable, and MCSID is the family id. If that's right, none of the other variables are relevant to your problem and only serve to obscure what I think you want, namely: "If there are multiple observations for a family, assign as the family income for each member the highest value reported by any member of the family." That value can be found as:

    Code:
    egen HighestIncome = max(GPNTCO00), by(MCSID)
    (When you can't figure out how to calculate some variable, consult -help egen-.)

    I'm presuming here that, contrary to what you say, you don't really want to "drop observations," which in the "spreadsheet sense" would mean dropping rows.
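
    One caveat, offered as a sketch rather than a recommendation: the example data contain -1 in GPNTCO00, which looks like a "not applicable" code rather than an income band. That reading is an assumption on my part and should be checked against the codebook. If it holds, the negative codes could be set aside before taking the maximum, so that a family with no valid answer ends up missing instead of -1. The variable names income_valid and HighestValid are made up for illustration.

    ```stata
    clear
    input str7 MCSID byte(GPNUM00 GPNTCO00)
    "M10016V" 1 9
    "M10617Q" 1 17
    "M10617Q" 2 14
    "M10492V" 2 -1
    end

    * Assumption: negative values are non-response codes, not income bands
    generate income_valid = GPNTCO00 if GPNTCO00 >= 0

    * egen's max() ignores missing values; the result is missing only
    * when every value in the group is missing
    egen HighestValid = max(income_valid), by(MCSID)

    list, sepby(MCSID)
    ```

    Here M10617Q would get 17 on both of its rows, and M10492V would get missing.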



    • #3
      Thank you, I was then able to tag these values and extract 1 value for each family. Much appreciated, J Rea
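
      For anyone finding this later, the tag-and-keep step described above might look roughly like the following. This is a minimal sketch on a few rows of the example data, not necessarily the exact code used; the variable name one_per_family is invented for illustration.

      ```stata
      clear
      input str7 MCSID byte(GPNUM00 GPNTCO00)
      "M10016V" 1 9
      "M10068H" 1 3
      "M10068H" 2 3
      "M10617Q" 1 17
      "M10617Q" 2 14
      end

      * Highest reported value within each family, as suggested in #2
      egen HighestIncome = max(GPNTCO00), by(MCSID)

      * tag() marks exactly one observation per distinct MCSID
      egen byte one_per_family = tag(MCSID)
      keep if one_per_family
      keep MCSID HighestIncome

      * Confirm that MCSID now uniquely identifies the rows
      isid MCSID
      list
      ```

      If none of the other variables are needed, collapse (max) GPNTCO00, by(MCSID) should give the same one-row-per-family result in a single step.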

