Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge by person number and household ID?

    Hello,
    I am trying to merge data sets with many waves. Both have variable of PNID and HHID. But when i try to merge by matching by PNID and HHID I get an error "variables PNID HHID do not uniquely identify observations in the master data.

    The master data has PNID as there is a person number (10) and (21) and (31) all live in HHID (003). SO there are many "10s" in the dataset.

    How do I merge this? Is there a way to create a unique identifier for each person.
    tabulate PNID

    person |
    number |
    (char) | Freq. Percent Cum.
    ------------+-----------------------------------
    10 | 75,228 54.30 54.30
    11 | 3,107 2.24 56.54
    12 | 90 0.06 56.61
    13 | 5 0.00 56.61
    20 | 42,989 31.03 87.64
    21 | 665 0.48 88.12
    22 | 42 0.03 88.15
    30 | 6,590 4.76 92.90
    31 | 260 0.19 93.09
    32 | 11 0.01 93.10
    40 | 9,247 6.67 99.77
    41 | 299 0.22 99.99
    42 | 14 0.01 100.00
    ------------+-----------------------------------
    Total | 138,547 100.00



  • #2
    I suppose that the personal number although common across households should not be common within households if it is a unique household member identifier. To find out what is going on, start with

    Code:
    duplicates tag HHID PNID, gen(dups)
    list if dups>1
    Last edited by Andrew Musau; 20 Nov 2019, 10:00.

    Comment


    • #3
      summarize dups

      Variable | Obs Mean Std. Dev. Min
      > Max
      -------------+----------------------------------------------------
      > -----
      dups | 138,547 7.297913 1.717128 4
      > 9

      When I listed dups I had to stop stata from listing every observation in the dataset

      Comment


      • #4
        Can you copy and paste the output from the following:

        Code:
        dataex in 1/20

        Comment


        • #5
          . dataex in 1/20
          input statement exceeds linesize limit. Try specifying fewer variables
          r(1000);

          . dataex in 1/5
          input statement exceeds linesize limit. Try specifying fewer variables
          r(1000);

          Comment


          • #6
            dataex PNID HHID age in 1/20

            ----------------------- copy starting from the next line -----------------------
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input byte PNID str6 HHID int agey
            10 "000003" 62
            10 "000003" 70
            10 "000003" 74
            10 "000003" 72
            10 "000003" 66
            10 "000003" 68
            10 "000003" 76
            10 "000003" 64
            10 "000003" 60
            10 "010001" 66
            10 "010001" 58
            10 "010001" 74
            10 "010001" 60
            10 "010001" 62
            10 "010001" 64
            10 "010001" 71
            10 "010001" 57
            10 "010001" 72
            10 "010001" 69
            10 "010004" 56
            end
            ------------------ copy up to and including the previous line ------------------

            Listed 20 out of 138547 observations

            Comment


            • #7
              Thanks, I did not pay attention to your initial description.

              I am trying to merge data sets with many waves
              Do you have a variable identifying a particular wave (e.g., year) in both data sets? If so, merge using

              Code:
              PNID HHID year

              Comment


              • #8
                No I can't do year. But I think i can create a wave variable to match the master. The new data is all from one wave. So i can create a simple wave variable

                Comment


                • #9
                  If the second data set has no duplicates in terms of PNID HHID, you can do a one-to-many merge then. You check this using

                  Code:
                  isid PNID HHID
                  If there are duplicates, the command will tell you this otherwise you are fine. There is no need to create an extra wave variable. For example

                  Code:
                  *DATASET 2
                  clear
                  input byte PNID str6 HHID int varx
                  10 "000003" 60
                  10 "010001" 69
                  10 "010004" 56
                  end
                  tempfile two
                  save `two'
                  
                  *DATASET 1
                  clear
                  input byte PNID str6 HHID int agey
                  10 "000003" 62
                  10 "000003" 70
                  10 "000003" 74
                  10 "000003" 72
                  10 "000003" 66
                  10 "000003" 68
                  10 "000003" 76
                  10 "000003" 64
                  10 "000003" 60
                  10 "010001" 66
                  10 "010001" 58
                  10 "010001" 74
                  10 "010001" 60
                  10 "010001" 62
                  10 "010001" 64
                  10 "010001" 71
                  10 "010001" 57
                  10 "010001" 72
                  10 "010001" 69
                  10 "010004" 56
                  end
                  
                  merge m:1 PNID HHID using `two'
                  Res.:

                  Code:
                  . l
                  
                       +-------------------------------------------+
                       | PNID     HHID   agey   varx        _merge |
                       |-------------------------------------------|
                    1. |   10   000003     62     60   matched (3) |
                    2. |   10   000003     70     60   matched (3) |
                    3. |   10   000003     74     60   matched (3) |
                    4. |   10   000003     72     60   matched (3) |
                    5. |   10   000003     66     60   matched (3) |
                       |-------------------------------------------|
                    6. |   10   000003     68     60   matched (3) |
                    7. |   10   000003     76     60   matched (3) |
                    8. |   10   000003     64     60   matched (3) |
                    9. |   10   000003     60     60   matched (3) |
                   10. |   10   010001     66     69   matched (3) |
                       |-------------------------------------------|
                   11. |   10   010001     58     69   matched (3) |
                   12. |   10   010001     74     69   matched (3) |
                   13. |   10   010001     60     69   matched (3) |
                   14. |   10   010001     62     69   matched (3) |
                   15. |   10   010001     64     69   matched (3) |
                       |-------------------------------------------|
                   16. |   10   010001     71     69   matched (3) |
                   17. |   10   010001     57     69   matched (3) |
                   18. |   10   010001     72     69   matched (3) |
                   19. |   10   010001     69     69   matched (3) |
                   20. |   10   010004     56     56   matched (3) |
                       +-------------------------------------------+
                  ADDED IN EDIT:


                  The new data is all from one wave.
                  If the data relates to an extra year, then you can append as opposed to merge, but yes, you need a wave identifier to differentiate the new observations from the rest.

                  Code:
                  *DATASET 2
                  clear
                  input byte PNID str6 HHID int varx
                  10 "000003" 60
                  10 "010001" 69
                  10 "010004" 56
                  end
                  tempfile two
                  save `two'
                  
                  *DATASET 1
                  clear
                  input byte PNID str6 HHID int agey
                  10 "000003" 62
                  10 "000003" 70
                  10 "000003" 74
                  10 "000003" 72
                  10 "000003" 66
                  10 "000003" 68
                  10 "000003" 76
                  10 "000003" 64
                  10 "000003" 60
                  10 "010001" 66
                  10 "010001" 58
                  10 "010001" 74
                  10 "010001" 60
                  10 "010001" 62
                  10 "010001" 64
                  10 "010001" 71
                  10 "010001" 57
                  10 "010001" 72
                  10 "010001" 69
                  10 "010004" 56
                  end
                  
                  append using `two'
                  Res.:

                  Code:
                     +-----------------------------+
                       | PNID     HHID   agey   varx |
                       |-----------------------------|
                    1. |   10   000003     62      . |
                    2. |   10   000003     70      . |
                    3. |   10   000003     74      . |
                    4. |   10   000003     72      . |
                    5. |   10   000003     66      . |
                       |-----------------------------|
                    6. |   10   000003     68      . |
                    7. |   10   000003     76      . |
                    8. |   10   000003     64      . |
                    9. |   10   000003     60      . |
                   10. |   10   010001     66      . |
                       |-----------------------------|
                   11. |   10   010001     58      . |
                   12. |   10   010001     74      . |
                   13. |   10   010001     60      . |
                   14. |   10   010001     62      . |
                   15. |   10   010001     64      . |
                       |-----------------------------|
                   16. |   10   010001     71      . |
                   17. |   10   010001     57      . |
                   18. |   10   010001     72      . |
                   19. |   10   010001     69      . |
                   20. |   10   010004     56      . |
                       |-----------------------------|
                   21. |   10   000003      .     60 |
                   22. |   10   010001      .     69 |
                   23. |   10   010004      .     56 |
                       +-----------------------------+
                  Last edited by Andrew Musau; 20 Nov 2019, 11:05.

                  Comment


                  • #10
                    I was able to merge 1:1 PN HHID wave

                    Comment

                    Working...
                    X