Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging Datasets

    Hello,
    I am trying to merge 2 datasets (see sample of datasets below) . Each time I try to merge by patid, visit, and day, I get an error message saying the variables do not uniquely identify observations. I am unsure of what to do next. Any help will be greatly appreciated.

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str19 patid byte visit int day double Eq5d
    "Z022509-20012-20546"  7   1  .75
    "Z022509-20012-20546" 11  22  .78
    "Z022509-20012-20546" 15  43  .88
    "Z022509-20012-20546" 20  64  .67
    "Z022509-20012-20556"  7   1  .88
    "Z022509-20012-20556" 11  23  .78
    "Z022509-20012-20556" 15  43  .56
    "Z022509-20012-20556" 20  64  .36
    "Z022509-20012-20556" 24  85  .44
    "Z022509-20012-20535"  7   1  .62
    "Z022509-20012-20535" 11  20 .689
    "Z022509-20012-20535" 15  43 .682
    "Z022509-20012-20535" 20  64  .62
    "Z022509-20012-20535" 24  85 .429
    "Z022509-20012-20535" 28 134 .358
    "Z022509-20012-20535" 33 155 .789
    "Z022509-20012-20535" 37 176 .265
    "Z022509-20012-20535" 46 216 .888
    end
    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str19 patid byte visit int(day LBTscore)
    "Z022509-20012-20546"  7   1  94
    "Z022509-20012-20546" 11  22 110
    "Z022509-20012-20546" 15  43  98
    "Z022509-20012-20546" 20  64  96
    "Z022509-20012-20556"  7   1 107
    "Z022509-20012-20556" 11  23 100
    "Z022509-20012-20556" 15  43  96
    "Z022509-20012-20556" 20  64 110
    "Z022509-20012-20556" 24  85 111
    "Z022509-20012-20535"  7   1  92
    "Z022509-20012-20535" 11  20  92
    "Z022509-20012-20535" 15  43 113
    "Z022509-20012-20535" 20  64  79
    "Z022509-20012-20535" 24  85  37
    "Z022509-20012-20535" 28 134  66
    "Z022509-20012-20535" 33 155  72
    "Z022509-20012-20535" 37 176  92
    "Z022509-20012-20535" 46 216 111
    end
    ------------------ copy up to and including the previous line ------------------


  • #2
    You'll have to find out which cases are duplicates. Here is an example data, I changed it up a bit by actually adding a couple duplicates and also a few lines of missing data:

    Code:
    clear
    input str19 patid byte visit int day double Eq5d
    "Z022509-20012-20546"  7   1  .75
    "Z022509-20012-20546" 11  22  .78
    "Z022509-20012-20546" 15  43  .88
    "Z022509-20012-20546" 20  64  .67
    "Z022509-20012-20556"  7   1  .88
    "Z022509-20012-20556" 11  23  .78
    "Z022509-20012-20556" 15  43  .56
    "Z022509-20012-20556" 20  64  .36
    "Z022509-20012-20556" 24  85  .44
    "Z022509-20012-20535"  7   1  .62
    "Z022509-20012-20535"  7   1  999
    "Z022509-20012-20535" 11  20 .689
    "Z022509-20012-20535" 15  43 .682
    "Z022509-20012-20535" 20  64  .62
    "Z022509-20012-20535" 24  85 .429
    "Z022509-20012-20535" 28 134 .358
    "Z022509-20012-20535" 33 155 .789
    "Z022509-20012-20535" 37 176 .265
    "Z022509-20012-20535" 46 216 .888
    "Z022509-20012-20535" 46 216 999
    "" . . .
    "" . . .
    "" . . .
    end
    First, find out how many using duplicates report:

    Code:
    duplicates report patid visit day
    The report indicates that 16 cases are unique; 4 cases have two copies, yield 2 extra identical cases; 3 cases have 3 copies, yield 2 extra identical cases.

    Code:
    --------------------------------------
       Copies | Observations       Surplus
    ----------+---------------------------
            1 |           16             0
            2 |            4             2
            3 |            3             2
    --------------------------------------
    To find out where they are, use duplicates tag to tag and screen them out for examination:

    Code:
    duplicates tag patid visit day, gen(anyNameYouLike)
    list if anyNameYouLike > 0, sepby(patid)
    Here are the 7 observations with issues:

    Code:
         +-----------------------------------------------------+
         |               patid   visit   day   Eq5d   anyNam~e |
         |-----------------------------------------------------|
     10. | Z022509-20012-20535       7     1    .62          1 |
     11. | Z022509-20012-20535       7     1    999          1 |
     19. | Z022509-20012-20535      46   216   .888          1 |
     20. | Z022509-20012-20535      46   216    999          1 |
         |-----------------------------------------------------|
     21. |                           .     .      .          2 |
     22. |                           .     .      .          2 |
     23. |                           .     .      .          2 |
         +-----------------------------------------------------+
    Based on what the causes are, you can decide how to remove the surpluses. There are other commands like duplicates drop that can do it. To learn more, use command help duplicates.

    Comment


    • #3
      Ken Chui Many thanks for helping Ken. I followed the steps above and was able to finally merge my datasets - thank you!

      Comment

      Working...
      X