  • merging data

    Hello, I have two datasets. They share two common variables, but not the case (ID) number. How can I merge the two datasets on these two common variables? (The original dataset was split into several packages, and I would like to merge two of the packages back together.)

  • #2
    If x and y combined uniquely identify each case in both datasets, then

    Code:
    use dataset1
    sort x y
    merge 1:1 x y using dataset2
    But if x and y are, say, sex and race, they obviously won't uniquely identify each case, and the merge will fail. See
    https://www.stata.com/manuals/u23.pdf for the -merge- command.
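
    To make the 1:1 case concrete, here is a minimal self-contained sketch with made-up data entered via -input- (the variables grade and score and all values are hypothetical, and a -tempfile- stands in for dataset2). -isid x y- stops with an error when x and y do not jointly identify the observations, so running it on both files first tells you whether merge 1:1 can work:

    ```stata
    * Hypothetical "dataset2": one row per x y pair
    clear
    input x y grade
    1 1 70
    1 2 85
    2 1 90
    end
    isid x y                  // stops with r(459) if x y are not unique
    tempfile ds2
    save `ds2'

    * Hypothetical "dataset1", the master
    clear
    input x y score
    1 1 10
    1 2 12
    2 1 9
    end
    isid x y
    merge 1:1 x y using `ds2'
    tabulate _merge           // every pair is in both files, so all rows get _merge==3
    ```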



    • #3
      I tried, but got the message "variables x y do not uniquely identify observations in the master data". In this case I used the students' gender and class level. What could be the reason for this? Thanks.



      • #4
        Use

        Code:
        merge m:1
        or

        Code:
        merge 1:m

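        The combination of the two variables must not be duplicated in at least one of the datasets: with m:1 the using dataset must be unique on them, with 1:m the master must be. If both datasets contain duplicates, you need to think more about how you want to combine them.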
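
        A small sketch of the m:1 case with made-up student data (gender, class, student and info are hypothetical names and values): the master has several students per gender/class combination, while the using file has exactly one row per combination. Because the duplicates are only on the master side, merge 1:1 would fail but merge m:1 works:

        ```stata
        * Hypothetical using data: one row per gender/class combination
        clear
        input gender class info
        0 1 5
        0 2 7
        1 1 6
        1 2 8
        end
        isid gender class          // unique, so this file can be the "1" side
        tempfile lookup
        save `lookup'

        * Hypothetical master data: several students share gender/class
        clear
        input student gender class
        1 0 1
        2 0 1
        3 1 2
        4 1 2
        5 0 2
        end
        capture isid gender class
        display _rc                // 459: duplicates in the master, so 1:1 is not possible
        merge m:1 gender class using `lookup'
        tabulate _merge
        ```

        The combination gender==1, class==1 has no students in the master, so it shows up once with _merge==2 (using only).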



        • #5
          OK. Both datasets contain the same two variables; they were originally one dataset that has been split into two. The main dataset is missing variables that I need, and the other dataset has them. The two datasets share two common variables (var_x and var_y). How can I merge them into one, so that I can use the variables I need from the second dataset? Does this make sense? Thank you.



          • #6
            As I said in #4, as long as x and y have no duplicates in the using dataset (dataset2):

            Code:
            use dataset1
            sort x y
            merge m:1 x y using dataset2
            Otherwise, see why you have duplicates:

            Code:
            use dataset2, clear
            duplicates tag x y, gen(dup)
            sort x y
            list if dup, sepby(x y)
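
            A toy version of this diagnosis, with made-up values in which the pair x==1, y==1 occurs twice (z and all values are hypothetical). -duplicates report- first summarizes how many observations share each x y combination, and -isid- confirms that the pair cannot serve as a unique key on the using side:

            ```stata
            * Hypothetical dataset2 in which x==1, y==1 appears twice
            clear
            input x y z
            1 1 3
            1 1 4
            2 2 5
            end
            duplicates report x y           // observations grouped by number of copies
            duplicates tag x y, gen(dup)    // dup = extra copies of each x y pair
            list if dup, sepby(x y)         // lists the two x==1, y==1 rows
            capture isid x y
            display _rc                     // 459: x y is not unique, so merge m:1 would fail
            ```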
