Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Ah! That's it. Limiting the duplicate report to the two identifying variables showed me the problem, a simple data entry error in the using data set. Thanks! You guys are life savers.

    Comment


    • #17
      variables ccode1 year do not uniquely identify observations in the using data
      r(459); [Emphasis added[
      So it is your using dataset, that has multiple observations with the same value(s) of ccode1 and year. So what you want is -merge 1:m-

      Now, it may be that with your original -merge 1:1- you got a message about ccode1 year not identifying unique observations in the master data, prompting you to change it to -merge m:1-. Do NOT, by analogy go to -merge m:m-. What -merge m:m- does, though it is legal syntax, is a very bizarre way of putting the two data sets together that is almost never what is wanted and usually just results in a jumble of unrelated observations being thrown together.

      If ccode1 and year do not, jointly, identify unique observations in either data set, then basically you cannot -merge- them together with ccode1 year as the merge key. There are several possibilities here:

      1. One or both of the data sets really should have its observations uniquely identified by ccode1 and year. This means that your data sets are incorrect and you need to re-trace how they were built and fix the problem. It may be as simple as running -duplicates drop- to get rid of purely identical observations, but often this situation is a symptom of a more profound problem with the data that requires more extensive work to repair.

      2. There is no real reason why ccode1 and year should jointly identify unique observations. Then there are three subscases here.
      2a. There are some additional variables (or perhaps just one variable) that, along with ccode1 and year should jointly identify unique observations. In that case, including those additional variables in the merge key will resolve the problem. (For example, maybe ccode1 year and month might do the trick, or ccode1 year and industry, or something like that.)
      2b. There are no such additional variables in your data set, but in principle they exist and could be retrieved. Expanding your data sets to include such information then leads back to 2a.
      2c. There is no reason at all why ccode1 and year should jointly identify unique observations even with additional variables thrown into the merge key. In that case the very concept of merging the data sets is ill-defined. If I have three observations in the first data set with a given value of ccode1 and year, and four in the second data set with those same values, how are we to know which of the three gets matched with which of the four (and which goes unmatched). It's simply undefined. What is possible under these circumstances is to match each of the three with each of the four, resulting in 12 pairs. This, however, is not done with -merge-; the command to do this is -joinby-.
      Added: Crossed with #16, which makes clear that this fell under case 1 and was readily resolved. Perhaps others reading the thread, however, will derive some benefit from the general discussion laid out here.
      Last edited by Clyde Schechter; 08 May 2017, 16:49.

      Comment


      • #18
        I have a similar problem. I'm writing my Master thesis and got the problem by merging several data sets of panel data. There is no problem by merging the first six, but when it comes to the 7th and last, the message "variables ID_t wave do not uniquely identify observations in the master data" occurs.
        The data contains information about school children and their transition after school. The first six datasets contain the Cohort profile, the asked parents, the children asked in school, the children asked after leaving school, the school dataset, which contains information about the institution (like end of schooling time). The last data set which I like to merge contains information about Vocational training. I need this data set because of one variable: the starting date of vocational training, so I can calculate the time between school and vocational training. My supervisor said it may be due to the fact, that one person is in two institutions in the same wave and stata has problems matching them. If I would use the m:m command some IDs may be matched to the wrong time points which would be a huge problem.
        For a better understanding: I'm examing the transition from school to vocational training in germany for non-A leveler and the impact of social capital on this transition.
        Maybe you can help me in this situation.
        Thanks and best wishes,
        Myriam

        Comment


        • #19
          I don't think anyone will be able to help you unless you post examples of the data you are working with. Read FAQ #12 for instructions about using the -dataex- command to post example data, and follow them. The examples you need to show are, 1) an example of the result of merging the first 6 data sets, and 2) an example of the 7th data set. Be sure that the particular examples you show include some observations that should match up with each other in the merge.

          I'm very glad that you are aware that -merge m:m- is inappropriate.

          Comment


          • #20
            I am trying to merge two data sets. My key variable is the date. The date format in both data sets is dd/mm/yy. The data set in memory contains only the date variable with a big number of sorted observations. The "using" data set contains only a handful of observations so I can see all the dates are unique there. I keep receiving the ...not uniquely identify... message. Does someone has a clue what may be wrong?








            Comment


            • #21
              Without seeing the actual -merge- command you tried and the exact error message Stata gave you it is difficult to be specific.

              As best I can imagine from your brief description, you believe that the date variable uniquely identifies observations with the using data. GIven that the master data in memory has a large number of observations, my guess is that there are multiple observations for some or all dates. So the command to merge them should be:

              Code:
              merge m:1 name_of_date_variable using name_of_using_data_set_here
              If you tried -merge 1:1- then the command will, appropriately, fail if any value of the date variable occurs more than once in the data in memory.

              Another possible source of the problem is that although you think the using data set has no duplicate values of the date variable, you may be wrong. So it is worth running the following before trying to -merge-:

              Code:
              isid using name_of_using_data_set_here
              If the date variable observations are truly unique in the using data set, then Stata will say nothing and move on to your next command. But if you are mistaken about that, Stata will give you an error message telling you how many duplicate observations there are. Then your next task is to go about finding them:
              Code:
              use name_of_using_data_set_here, clear
              duplicates tag name_of_date_variable, gen(flag)
              browse if flag
              and figuring out what to do about them.

              If what you tried is what I showed above, then please post back showing:

              1. Examples of the data from each of the two data sets; use the -dataex- command to do this.

              2. The exact -merge- command you tried.

              3. The exact response you got from Stata, including any error messages.

              For #2 and #3, please be sure to wrap them in code delimiters to enhance their readability here. If you are not familiar with code delimiters or with -dataex-, please read Forum FAQ #12 for detailed instructions.

              Comment


              • #22
                Thank you for your reply Clyde. I found the problem. Though the "using" data set was containing only a handful of visible observations, the number of observations was much larger and Stata consider all those blank date observations as duplicates

                Comment


                • #23
                  I'm glad you found the solution to your problem. I have encountered this situation fairly often myself. My experience is that it typically arises when using data that was initially imported from spreadsheets. For reasons I do not understand, sometimes -import excel- fails to recognize the end of the data in a spreadsheet and brings in a number of blank observations at the end. These blank observations can cause havoc during coding as variables that are supposed to be unique identifiers fail to be so. It has happened to me often enough that I have made it a habit to check for this, and eliminate the blank observations if there are any, after importing data from spreadsheets.

                  Comment


                  • #24

                    Hello, I have a similar Problem.
                    Table_one contains Data on a daily basis and table_two contains monthly averages stated on the 15th of each month. Only one row for one month is existing in table_two. I want the two tables to be merged so that Every Day in table_one is filled with the average Data for each month of table_two.

                    Table_one
                    Datum r_stock1 r_stock2 Monats
                    1/1/1980 0,001 0,122 1980m1
                    1/2/1980 0,002 -0,001 1980m1
                    ...


                    Table_two
                    Datum infl_1 infl_2 Monats
                    1/15/1980 0,0001 0,2 1980m1
                    2/15/1980 0,0002 0,00009 1980m2
                    ...


                    Table_expected
                    Datum r_stock1 r_stock2 infl_1 infl_2 Monats
                    1/1/1980 0,001 0,122 0,0001 0,2 1980m1
                    1/2/1980 0,002 -0,001 0,0001 0,2 1980m1
                    ... ... ... ... ... ...
                    2/1/1980 0,053 -1,000002 0,0002 0,00009 1980m2
                    2/2/1980 -0,0092 -0,0001 0,0002 0,00009 1980m2
                    What i tried so far was:
                    -> merge 1:m Monats using 'file'
                    with the outcome "variable Monats does not uniquely identify observations in the master data"
                    But i think that is not the only problem here...

                    Can someone please help me? Thanks in advance
                    Last edited by Niclas Kammler; 29 May 2019, 11:26.

                    Comment


                    • #25
                      Well, when Stata says that a variable does not uniquely identify observations in a data set, I have never known it to be wrong.

                      So the first step is to identify the observations where the same value of Monats recurs:
                      Code:
                      duplicates tag Monats, gen(flag)
                      browse if flag
                      After that, you will have to figure out why those observations are there and what to do about it. It may be that such observations are completely duplicate observations, i.e. they agree on all of the variables. In that case -duplicates drop- will resolve the problem with no loss of information and you can go on to your -merge- command.

                      If the observations with the same values of Monats disagree on some other variable(s), then you have to figure out which one to keep, or, perhaps, how to combine them all into a single variable, such as by taking means or medians or something like that.

                      The other thing to bear in mind in either of these situations is this. You have the expectation that the second data set should contain only a unique observation for any value of Monats. But the data do not conform to that expectation. If your expectation was wrong, then fine--you now know better and understand why. But if you had good reasons to expect unique identification by Monats, it means that the data set was incorrectly created. When a data set contains one error like this, it may well contain others: the entire data management process leading up to it is suspect. So you should confer with whoever provided you with this data set and discuss what has gone wrong. Then you should ignore this version and work with the corrected one.

                      In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                      Comment


                      • #26
                        Am I understanding correctly that you started with table_one in memory (the master file) and table_two is your using file? If so I think you have mis-specified your merge command. My understanding is that you expect duplicates of the variable Monats in table_one but not table_two. So your command would be

                        Code:
                        merge m:1 Monats using "table_two"
                        This tells Stata that you expect duplicates on Monats in your master file and for it to uniquely identify your using file. Your previous command told Stata you expected Monats to be unique in your master. If you started with table_one you already know that Monats should not be unique.

                        Comment


                        • #27
                          hello everyone, I am faced with a similar problem where I cannot merge

                          code:
                          ***IMPORTING LARGE EXCEL FILE
                          set excelxlsxlargefile on

                          import excel "globalterrorismdb_0919dist.xlsx", sheet("Data") firstrow

                          drop if country!=147
                          keep if iyear>=2009 & iyear<=2013
                          keep if gname=="Boko Haram"
                          drop if provstate=="Unknown"

                          g new_event = 1

                          bysort provstate iyear: egen n_event = total(new_event)

                          rename provstate state

                          ***convert str39 to numeric
                          encode state, g(Sstate)

                          *recoding the original variable values to match master
                          recode sstate (1=140) (2=70) (3=90) (4=180) (5=60) (6=350) (7=80) (8=320) (9=40) (10=110) (11=100) (12=30) (13=190) (14=360) (15=150) (16=130) (17=240) (18=160) (19=10) (20=170) (21=50) (22=20)

                          lab define newlabel 140 "abuja" 70 "adamawa" 90 "bauchi" 180 "benue" 60 "borno" 350 "delta" 80 " gombe" 320 "imo" 40 "jigawa" 110 "kaduna" 100 "kano" 30 "katsina" 190 "kogi" 360 "lagos" 150 "nasarawa" 130 "niger" 240 "ondo" 160 "plateau" 10 "sokoto" 170 "taraba" 50 "yobe" 20 "zamafara" , modify
                          lab val sstate newlabel


                          save "bh1.dta"
                          **clear all and load master NDHS
                          merge m:1 sstate using "bh1.dta" variable sstate does not uniquely identify observations in the using data r(459); At the moment, I am at wits end what the problem could be, any advice given is greatly appreciated

                          Comment


                          • #28
                            Well, the error message you are getting says very clearly that the problem is in the using data, i.e. in bh1.dta. And you have said nothing at all about that data set. So it is not possible to help you based on the information you gave.

                            Please post back, showing, not the code that creates the data sets, but example data from both data sets. Make sure that when you run your -merge- command on those example data sets you get the same error message.

                            When posting back with the example data, be sure to use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                            Comment


                            • #29
                              Originally posted by Clyde Schechter View Post
                              Well, the error message you are getting says very clearly that the problem is in the using data, i.e. in bh1.dta. And you have said nothing at all about that data set. So it is not possible to help you based on the information you gave.

                              Please post back, showing, not the code that creates the data sets, but example data from both data sets. Make sure that when you run your -merge- command on those example data sets you get the same error message.

                              When posting back with the example data, be sure to use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
                              Thanks Clyde, I was able to solve it

                              Comment


                              • #30
                                Hello,

                                I was trying to merge national sample survey, India (NSSO) 55th round data sets using merge 1:1 command. Before merging data, I had to create House hold ID (HHID) and personal ID (PID) for 14 files

                                When I tried to merge (1:1, m:1 or 1:m) on these data sets, I got the error message "variable date does not uniquely identify observations in the master data stata".

                                Can someone help me on this.
                                Thanks

                                . merge 1:1 HHID using "C:\Users\suhail.mir\Desktop\PhD daily work\Data Work PhD 2021\55th Round STATA files\Block4-sch10-and-10d
                                > ot1-Records-combined.dta"
                                variable HHID does not uniquely identify observations in the master data
                                r(459);

                                . merge m:1 HHID using "C:\Users\suhail.mir\Desktop\PhD daily work\Data Work PhD 2021\55th Round STATA files\Block4-sch10-and-10d
                                > ot1-Records-combined.dta"
                                variable HHID does not uniquely identify observations in the using data
                                r(459);

                                . merge 1:m HHID using "C:\Users\suhail.mir\Desktop\PhD daily work\Data Work PhD 2021\55th Round STATA files\Block4-sch10-and-10d
                                > ot1-Records-combined.dta"
                                variable HHID does not uniquely identify observations in the master data
                                r(459);

                                Comment

                                Working...
                                X