Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problems merging data based on a string variable and date variable

    Would like to ask for help as I have been trying to merge two datasets based on a string ( which is a national identification number) and date variable in (int, %td).

    I opened a dataset and put in the command

    merge 1:1 ID StudyDate using "C:\Desktop\research proj\ID2018_280222.dta"

    I get this error message:

    variables ID StudyDate do not uniquely identify observations in the master data
    r(459);

    Appreciate advice!

    Thanks much!
    Last edited by Zijuan Huang; 30 Mar 2022, 10:01.

  • #2
    You apparently have some combinations of ID and StudyDate for which the dataset in memory has more than one observation. Do you expect this?

    If so, you want merge m:1 ... to allow for multiple observations with the same ID and StudyDate to occur in your dataset in memory.

    If not, you have duplicates in your data and you need to understand the reason and deal with it. The duplicates command is your assistant here.
    Code:
    help duplicates
    duplicates report ID StudyDate
    and then onward to other duplicates subcommands.

    You would also do well to review the output of help merge to better understand the command.

    Note: if you have duplicates in both the dataset in memory and the using dataset, do not consider using merge m:m - that is a mistake and will not do what you want.

    Even the Stata documentation asserts that merge m:m is almost never appropriate. If find yourself thinking about using merge m:m, it is a near certainty that at least one of the following is true:

    1. Your merge key actually does uniquely identify the observations in one of your data sets, so you can use merge 1:m or merge m:1 or maybe even merge 1:1.

    2. You are failing to take account of one or more additional variables in your data set that, combined with the variables you are trying to use as the merge key, uniquely identify the observations in one or both of your data sets, so you can use merge 1:m or merge m:1 or merge 1:1 with the expanded merge key.

    3. You are really trying to accomplish what joinby, a different command, does: create in the output dataset every possible combination of an observation from the first dataset and an observation from the second dataset, both having the same key. (SQL users take note! I fell for this myself: to Stata m-to-m does not mean m-by-m.)

    4. You actually need to append your datasets rather than merge them.

    5. The data sets you are trying to merge are incorrectly configured or contain data errors that need to be fixed.

    Comment


    • #3
      Thanks so much! I think that is the problem, that I have duplicates for the same date and same ID some of which are wrong entries which I may need to get rid of. Will just request the whole set of data again instead of trying to merge.

      Comment

      Working...
      X