
  • Merging two discrete datasets

    Hello all,

    My master dataset and the dataset that I want to merge are in discrete time format (multiple records per RandomID [The RandomID is my unique identifier]). The error message that I get is:

    "variable RandomID does not uniquely identify observations in the master data"

    I've Googled a little, and there doesn't seem to be much on merging discrete-time datasets. I thought merge 1:1 would work, but obviously it doesn't. The other merging methods (one-to-many and many-to-one) are not suitable for what I want to do.

  • #2
    Are you sure you want to merge, rather than append?



    • #3
      Well, as I understand it, merge adds new variables to existing observations (which is what I want to do), while append only adds more observations.



      • #4
        It depends on what exactly you want to do and what your data look like, but given the minimal information you have given us, I suspect you want a many-to-one merge.
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------



        • #5
          Right, I figured out I need two unique variables, and luckily I have that! Now an additional question: is there an easy way to get rid of mismatched variables?
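
          For readers following along, a merge on two key variables might look like the sketch below. The names period, master_data and other_data are hypothetical stand-ins (the thread does not name the second variable or the files), and the sketch assumes that RandomID and period together identify one record in each dataset.

          ```stata
          * Hypothetical names: period (time variable), master_data, other_data
          use master_data, clear

          * Stop with an error unless RandomID and period jointly identify each record
          isid RandomID period

          * One-to-one merge on the pair of key variables
          merge 1:1 RandomID period using other_data
          ```

          If isid reports an error, the pair does not uniquely identify records, and merge 1:1 would fail for the same reason.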



          • #6

            Assuming "mismatched variables" is intended to refer to records that occur in one data set but not the other, after -merge- Stata creates a variable called _merge which takes on the values 1 for records that are in the master data only, 2 for records in the using data only, and 3 for records that are found in both data sets (matches). If you want to keep only the matched records you can do that with:

            keep if _merge == 3
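
            Before dropping anything, it may help to see how many records fall into each category. A minimal sketch, assuming a merge has just been run and _merge is still in the data:

            ```stata
            * Count master-only (1), using-only (2) and matched (3) records
            tabulate _merge

            * Keep only the matched records
            keep if _merge == 3

            * Drop the indicator so that a later merge can create it again
            drop _merge
            ```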

            But merging data sets can be tricky and produce surprising results. It would be better practice to use your understanding of the two data sets to determine in advance what kinds of records you expect to find. For example, if the master data set is a subsample you are studying (but multiple records per participant) and the using data set is a compilation of the entire sample which should include everybody in your sample (but only once), then you would expect only to have two types of result: _merge == 2 or _merge == 3. If there really were a record with _merge == 1 it would imply that your sample contains somebody who is not properly registered. You would want to know that this problem arose, and then do something about it before blundering on analyzing an incorrectly merged data set.

            Stata makes this kind of checking easy for you by providing the assert() and keep() options to the merge command. For the example I just gave, your command would be:

            merge m:1 id using registry_data_set, assert(match using) keep(match)

            and you will then have both the merged data that you want and assurance that it really is what you want.
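
            To see the assert() and keep() options in action, here is a small self-contained sketch. The toy data and the variable names age, wave and score are made up for illustration; only the merge line follows the command above, with a temporary file standing in for registry_data_set.

            ```stata
            clear
            * Hypothetical registry: one record per person
            input id age
            1 34
            2 51
            3 27
            end
            tempfile registry
            save `registry'

            clear
            * Hypothetical study sample: several records per person
            input id wave score
            1 1 10
            1 2 12
            3 1 8
            end

            * Errors out if any record is master-only; using-only records (id 2) are dropped
            merge m:1 id using `registry', assert(match using) keep(match)
            ```

            Adding a sample record whose id is missing from the registry should make the merge stop with an error instead of silently producing a bad data set.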

            I strongly suggest you read the full -merge- documentation. It's a very powerful and important command, but also very easy to misuse.

