  • #16
    Your example datasets fall into the last category.
    Maybe, maybe not. I think it is more likely that the example datasets fall into none of these categories because they are just wrong.

    If there are, indeed, supposed to be two different observations with ID = "111111110" in each data set, then, yes, the data are correct and -joinby- is appropriate. In that case, the results will have both observations from each data set paired with both of those observations in the other. In other words, there will be 2x2 = 4 observations with ID = "111111110" in the resulting dataset. If that is what is intended, go ahead and use -joinby-. It's legal, but it's uncommon. And given that all the other values of ID occur only once, I think it is more likely that we are looking at bad data. Only O.P. can tell for sure.
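
    To make the 2x2 pairing concrete, here is a minimal sketch to run as a do-file. The toy data are made up for illustration: the variables x and y and the second ID "222222220" are assumptions, not values from the original datasets.

    ```stata
    * Toy "dataB": ID 111111110 appears twice, the hypothetical ID 222222220 once
    clear
    input str9 ID x
    "111111110" 1
    "111111110" 2
    "222222220" 3
    end
    tempfile dataB
    save `dataB'

    * Toy "dataA" with the same duplication pattern
    clear
    input str9 ID y
    "111111110" 10
    "111111110" 20
    "222222220" 30
    end

    * joinby forms every pairwise combination within each value of ID
    joinby ID using `dataB'

    * 2 observations x 2 observations = 4 rows for the duplicated ID
    count if ID == "111111110"
    assert r(N) == 4
    list, sepby(ID)
    ```

    If the four rows listed for that ID are not what you want, that is a sign the duplicates are data errors rather than a reason to use -joinby-.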



    • #17
      Thanks everyone.

      There were a few duplicates in the dataset of 20,000 IDs. I ended up cleaning them out, so there is now only one observation for each ID.

      The identifier variable (ID) is now distinct in both datasets, with no duplicates, so a 1:1 merge works as well.


      I appreciate all of your help. It cleared up many of my doubts about dealing with large datasets and long strings of digits.

      Regards
      Sandeep



      • #18
        1) What does distinct mean? ID var is present in both datasets. Does it mean all ID's are different and no duplicates?
        Distinct means that any pair of observations in the dataset has different value(s) for the identifier variable(s). (If, instead of matching on ID alone as you are, you were matching on ID and year, the IDs could be the same as long as the years were different, and the years could be the same as long as the IDs were different.) As Clyde pointed out, that is not the case in either of your example datasets.
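
        One way to check distinctness directly is sketched below, to be run as a do-file. The toy data, including the variable year, are assumptions made up for illustration.

        ```stata
        * Toy data: ID repeats, but each ID-year pair occurs only once
        clear
        input str9 ID year
        "111111110" 2001
        "111111110" 2002
        "222222220" 2001
        end

        * Passes silently: ID and year together identify the observations
        isid ID year

        * ID alone does not; capture suppresses the error so the do-file continues
        capture isid ID
        display "return code from isid ID: " _rc

        * Tabulate how many observations share each value of ID
        duplicates report ID
        ```

        A nonzero return code from -isid- means the listed variables do not uniquely identify the observations, which is the situation in which a 1:1 merge on those variables will fail.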
        2) How can data can inserted in command?
        Code:
        . use `dataA', clear
        . joinby ID using `dataB', unmatched(both)
        I do not understand what this asks. Perhaps I have confused you by putting your two example datasets into temporary files ("tempfile"s) which are referred to by the local macros dataA and dataB. Reading the output of
        Code:
        help joinby
        should make the syntax of the joinby command clearer.
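
        In case the temporary files are the source of the confusion, here is a minimal sketch of how they work, using the auto dataset that ships with Stata instead of your data. Splitting it into two pieces is purely for illustration.

        ```stata
        * tempfile defines the local macro dataB, holding a temporary filename
        tempfile dataB

        * Save one piece of the data under that temporary name
        sysuse auto, clear
        keep make price
        save `dataB'

        * Load the other piece; it plays the role of dataA
        sysuse auto, clear
        keep make mpg

        * `dataB' expands to the temporary filename; a permanent file would be
        * referred to by its name in exactly the same position
        joinby make using `dataB', unmatched(both)

        * unmatched(both) adds _merge, showing where each observation came from
        tabulate _merge
        ```

        Local macros and temporary files disappear when the do-file finishes, so the whole sequence has to be run together rather than line by line from the Command window.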



        • #19
          Thanks William Lisowski . It totally makes sense to me now.
