  • problem with the variable _merge after joinby

    Hello, I am using Stata 15 and I am having an issue with joinby. After I join the two datasets, I have 14 observations marked "only in using data". Then I type list id if _merge==2 and I get a list of 14 ids. So far so good.

    But then, if I browse one of the ids in that list, its _merge variable says "only in master data".

    How on earth is this possible????

    Thank you for your help,

    Ylenia

  • #2
    Make sure that, when you are browsing, you are finding the case under the variable called "id" and not in the first column of the browser. That column is the row number, which has nothing to do with the id.

    A way to check is, instead of manually browsing, to take a few ids and type something like this (2, 5, 17, 48 are my made-up IDs):

    Code:
    browse id _merge if inlist(id, 2, 5, 17, 48)
    Last edited by Ken Chui; 27 May 2021, 20:33.



    • #3
      Hi Ken,
      Actually, I am not browsing the whole dataset; I wrote br if id=="xxx", so I am sure I am looking at the right observation. Moreover, the id is a name, not a number, so it is pretty much impossible to make the mistake you suggested.
      Any other ideas?



      • #4
        I would update Stata first and see if the issue persists.

        Code:
        update all
        Also, check whether the inconsistency persists when you browse on a specific value of the variable _merge. If your identifiers are very long numbers, it may be a precision issue.

        Code:
        browse if _merge==2
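
        To illustrate the precision point, here is a minimal sketch. It assumes a hypothetical numeric identifier called numid, which is not part of the original question, where the id is a string. A float stores integers exactly only up to 16,777,216, so longer identifiers held as float can be silently rounded and then fail to match.

        Code:
        * Sketch: show a long integer after rounding to float precision
        display %12.0f float(123456789)
        * numid is a hypothetical variable name; check its storage type
        describe numid
        * Flag values that change when forced to float precision
        count if numid != float(numid)

        If the count is not zero, the identifier needs more precision than a float offers; holding such identifiers as long, double, or string avoids the rounding.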



        • #5
          Yes, there was an inconsistency, a bloody extra space after the end of the id! I think the forum really needs a facepalm emoticon.

          Thank you.



          • #6
            Everybody who works with string variables needs to be familiar with the trim() and itrim() functions. When I create a data set one part of my do file is almost always:
            Code:
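            * collect the names of all string variables in r(varlist)
            ds, has(type string)
            foreach v of varlist `r(varlist)' {
                * itrim() collapses repeated internal blanks; trim() strips leading and trailing blanks
                replace `v' = trim(itrim(`v'))
            }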
            precisely to avoid such problems. Of course, you have to be sure that padded blanks are not actually meaningful distinctions in your data--but that is rarely the case, and is also a really bad data practice.
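
            Before overwriting anything, it may be worth looking at what the trimming would change. The following is only a sketch, assuming a string identifier named id as in this thread; id_clean and collide are made-up variable names. It uses strtrim() and stritrim(), the current names for trim() and itrim().

            Code:
            * Sketch: build a trimmed copy instead of replacing in place
            generate id_clean = strtrim(stritrim(id))
            * Observations whose identifier would change
            list id id_clean if id != id_clean
            * Flag groups where different raw values become the same trimmed value
            bysort id_clean (id): generate byte collide = id[1] != id[_N]
            count if collide

            Groups flagged by collide are the ones to inspect by eye: either the padding was noise, which is the usual case, or the blanks carried a distinction that trimming would erase.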

