  • nearmrg duplicating values

    Hi all, I'm using the nearmrg command (Stata 16.1) to merge two forms (form 1 and form 2) on a "uniqueID", which is a concatenation of user ID and visit date. Each user has multiple visit dates. For a given visit, form 1 and form 2 are usually on the same date, but sometimes form 2 occurs a few days later, hence the use of nearmrg. Visits are typically about a month apart, but there are many exceptions.

    nearmrg seems to merge as intended for the majority of my dataset. However, occasionally there are visits where the user has a form 1 but no form 2. When this occurs, nearmrg seems to find the form 2 from the closest visit date and duplicate it, so that a given form 2 is merged with two form 1s (one correctly, the other incorrectly). In the example data below, everything works fine for the first person (form1id=110006); however, the described error occurs in the third row of 110293 and the third row of 110652.

    Is there a way to tell nearmrg to use each form 2 value only once? I have already tried adding the "type(1:1)" option, but that didn't get around the problem. I saw other posts about pre-existing duplicate values in the dataset, but nothing about why nearmrg does this duplication. Ideally, if I knew ahead of time which values this applies to, I could remove them, but I'm not sure how to do that, since it's only after the merge that I can see which visits have a form 1 but are missing their form 2.

    Thanks,
    ~Cristina


    nearmrg lines:
    use form1, clear
    nearmrg using form2, upper nearvar(uniqueid) genmatch(matchedvar) limit(25) type(1:1)

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long form1id int form1visitdate double uniqueid int form2visitdate long form2id
    110006 21630 11000621630 21630 110006
    110006 21676 11000621676 21678 110006
    110006 21707 11000621707 21707 110006
    110293 21740 11029321740 21740 110293
    110293 21782 11029321782 21782 110293
    110293 21838 11029321838 21854 110293
    110293 21854 11029321854 21854 110293
    110652 21843 11065221843 21843 110652
    110652 21873 11065221873 21873 110652
    110652 21896 11065221896 21901 110652
    110652 21901 11065221901 21901 110652
    110652 21956 11065221956 21956 110652
    end
    format %tddd-Mon-YY form1visitdate
    format %tddd-Mon-YY form2visitdate
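
    For illustration, one possible way to flag the reused form 2 records after the merge is sketched below. It assumes the merged data look like the example above and that, when one form 2 is attached to several form 1 visits, the visit closest in date is the correct match. The variables gap and reused are hypothetical helpers introduced here, not nearmrg output.

    ```stata
    * Sketch only; assumes the merged data are shaped like the -dataex- example above.
    * gap and reused are hypothetical helper variables, not created by nearmrg.
    generate int gap = form2visitdate - form1visitdate

    * Within each form 2 record, sort by gap so the closest form 1 comes first,
    * then flag every later form 1 that was given the same form 2.
    bysort form2id form2visitdate (gap): generate byte reused = (_n > 1) & !missing(form2visitdate)

    list form1id form1visitdate form2visitdate gap if reused

    * Assumption: the flagged visits really have no form 2, so blank the match.
    replace form2visitdate = . if reused
    replace form2id = . if reused
    ```

    On the example data this flags the 21838 visit for 110293 and the 21896 visit for 110652, and leaves 110006 untouched.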
    Last edited by Cristina Munk; 20 Oct 2020, 11:58.