Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Look up information from one observation for another (and: Unexpected error in -vlookup-)

    As part of a matching exercise, I am trying to bring in information from one observation into a different observation, within the same dataset. I am using the SSC command -vlookup-, which seems to offer exactly what I need, but encountering an unexpected error. I'm using Stata/SE 14.1 on Windows 10.

    I start with the first three columns of this dataset:

    Code:
    clear
    input id matchid d_date
    102 351 21246
    232 283 .
    280 426 21244
    283 232 21266
    351 102 .
    426 280 .
    end
    And I wish to generate a fourth column d_date_match that "looks up" match_id from the id column, and takes the value of d_date from the corresponding observation. Note that in my full dataset, as in this example, the values of id are unique (that is: <isid id> runs without error). So the desired output is this:

    Code:
    clear
    input id matchid d_date d_date_match
    102 351 21246 .
    232 283 . 21266
    280 426 21244 .
    283 232 21266 .
    351 102 . 21246
    426 280 . 21244
    end
    Searching Statalist for advice on how to do this I found that Nick Cox had previously recommended the user-written command vlookup (https://www.statalist.org/forums/for...mmand-in-stata). So I installed that, and ran:

    Code:
    vlookup matchid, gen(d_date_match) key(id) value(d_date)
    But alas, I got this error:
    Code:
    d_date is unique within id;
    there are multiple observations with different d_date within id.
    Which is very confusing since the two parts of that error message contradict each other. But also, the only field that needs to be unique is the key in the key-value pair: -id- in my example (and it is).

    Note that another confusing error message from vlookup has been reported before with no response: https://www.statalist.org/forums/for...ssc-user-kcrow. Given the lack of response to that post, I fear this command is not well supported. So a workaround would be welcome as well.

    Many thanks!

  • #2
    Let's go for the workaround, using your example data (thanks!).
    Code:
    cls
    clear
    input id matchid d_date
    102 351 21246
    232 283 .
    280 426 21244
    283 232 21266
    351 102 .
    426 280 .
    end
    tempfile master
    save `master'
    
    * confirm that d_date is unique within id
    bysort id (d_date): assert d_date[1]==d_date[_N]
    
    * create lookup table
    bysort id (d_date): keep if _n==1
    drop matchid
    rename (id d_date) (matchid d_date_match)
    tempfile lookup
    save `lookup'
    
    * merge lookup to master
    use `master', clear
    merge m:1 matchid using `lookup'
    sort id
    list, clean noobs abbreviate(12)
    Code:
    . list, clean noobs abbreviate(12)
    
         id   matchid   d_date   d_date_match        _merge  
        102       351    21246              .   matched (3)  
        232       283        .          21266   matched (3)  
        280       426    21244              .   matched (3)  
        283       232    21266              .   matched (3)  
        351       102        .          21246   matched (3)  
        426       280        .          21244   matched (3)

    Comment


    • #3
      Perfect. Thanks!

      Comment


      • #4
        I had the same error message. In my case, the variable I was looking up (matchid, in your example) and the key variable (id) did not share the same type: the first one was numeric, the second one was a string. So at least in some cases, this is what the confusing error message means.

        Comment


        • #5
          After a couple of more changes, the error reappeared. From what I gather, vlookup doesn't handle well missing values in the value(varname) (d_date in the example). I temporarily replaced the missings with a nonsensical value to work around this.

          Comment

          Working...
          X