Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • merging/joining rows from using dataset that fall within a range in the master dataset

    Dear Stata users,

    I want to merge or join two datasets (confused what would be the most convenient one!). First one is the unemployment spell dataset that has person identifier with their unique unemployment spell information. The second dataset contains the referral data that an unemployed person gets from unemployment office.

    The first dataset looks like the following. persid is personal ID, akt_dat is spell start date and av_dat is spell end date.

    clear
    input str9 persid int(akt_dat av_dat)
    "100000001" 17314 17419
    "100000005" 17440 17471
    "100000005" 18049 18231
    "100000007" 18400 18723
    "100000009" 17932 .
    "100000010" 17895 18273
    "100000011" 15131 15207
    "100000011" 15494 15582
    "100000011" 15862 15942
    "100000011" 16229 16313
    end
    format %td akt_dat
    format %td av_dat
    The second dataset looks like the following:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 persid int referal_date long orderid
    "100000025" 18247 5395636
    "100000025" 18247 5394698
    "100000081" 17435 4696299
    "100000081" 17714 5014115
    "100000083" 18092 5296834
    "100000083" 18113 5307976
    "100000119" 16496 3893729
    "100000119" 16841 4132671
    "100000119" 17953 5195546
    "100000157" 18011 5240020
    end
    format %td referal_date
    I am trying to merge/join the second dataset with the first dataset by persid. The merge/join rule that I want is below:

    a) For each row in first dataset, there could be multiple rows in second dataset. This means the row in first dataset is the unique unemployment spell info and a person can get multiple job referrals in each unemployment period.
    b) Therefore, if a particular referral date from the second dataset lies within the range of akt_dat and av_dat then it will merge with the first dataset.

    I am not sure whether I am clear with my explanation. A help would be very much appreciated.

    Thanks in advance,

    Zariab Hossain
    Uppsala University

  • #2
    The simplest way to do this is with the -rangejoin- command:
    Code:
    use unemployment_data, clear
    rangejoin referal_date akt_dat av_dat using referral_data, by(persid)
    Replace the italicized parts of the code by the actual names of the corresponding data sets.

    -rangejoin- is written by Robert Picard and is available from SSC. To use -rangejoin-, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

    Added: In your example data, none of the referrals shown actually match any of the employment spells shown. I assume that this will not be the case in your full data sets.

    Comment


    • #3
      Dear Clyde

      As usual a Big thanks to you for your prompt and smart suggestion. It worked perfectly for my purpose. You were right that the example data didn't match along the key identifier, however, that was just an excerpt for explanation purpose.

      Comment

      Working...
      X