merging/joining rows from using dataset that fall within a range in the master dataset

Zariab Hossain

Join Date: Oct 2020

Posts: 48
#1

merging/joining rows from using dataset that fall within a range in the master dataset

28 Jan 2024, 11:54

Dear Stata users,

I want to merge or join two datasets (confused what would be the most convenient one!). First one is the unemployment spell dataset that has person identifier with their unique unemployment spell information. The second dataset contains the referral data that an unemployed person gets from unemployment office.

The first dataset looks like the following. persid is personal ID, akt_dat is spell start date and av_dat is spell end date.

clear
input str9 persid int(akt_dat av_dat)
"100000001" 17314 17419
"100000005" 17440 17471
"100000005" 18049 18231
"100000007" 18400 18723
"100000009" 17932 .
"100000010" 17895 18273
"100000011" 15131 15207
"100000011" 15494 15582
"100000011" 15862 15942
"100000011" 16229 16313
end
format %td akt_dat
format %td av_dat

The second dataset looks like the following:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input str9 persid int referal_date long orderid "100000025" 18247 5395636 "100000025" 18247 5394698 "100000081" 17435 4696299 "100000081" 17714 5014115 "100000083" 18092 5296834 "100000083" 18113 5307976 "100000119" 16496 3893729 "100000119" 16841 4132671 "100000119" 17953 5195546 "100000157" 18011 5240020 end format %td referal_date

I am trying to merge/join the second dataset with the first dataset by persid. The merge/join rule that I want is below:

a) For each row in first dataset, there could be multiple rows in second dataset. This means the row in first dataset is the unique unemployment spell info and a person can get multiple job referrals in each unemployment period.
b) Therefore, if a particular referral date from the second dataset lies within the range of akt_dat and av_dat then it will merge with the first dataset.

I am not sure whether I am clear with my explanation. A help would be very much appreciated.

Thanks in advance,

Zariab Hossain
Uppsala University
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 29801
#2

28 Jan 2024, 12:45

The simplest way to do this is with the -rangejoin- command:

Code:

use unemployment_data, clear rangejoin referal_date akt_dat av_dat using referral_data, by(persid)

Replace the italicized parts of the code by the actual names of the corresponding data sets.

-rangejoin- is written by Robert Picard and is available from SSC. To use -rangejoin-, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

Added: In your example data, none of the referrals shown actually match any of the employment spells shown. I assume that this will not be the case in your full data sets.
Comment
Zariab Hossain

Join Date: Oct 2020

Posts: 48
#3

28 Jan 2024, 13:47

Dear Clyde

As usual a Big thanks to you for your prompt and smart suggestion. It worked perfectly for my purpose. You were right that the example data didn't match along the key identifier, however, that was just an excerpt for explanation purpose.
Comment

Announcement

merging/joining rows from using dataset that fall within a range in the master dataset

Comment

Comment