Dear Stata users,
I want to merge or join two datasets (confused what would be the most convenient one!). First one is the unemployment spell dataset that has person identifier with their unique unemployment spell information. The second dataset contains the referral data that an unemployed person gets from unemployment office.
The first dataset looks like the following. persid is personal ID, akt_dat is spell start date and av_dat is spell end date.
The second dataset looks like the following:
I am trying to merge/join the second dataset with the first dataset by persid. The merge/join rule that I want is below:
a) For each row in first dataset, there could be multiple rows in second dataset. This means the row in first dataset is the unique unemployment spell info and a person can get multiple job referrals in each unemployment period.
b) Therefore, if a particular referral date from the second dataset lies within the range of akt_dat and av_dat then it will merge with the first dataset.
I am not sure whether I am clear with my explanation. A help would be very much appreciated.
Thanks in advance,
Zariab Hossain
Uppsala University
I want to merge or join two datasets (confused what would be the most convenient one!). First one is the unemployment spell dataset that has person identifier with their unique unemployment spell information. The second dataset contains the referral data that an unemployed person gets from unemployment office.
The first dataset looks like the following. persid is personal ID, akt_dat is spell start date and av_dat is spell end date.
clear
input str9 persid int(akt_dat av_dat)
"100000001" 17314 17419
"100000005" 17440 17471
"100000005" 18049 18231
"100000007" 18400 18723
"100000009" 17932 .
"100000010" 17895 18273
"100000011" 15131 15207
"100000011" 15494 15582
"100000011" 15862 15942
"100000011" 16229 16313
end
format %td akt_dat
format %td av_dat
input str9 persid int(akt_dat av_dat)
"100000001" 17314 17419
"100000005" 17440 17471
"100000005" 18049 18231
"100000007" 18400 18723
"100000009" 17932 .
"100000010" 17895 18273
"100000011" 15131 15207
"100000011" 15494 15582
"100000011" 15862 15942
"100000011" 16229 16313
end
format %td akt_dat
format %td av_dat
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str9 persid int referal_date long orderid "100000025" 18247 5395636 "100000025" 18247 5394698 "100000081" 17435 4696299 "100000081" 17714 5014115 "100000083" 18092 5296834 "100000083" 18113 5307976 "100000119" 16496 3893729 "100000119" 16841 4132671 "100000119" 17953 5195546 "100000157" 18011 5240020 end format %td referal_date
a) For each row in first dataset, there could be multiple rows in second dataset. This means the row in first dataset is the unique unemployment spell info and a person can get multiple job referrals in each unemployment period.
b) Therefore, if a particular referral date from the second dataset lies within the range of akt_dat and av_dat then it will merge with the first dataset.
I am not sure whether I am clear with my explanation. A help would be very much appreciated.
Thanks in advance,
Zariab Hossain
Uppsala University
Comment