Hi all,
It's come time to combine my data across years for comparison. I have a data set that contains the Household ID and the individual ID for all 3 years (as shown below) however these are string variables and differ across years.
I've tried:
To try and create a unique id that would identify the household and individual across years, however this doesn't seem to match up or have a recognisable connection when done separately for each year.
Each separate dataset tends to have for example y2_hhid and the previous y1_hhid to help identify (also shown below).
Dataset with three years ID:
Dataset for y2_hhid:
Any help would be much appreciated I think the main thing I'm struggling is successfully cleaning the data first. I have experimented with merge and append but can't seem to get anywhere.
Thanks
Alex
It's come time to combine my data across years for comparison. I have a data set that contains the Household ID and the individual ID for all 3 years (as shown below) however these are string variables and differ across years.
I've tried:
Code:
encode y2_hhid, gen(HHnum2) egen id = concat(HHnum2 indidy2)
Each separate dataset tends to have for example y2_hhid and the previous y1_hhid to help identify (also shown below).
Dataset with three years ID:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input int UPI3 str14 y1_hhid byte indidy1 str16 y2_hhid byte indidy2 str8 y3_hhid byte indidy3 1 "55020180210078" 1 "5502018021007801" 1 "3924-001" 1 2 "55020180210068" 1 "5502018021006801" 1 "3922-001" 1 3 "55020180210068" 2 "5502018021006801" 2 "3922-001" 2 4 "55020180210068" 3 "5502018021006801" 3 "3922-001" 3 6 "55020180210068" 5 "5502018021006801" 4 "3922-001" 4 7 "55020180210068" 6 "5502018021006801" 5 "3922-001" 5 8 "55020180210068" 7 "5502018021006807" 4 "3923-001" 4 10 "55020180210068" 9 "5502018021006801" 6 "3922-001" 6 12 "55020180210059" 2 "5502018021005902" 1 "3920-001" 1 13 "55020180210059" 3 "5502018021005902" 4 "3920-001" 4 14 "55020180210059" 4 "5502018021005902" 2 "3920-001" 2 15 "55020180210059" 5 "5502018021005905" 8 "3921-001" 8 16 "55020180210058" 1 "5502018021005801" 1 "3919-001" 1 17 "55020180210058" 2 "5502018021005801" 2 "3919-001" 2 18 "55020180210058" 3 "5502018021005801" 3 "3919-001" 3 19 "55020180210058" 4 "5502018021005801" 4 "3919-001" 4 20 "55020180210058" 5 "5502018021005801" 5 "3919-005" 10 21 "55020180210058" 6 "5502018021005801" 6 "3919-005" 3 23 "55020180210058" 8 "5502018021005801" 7 "3919-001" 5 25 "55020180210054" 1 "5502018021005401" 1 "3918-001" 1 26 "55020180210054" 2 "5502018021005401" 2 "3918-001" 2 27 "55020180210054" 3 "5502018021005401" 3 "3918-001" 3 28 "55020180210054" 4 "5502018021005401" 4 "3918-001" 4 29 "55020180210042" 1 "5502018021004201" 1 "3917-001" 1 30 "55020180210042" 2 "5502018021004201" 2 "3917-001" 2 32 "55020180210034" 1 "5502018021003401" 1 "3916-001" 1 33 "55020180210034" 2 "5502018021003401" 2 "3916-001" 2 34 "55020180210034" 3 "5502018021003401" 3 "3916-001" 3 35 "55020180210034" 4 "5502018021003401" 4 "3916-001" 4 36 "55020180210034" 5 "5502018021003401" 5 "3916-001" 5 end label var UPI3 "Unique Panel Identifier NPSY3" label var y1_hhid "Unique HH Identifier" label var indidy1 "SBMEMNO: MEMBER NUMBER" label var y2_hhid "Unique Year 2 HH ID" label var indidy2 "Individual ID" label var y3_hhid "Unique Household Identification NPS Y3" label var indidy3 "Individual ID"
Dataset for y2_hhid:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str16 y2_hhid str14 y1_hhid byte(indidy2 hh_b02) int hh_b04 "0101014002017101" "01010140020171" 1 1 70 "0101014002017101" "01010140020171" 2 2 60 "0101014002017101" "01010140020171" 3 1 27 "0101014002017101" "01010140020171" 4 1 20 "0101014002017101" "01010140020171" 5 2 8 "0101014002028401" "01010140020284" 1 1 38 "0101014002028401" "01010140020284" 2 2 22 "0101014002028401" "01010140020284" 3 2 4 "0101014002028401" "01010140020284" 4 2 0 "0101014002029701" "01010140020297" 1 1 52 "0101014002029701" "01010140020297" 2 2 46 "0101014002029701" "01010140020297" 3 2 28 "0101014002029701" "01010140020297" 4 1 15 "0101014002029701" "01010140020297" 5 1 12 "0101014002029701" "01010140020297" 6 2 10 "0101014002029701" "01010140020297" 7 1 7 "0101014002029701" "01010140020297" 8 1 2 "0101014002029701" "01010140020297" 9 2 5 "0101014002029701" "01010140020297" 10 2 18 "0101014002029704" "01010140020297" 1 1 . "0101014002029704" "01010140020297" 2 2 35 "0101014002029704" "01010140020297" 3 2 11 "0101014002029704" "01010140020297" 4 2 8 "0101014002029704" "01010140020297" 5 2 5 "0101014002029704" "01010140020297" 6 2 1 "0101014002029704" "01010140020297" 7 2 18 "0101014002029704" "01010140020297" 8 1 23 "0101014002040901" "01010140020409" 1 1 32 "0101014002040901" "01010140020409" 2 2 27 "0101014002040901" "01010140020409" 3 1 8 end label values hh_b02 hh_b02 label def hh_b02 1 "Male", modify label def hh_b02 2 "Female", modify label var y2_hhid "Unique Year 2 HH ID" label var y1_hhid "Full household identification from NPS year 1" label var indidy2 "Individual ID" label var hh_b02 "Sex" label var hh_b04 "How old is [NAME]?"
Any help would be much appreciated I think the main thing I'm struggling is successfully cleaning the data first. I have experimented with merge and append but can't seem to get anywhere.
Thanks
Alex
Comment