Hello All -
I am a Stata novice working on my dissertation proposal. My chair left the university, and I do not have other committee members who work with Stata, so I am incredibly grateful in advance for any help provided.
I am constructing a monthly panel dataset using four annual waves of SIPP (Survey of Income and Program Participation) data at the individual level. My subset is about 435,000 individuals, each with 48 monthly observations. The unique identifier is based on the main sample unit identifier (ssuid) and the individual within the household (pnum), which together require 15 characters. If I convert the string variables to numeric, the unique id becomes something like "1.143e+11", and the rounding creates an absurd number of duplicates.
How can I create a unique identifier that will work with xtset?
[CODE]
* Example generated by -dataex-. For more info, type help dataex
clear
input str15 id str12 ssuid str3(shhadid pnum) byte(rrel1 esex trace tage eeduc eed_scrnr ecert monthcode) float new_month byte(eedcred thhldstatus tehc_metro) str2 tehc_st long tjb1_msum
"000114285070101" "000114285070" "011" "101" 99 2 1 68 44 2 2 1 1 . 2 1 "20" .
"000114285070101" "000114285070" "011" "101" 99 2 1 68 44 2 2 2 2 . 2 1 "20" .
"000114285070101" "000114285070" "011" "101" 99 2 1 68 44 2 2 3 3 . 2 1 "20" .
"000114285070101" "000114285070" "011" "101" 99 2 1 68 44 2 2 4 4 . 2 1 "20" .
"000114285070101" "000114285070" "011" "101" 99 2 1 68 44 2 2 5 5 . 2 1 "20" .
"000114285070101" "000114285070" "011" "101" 99 2 1 68 44 2 2 6 6 . 2 1 "20" .
"000114285070101" "000114285070" "011" "101" 99 2 1 68 44 2 2 7 7 . 2 1 "20" .
"000114285070101" "000114285070" "011" "101" 99 2 1 68 44 2 2 8 8 . 2 1 "20" .
"000114285070101" "000114285070" "011" "101" 99 2 1 68 44 2 2 9 9 . 2 1 "20" .
"000114285070101" "000114285070" "011" "101" 99 2 1 68 44 2 2 10 10 . 2 1 "20" .
end
[/CODE]
I did look at the manuals and forums, but the unique IDs in the examples were shorter, or they used gen id = _n, which would not work for my data setup.
Cheers,
Jaime
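One possible approach, offered as a sketch rather than a definitive answer: the duplicates most likely arise because a 15-digit value is being stored as a float, which holds only about 7 digits accurately. Rather than converting the string itself, egen's group() function can assign each distinct ssuid-pnum combination a consecutive integer, which xtset accepts. The variable name pid below is hypothetical, and the code assumes that ssuid and pnum together identify a person and that new_month runs over the 48 months without repeats within a person.

```stata
* Assumes the variables ssuid, pnum, and new_month from the data example above.
* group() maps each distinct ssuid-pnum pair to an integer 1, 2, 3, ...
* Storing the result as long keeps every value exact.
egen long pid = group(ssuid pnum)

* Confirm that person and month jointly identify each observation.
* If this stops with an error, there are duplicate person-months to investigate.
isid pid new_month

* Declare the panel structure; new_month is used as a plain integer time index.
xtset pid new_month
```

The original string id stays in the dataset, so observations can still be traced back to the SIPP identifiers. The integer values of pid carry no meaning beyond distinguishing individuals.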