Hi all,
I was wondering if anyone has found a good soloution to working with very large integers as ID variables of variable length.
Background,
Working with a national medical database and they have a key (EPIKEY) which is 19 Numeric characters long.
It is stored in a SQL database as numeric((20,0),null)
This variable should be unique, when you read it directly using odbc load it reads as "double".
Unfortunately, this makes the variable non-unique, i am assuming precision issues.
So i recast this to "BIGINT" and alias it and then it reads as string.
This seems like an ugly and inefficient workaround.
20bytes is quite an expensive storage.
An equally ugly workaround would be
But this 10 bytes would be a lot more efficent, but who wants to work with 3 variables when you can have 1.
However, splitting the keys like this unfortunately doesn't work as the ID's are not always the same length.
and if the split occurs on "001" this is converted to "1" when stored as a long.
Stata have this helpful comment on there help file
But now I think we need some help on holding more than 16 digits of accuracy.
It seems like i am stuck with a string for the moment.
bw
Adrian
I was wondering if anyone has found a good soloution to working with very large integers as ID variables of variable length.
Background,
Working with a national medical database and they have a key (EPIKEY) which is 19 Numeric characters long.
It is stored in a SQL database as numeric((20,0),null)
This variable should be unique, when you read it directly using odbc load it reads as "double".
Unfortunately, this makes the variable non-unique, i am assuming precision issues.
So i recast this to "BIGINT" and alias it and then it reads as string.
Code:
odbc load, exec("SELECT * , CAST(EPIKEY AS BIGINT) AS EPIKEY_STR FROM .....
20bytes is quite an expensive storage.
An equally ugly workaround would be
Code:
gen long epikey_a = real(substr(epikey,1,9)) gen long epikey_b = real(substr(epikey,10,9)) gen byte epikey_c = real(substr(epikey,19,1))
However, splitting the keys like this unfortunately doesn't work as the ID's are not always the same length.
and if the split occurs on "001" this is converted to "1" when stored as a long.
Stata have this helpful comment on there help file
If you are storing identification numbers, the rounding could matter. If the identification numbers are integers and take 9 digits or less, store them as
longs; otherwise, store them as doubles. doubles have 16 digits of accuracy.
longs; otherwise, store them as doubles. doubles have 16 digits of accuracy.
It seems like i am stuck with a string for the moment.
bw
Adrian
Comment