I suppose this is simultaneously a feature request for Stata and a request for comment on the best solution that works in current versions of Stata.
Context:
I'm working with a very large panel dataset with many individuals. I receive the dataset from a data provider with individuals identified using a variable containing a 64-bit integer.
Issue:
- Stata's data types (help data_types) include 32-bit integers (long), but Stata does not support variables containing 64-bit integers (often called bigint in other software).
- Given the size of the dataset, I would like to store this 64-bit integer in a manner that is both space-efficient and computationally efficient.
Long-term request:
- It would be fantastic if Stata could add a new 64-bit integer variable type.
- It would also be nice to have a binary display format for string variables, akin to %16H / %16L / %8H / %8L.
Possible solutions in current Stata:
- A bad solution: since Stata supports 64-bit floating-point variables (double), I considered converting the 64-bit integer to a double and displaying it using the %16H format.
  - However, since a large range of values maps to missing, I would lose a significant portion of the 64-bit domain.
- Split the 64-bit integer into two separate 32-bit integers, and store each one as a long variable.
  - This solution generates two 4-byte variables, storing the result in 8 bytes.
- Convert the 64-bit integer into its base-256 encoding, which can be stored as an 8-byte string in a str8 variable.
  - This solution generates a single 8-byte variable, although it is not human-readable when displayed as a string. Unfortunately, I don't believe Stata supports a binary format like %16H for strings.
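To make the last two options concrete, here is a minimal sketch in Python, as one might use in a preprocessing step before the data reach Stata. It assumes the identifier is an unsigned 64-bit integer and uses big-endian byte order; both are assumptions, and the function names are hypothetical. One caveat from help data_types: a long holds values from -2,147,483,647 to 2,147,483,620, not the full 32-bit range, so the sketch includes a check for halves that would not fit.

```python
import struct

# Range of Stata's long type as listed in `help data_types`.
STATA_LONG_MIN = -2_147_483_647
STATA_LONG_MAX = 2_147_483_620


def split_u64(x):
    """Split an unsigned 64-bit integer into (high, low) signed 32-bit halves."""
    return struct.unpack(">ii", struct.pack(">Q", x))


def join_u64(hi, lo):
    """Inverse of split_u64: rebuild the unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">ii", hi, lo))[0]


def fits_stata_long(v):
    """True if v can be stored in a Stata long without becoming missing."""
    return STATA_LONG_MIN <= v <= STATA_LONG_MAX


def to_base256(x):
    """Encode an unsigned 64-bit integer as 8 raw bytes (base 256, big-endian)."""
    return x.to_bytes(8, "big")


def from_base256(b):
    """Inverse of to_base256."""
    return int.from_bytes(b, "big")


x = 0x0123456789ABCDEF
hi, lo = split_u64(x)
print(hi, lo)                      # 19088743 -1985229329
print(join_u64(hi, lo) == x)       # True
print(to_base256(x).hex())         # 0123456789abcdef
print(from_base256(to_base256(x)) == x)  # True
```

Both encodings round-trip and use 8 bytes per identifier. Note that the base-256 bytes can take any value from 0 to 255, so the resulting string is generally not printable, and a 32-bit half outside the long range (for example -2,147,483,648) would need special handling.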