I have an ado program that merges two large datasets (in a complicated way, not just with the "merge" command), and I'd like the program to perform type checking on each dataset before beginning the merge. Specifically, given a variable name and a "using" dataset, I'd like to get the storage type of that variable in that dataset. For the dataset in memory, this is easily done with extended macro functions, but for a dataset stored on disk, I can't find a way to do this without manually parsing the dta format.
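For reference, the in-memory case mentioned above looks roughly like this. It is a minimal sketch that uses Stata's shipped `auto` example dataset and its `mpg` variable as stand-ins for the real data; the local macro name `mpg_type` is purely illustrative.

```stata
* Load the shipped example dataset (stand-in for the dataset in memory)
sysuse auto, clear

* The extended macro function ": type" returns the storage type of a variable
local mpg_type : type mpg
display "`mpg_type'"
```

The same pattern works for string variables, where the result is a type such as `str18` rather than a numeric type.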
Obviously, this information is available to Stata without loading the dataset because it's built into the dta format and commands like describe can access it. I don't want to load a 300 GB dataset into memory just to check the data types, and I'd prefer not to have to parse the text of "describe using ..." manually.
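One possible workaround, offered as a sketch rather than a definitive answer: `use` accepts a variable list and an `in` range, so it should be possible to read a single observation of a single variable and then query its type with the extended macro function. The example below assumes a small temporary file built from the `auto` example dataset in place of the real 300 GB file; the names `big` and `mpg_type` are hypothetical. Note that this replaces the data in memory, so in a real ado program it would need to run before the main dataset is loaded, or be wrapped in `preserve`/`restore` (which has its own cost for large in-memory data).

```stata
* Create a small file on disk to stand in for the large "using" dataset
sysuse auto, clear
tempfile big
save "`big'"
clear

* Read only the first observation of one variable from the file on disk
use mpg in 1 using "`big'"

* Query the storage type of that variable as usual
local mpg_type : type mpg
display "`mpg_type'"
```

Whether this is fast enough on a very large file is something to verify empirically; the sketch only shows that the type can be recovered without parsing the output of "describe using ..." or the dta format by hand.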