I am trying to import a CSV containing about 15 variables, roughly 80GB in total. I am using Stata MP on a 512GB desktop, which I would presume is sufficient for a task such as this. Yet I started the import with an import delimited command over 3 days ago, and it still has not loaded into Stata (it still shows as importing). Has anyone else run into an issue like this? If so, is this the type of situation where I just need to wait longer and it will eventually import, or will I end up waiting forever?
Would anyone have recommendations on how to best tackle situations such as this? My initial thought was to use the rowrange option in import delimited, but I have been told this doesn't actually solve the problem because Stata still has to import the entire dataset into memory before it can filter down to a subset of rows. I've heard that SAS tends to be much more efficient for big data tasks since it doesn't need to read an entire dataset into memory at once, but I do not have any coding knowledge in SAS and am not even sure if I'd have a way to access it. I do have an understanding of Python and R, but I am unsure if either would be of benefit since I believe they also read an entire dataset into memory. Any help would be greatly appreciated!
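One thing that might be worth checking on the Python side: the standard library's csv module reads a file row by row, so a file can be processed in fixed-size chunks without holding all of it in memory. Below is a minimal sketch of that idea, not a tested solution for an 80GB file. The function names iter_csv_chunks and count_rows, the chunk size, and the assumption that the first line holds variable names are all placeholders for illustration.

```python
import csv
from itertools import islice


def iter_csv_chunks(path, chunk_size=1_000_000):
    """Yield (header, rows) pairs, holding at most chunk_size rows in memory."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        # Assumption: the first line of the file holds the variable names.
        header = next(reader, None)
        if header is None:
            return  # empty file, nothing to yield
        while True:
            # islice pulls only the next chunk_size rows from the reader.
            rows = list(islice(reader, chunk_size))
            if not rows:
                break
            yield header, rows


def count_rows(path, chunk_size=1_000_000):
    """Example use: count data rows without loading the whole file."""
    return sum(len(rows) for _, rows in iter_csv_chunks(path, chunk_size))
```

If something like this works, each chunk could in principle be filtered or written out as a smaller CSV that import delimited could then read one piece at a time, though whether that is faster than a single import is something I cannot confirm.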