
  • Importing sample of large DTA without loading the whole dataset

    I need to import a sample from a .dta file that is too large (67 GB) to load all at once. Let’s say I want to import a random sample of 1% of the observations from this dataset. Is there any way to do this?

    I cannot use:
    Code:
     use large_file.dta, clear
     sample 1
    because the program crashes on the first line, before -sample- ever runs.

  • #2
    -use- accepts both an "if" and an "in" qualifier, so, yes, you can load part of the file (including only a subset of the variables, if you wish). If you need to know the actual size of the dataset, you can find out without loading it by using -describe- with the second syntax in the help file (-describe using-); see
    Code:
    h describe
    h use
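
    As a minimal sketch of the advice above, assuming the file is named large_file.dta as in the question: -describe using- reports the size of the file on disk without loading it, and -use- with a variable list and an "in" range loads only a slice. The variable names var1 and var2 are hypothetical placeholders, not names from the actual dataset.

    ```stata
    * Report observation and variable counts without loading the data
    describe using "large_file.dta", short
    display "observations: " r(N) "   variables: " r(k)

    * Load only two variables and the first 1,000 observations
    * (var1 and var2 are hypothetical; substitute real variable names)
    use var1 var2 in 1/1000 using "large_file.dta", clear
    ```

    Note that an "in" range gives the first observations in file order, not a random sample, so it is mainly useful for a quick look at the data.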


    • #3
      Here is how you use Rich's advice to get an (approximate) 1% sample of the observations:

      Code:
      use if uniform() < .01 using "large_file.dta"
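
      The sample is only approximately 1% because each observation is kept independently with probability .01. A possible variation, sketched here under assumptions rather than as a tested recipe, makes the draw reproducible with -set seed- and trims a slightly larger draw down to an exact count with -sample, count-. The seed value 12345 and the oversampling rate .012 are arbitrary illustrative choices, and runiform() is the newer name for uniform().

      ```stata
      * Get the number of observations in the file without loading it
      describe using "large_file.dta", short
      local target = ceil(r(N) / 100)      // exact size of a 1% sample

      * Make the random draw reproducible (seed is an arbitrary choice)
      set seed 12345

      * Oversample slightly (1.2%) so that at least `target' rows are likely loaded
      use if runiform() < .012 using "large_file.dta", clear

      * Trim to exactly `target' observations if the draw was large enough
      if _N > `target' {
          sample `target', count
      }
      ```

      If all observations are of similar size, a 1% sample of a 67 GB file would occupy roughly 0.67 GB in memory, so the oversampled draw should still load comfortably on most machines.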

