  • New on SSC: dataframe - store multiple datasets in memory concurrently

    Thanks to Kit Baum, the dataframe package is now available on SSC.

    Description
    dataframe is intended to get around the limitation of having only one dataset in memory at a time. dataframe is similar to preserve, the key differences being:
    1. dataframe stores the dataset in memory rather than on disk
    2. multiple dataframes may exist in memory at the same time
    3. subsets of data may be stored, rather than the whole dataset
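    As a rough illustration of points 1 to 3, here is a short Python sketch of the idea: several named copies of a dataset, or of a subset of its variables and observations, held in memory at the same time. Python is used only for illustration. The package itself is a Stata wrapper around Mata, and store_frame(), restore_frame() and the dict-of-lists data layout are hypothetical, not dataframe's syntax.

```python
# A minimal sketch of the idea, not the dataframe package's implementation.
# A "dataset" is modelled as a dict mapping variable names to equal-length lists.
import copy

_frames = {}  # hypothetical in-memory store: frame name -> stored copy


def store_frame(name, data, variables=None, rows=None):
    """Copy (a subset of) `data` into memory under `name`; other stored frames are untouched."""
    if variables is None:
        variables = list(data)
    if rows is None:
        nobs = len(data[variables[0]]) if variables else 0
        rows = range(nobs)
    _frames[name] = {v: [data[v][i] for i in rows] for v in variables}


def restore_frame(name):
    """Return a copy of a stored frame, leaving the stored version intact."""
    return copy.deepcopy(_frames[name])


# Usage: a full copy and a subset coexist in memory.
data = {"id": [1, 2, 3], "foreign": [0, 0, 1]}
store_frame("full", data)
store_frame("foreign_ids", data, variables=["id"], rows=[2])
data["id"][0] = 99                   # change the working data
print(restore_frame("full")["id"])   # [1, 2, 3]
print(restore_frame("foreign_ids"))  # {'id': [3]}
```

    Copying on both store and restore keeps each stored frame independent of the working data, which is the property that lets several of them coexist safely.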
    To install, run:
    Code:
    ssc install dataframe
    I've posted this in the Mata forum, since the command is just a Stata wrapper around a Mata function. I'd be interested to hear any comments on coding efficiency and the like (see the ado-file for the source code).

    In particular, one challenge I had was storing value labels. I defined a struct, vallabelstruct, with members pointer(string vector) vector text and pointer(real vector) vector vals. That is, if vallabelstruct were a vallabelstruct scalar, vallabelstruct.text would have length equal to the number of value labels in the dataset, and each vector it points to would have length equal to the number of entries in that particular value label (and likewise for vallabelstruct.vals). I want to use st_vlload() to populate vallabelstruct. I was hoping to use syntax like st_vlload("mylabelname", *vallabelstruct.vals, *vallabelstruct.text); however, this syntax doesn't work. Does anyone have any ideas?
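    To make the layout of vallabelstruct concrete, here is a Python analogue. It is not Mata: Python lists are references, so each element of vals and text plays the role of a pointer to one value label's own vector. vlload() and LABEL_TABLES are hypothetical stand-ins for st_vlload() and the label tables Stata holds, and the names member is an addition for readability that the original struct does not have. The loop mirrors one workaround that may be worth trying in Mata (an untested suggestion): load each label with st_vlload() into ordinary local vectors first, then store pointers to those vectors.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the value-label tables Stata keeps in memory.
LABEL_TABLES = {
    "yesno": {0: "No", 1: "Yes"},
    "region": {1: "North", 2: "South", 3: "East", 4: "West"},
}


def vlload(name):
    """Hypothetical stand-in for Mata's st_vlload(): returns (values, text) for one label."""
    table = LABEL_TABLES[name]
    values = sorted(table)
    return values, [table[v] for v in values]


@dataclass
class ValLabelStruct:
    # One element per value label in the dataset. Each element is a reference to
    # that label's own vector, playing the role of
    # pointer(real vector) vector vals and pointer(string vector) vector text.
    names: list = field(default_factory=list)  # added for readability; not in the original struct
    vals: list = field(default_factory=list)
    text: list = field(default_factory=list)


def load_all(label_names):
    s = ValLabelStruct()
    for name in label_names:
        values, text = vlload(name)  # load into fresh local containers first ...
        s.names.append(name)
        s.vals.append(values)        # ... then store references to them
        s.text.append(text)
    return s


s = load_all(["yesno", "region"])
print(len(s.text))  # 2: one entry per value label
print(s.text[1])    # ['North', 'South', 'East', 'West']
```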

    Thank you,

    Andrew Maurer
    Last edited by Andrew Maurer; 04 Nov 2014, 10:12.

  • #2
    This package sounds extraordinarily useful, but I am having some trouble finding and installing the package with the code provided above. Has the name or location of the package been changed?


    • #3
      Andrew Maurer: although this isn't a Mata solution, it's something I've been planning to address from the Java side. I've made a few minor updates to the Java library recently, but you can find examples of different Java objects I've built to deal with this at https://github.com/wbuchanan/StataJavaUtilities. Given the way the Java API works, I have a Variables class and an Observations class that both store metadata about the dataset in memory (e.g., variable labels, value labels, variable names, value-label names, indicators for string types, etc.). A Meta class is used to initialize each of those two classes and to store all of the metadata in a single object, which is then used to create the DataSet class. The DataSet class stores all of the metadata and all of the data in the dataset in memory. The version in this library lacks the Jackson JSON annotations, but it is what I used to serialize a Stata dataset into a JSON object.

      Although there are ways to persist data structures from Mata (to some degree, at least), I think one advantage of moving something like this into Java is the ability to leverage different Java persistence libraries and to use lighter-weight SQL solutions such as H2 or HyperSQL, so that SQL can be used a bit more natively within Stata. For example, you could create a Java object by reading a Stata data file into the JVM and push it into the SQL backend; any number of data objects would then be accessible and could be managed with standard SQL, adding flexibility for data management. It isn't directly related, but if you are interested in trying something like this, it would probably be a quicker way to get something up and running.
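      The SQL-backed idea can be sketched with Python's built-in sqlite3 module, used here purely as a stand-in for H2/HyperSQL. This swaps in a different engine and is not the StataJavaUtilities API; push_dataset() is a hypothetical helper. Each dataset becomes a table in one in-memory database, so several of them coexist and can be combined with standard SQL.

```python
import sqlite3


def push_dataset(conn, name, data):
    """Store a dataset (dict of equal-length column lists) as a table called `name`.

    Sketch only: identifiers are quoted naively, so names must not contain double quotes.
    """
    cols = list(data)
    col_sql = ", ".join(f'"{c}"' for c in cols)
    placeholders = ", ".join("?" for _ in cols)
    conn.execute(f'CREATE TABLE "{name}" ({col_sql})')
    # zip(*columns) turns column lists into row tuples for the INSERT.
    conn.executemany(
        f'INSERT INTO "{name}" VALUES ({placeholders})',
        zip(*(data[c] for c in cols)),
    )


# Usage: two "datasets" live side by side and are joined with ordinary SQL.
conn = sqlite3.connect(":memory:")
push_dataset(conn, "people", {"id": [1, 2, 3], "region": [1, 2, 2]})
push_dataset(conn, "regions", {"region": [1, 2], "name": ["North", "South"]})
rows = conn.execute(
    "SELECT p.id, r.name FROM people p JOIN regions r ON p.region = r.region ORDER BY p.id"
).fetchall()
print(rows)  # [(1, 'North'), (2, 'South'), (3, 'South')]
```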


      • #4
        Originally posted by Peter Bahr
        This package sounds extraordinarily useful, but I am having some trouble finding and installing the package with the code provided above. Has the name or location of the package been changed?
        In case the author doesn't see your post, did you try emailing him at the address listed here: https://ideas.repec.org/c/boc/bocode/s457933.html?


        • #5
          Originally posted by Friedrich Huebler

          In case the author doesn't see your post, did you try emailing him at the address listed here: https://ideas.repec.org/c/boc/bocode/s457933.html?

          Thank you, Friedrich. I just did as you suggested.
