  • Things I miss in Mata - Lists/Sets from Python

    Hi
    There are a lot of excellent things in Mata.
    But one thing I miss is more elegant containers with functionality.
    Mata is really, really powerful on vectors and matrices, I guess even better than Python with NumPy or similar.
    And Mata does have asarray(), but as I see it, it lacks the functionality angle. And it could be easier to use.

    However, there are times when coding when flexible containers with functionality would be nice.
    Forgive me for maybe not using the proper words/concepts, I'm self-taught.

    Some of the thoughts I've had I've put into the concept List. It is a prototype, but I would like your comments on it.
    What it can do is shown here:
    Code:
    :         lst = List()    // Instantiate a List object
    
    :         lst.len()               // It is empty
      0
    
    :         lst.append( (1,4,5,6,3,2,2,3) )         // Append a list of numbers
    
    :         lst.content()           // See the content
           1   2   3   4   5   6   7   8
        +---------------------------------+
      1 |  1   4   5   6   3   2   2   3  |
        +---------------------------------+
    
    :         
    :         while (lst.has_next()) lst.next() // Loop through the content
      1
      4
      5
      6
      3
      2
      2
      3
    
    :         lst.next()              // loop is done
    
    : 
    :         lst.find(4)             // Find the value 4
      2
    
    :         lst.find(0) == J(1,0,.)         // When nothing is found
      1
    
    : 
    :         // Add one or more elements
    :         lst.append(8)
    
    :         lst.append((8, 9))
    
    :         lst.content()
            1    2    3    4    5    6    7    8    9   10   11
        +--------------------------------------------------------+
      1 |   1    4    5    6    3    2    2    3    8    8    9  |
        +--------------------------------------------------------+
    
    : 
    :         // Function tochars returns character a for argument 1, b for 2 etc
    :         function tochars(nbr) return(char(nbr+96))
    
    :         // Apply function tochars to all values in lst
    :         strofreal(lst.content()) \ lst.apply(&tochars())
            1    2    3    4    5    6    7    8    9   10   11
        +--------------------------------------------------------+
      1 |   1    4    5    6    3    2    2    3    8    8    9  |
      2 |   a    d    e    f    c    b    b    c    h    h    i  |
        +--------------------------------------------------------+
    
    : 
    :         // A List of characters/strings
    :         lst2 = List()
    
    :         lst2.append(lst.apply(&tochars()))
    
    : 
    :         lst2.find("b")          // find returns all the positions of b in lst2
           1   2
        +---------+
      1 |  6   7  |
        +---------+
    
    : 
    :         lst2.unique_values()            // Return only unique values
           1   2   3   4   5   6   7   8
        +---------------------------------+
      1 |  a   b   c   d   e   f   h   i  |
        +---------------------------------+
    
    : 
    :         lst2.frequency(("c", "d"))      // The frequency of all (no argument) or a subset of values
           1   2
        +---------+
      1 |  2   1  |
        +---------+
    
    :         lst2.remove("c")        // Remove a single value from lst2
    
    :         lst2.frequency(("c", "d"))      // Frequency ignores non-present values
           1   2
        +---------+
      1 |  0   1  |
        +---------+
    It is of course far more interesting when combined with datasets:
    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . mata:
    ------------------------------------------------- mata (type end to exit) -----------------------------
    :         // Use List on variables
    :         lst = List()
    
    :         lst.append(st_data(., "rep78")')
    
    :         lst.unique_values()
           1   2   3   4   5   6
        +-------------------------+
      1 |  1   2   3   4   5   .  |
        +-------------------------+
    
    :         lst.frequency()
            1    2    3    4    5    6
        +-------------------------------+
      1 |   2    8   30   18   11    5  |
        +-------------------------------+
    
    :         // Set calculations
    :         lst.frequency((4, 5, 6))
            1    2    3
        +----------------+
      1 |  18   11    0  |
        +----------------+
    
    :         lst.union_unique((4, 4, 5, 6, 6))
           1   2   3   4   5   6   7
        +-----------------------------+
      1 |  1   2   3   4   5   6   .  |
        +-----------------------------+
    
    :         lst.intersection_unique((4, 4, 5, 6, 6))
           1   2
        +---------+
      1 |  4   5  |
        +---------+
    
    :         lst.less_unique((4, 4, 5, 6, 6))        
           1   2   3   4
        +-----------------+
      1 |  1   2   3   .  |
        +-----------------+
    
    : end
    So is this just unnecessary? Is something missing? Other comments?

    Have fun

    The code for doing this on a computer near you is attached here: List.do
    Kind regards

    nhb

  • #2
    I've only learned a little Python, but I think the point of a list is that adding and removing elements is done efficiently. However, your append method overwrites the data instead of literally appending to it. Also (some minor things), append should support appending more than one value; remove should only remove the first instance of a value; you need a pop method; next is for iterables, which are not the same as lists (in Python 3, anyway).
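
    For reference, a minimal sketch of how Python's own built-in list handles the operations mentioned above. It only illustrates Python's behaviour and says nothing about how the List prototype is implemented; the variable names are made up for the example.

```python
# Python's built-in list, for comparison with the prototype's methods.
values = [1, 4, 5, 6, 3, 2, 2, 3]

values.append(8)        # append adds exactly one object, in place
values.extend([8, 9])   # extend adds several values at once

values.remove(2)        # removes only the first 2; the second one stays

last = values.pop()     # pop removes and returns the last element

# A list is iterable but is not itself an iterator:
# next() works on iter(values), not on values directly.
it = iter(values)
first = next(it)        # iterating does not consume the list itself
```

    In Python, adding several values at once is the job of extend (or +=), while append always adds a single object.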

    Anyway, I think the details of an efficient implementation are important and difficult to shoehorn in, even with computer scientists working on it. R has tons of packages, but none I've seen has managed to implement lists in the Python sense.



    • #3
      Hi Frank
      I partly agree with your comment on efficiency.
      I haven't tested the efficiency here, so I can't say whether it is efficient or not.
      However, the time spent coding the same thing again and again also has something to do with efficiency.

      And I also think that things that are done again and again should be made into methods on objects.
      Hence our code becomes shorter and more efficient.

      In the best of all worlds, objects like lists, sets and dictionaries would already be part of Mata,
      and StataCorp would have secured the efficiency of their methods.

      My point is more that I think that we need this functionality.
      And I think it would be better if it came as part of Stata 15.
      I was trying to show how a concept like lists would help.

      Regarding append(): my method actually does what it should:
      Code:
      :         lst = List()    // Instantiate a List object
      :         lst.append( (1,4,5,6,3,2,2,3) )         // Append a list of numbers
      :         lst.content()           // See the content
             1   2   3   4   5   6   7   8
          +---------------------------------+
        1 |  1   4   5   6   3   2   2   3  |
          +---------------------------------+
      
      ...
      
      :         // Add one or more elements
      :         lst.append(8)
      :         lst.append((8, 9))
      :         lst.content()
              1    2    3    4    5    6    7    8    9   10   11
          +--------------------------------------------------------+
        1 |   1    4    5    6    3    2    2    3    8    8    9  |
          +--------------------------------------------------------+
      The next() method is related to iterators in Python, but also to how to loop through self-defined classes.
      Iterators are actually another thing I miss.

      And this is partly why I added the apply() method.
      It is not done quite right, but it was as close as I could get at a first try.

      Thank you for the note on the pop method, but I think I may already have that functionality covered by has_next()/next().
      Kind regards

      nhb



      • #4
        Niels Henrik Bruun I'd agree, though from a slightly different perspective. I think it'd be nice if there were classes/methods similar to those that exist in Java (e.g., collections, iterators, streams, map/reduce, etc.). It is definitely nice when the storage objects allow for greater flexibility (e.g., creating a Map<String, Map<String, Object>> type; I use something similar in the JSON serializer I wrote, to retain all of the metadata from the dataset in the resulting JSON object). The biggest thing that I think would be nice to have is support for vectorizing methods (e.g., instead of looping over elements, splitting the data object based on values in a vector, applying the method to each chunk of the data, and then combining the result).
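
        The split/apply/combine idea described above can be sketched roughly as follows, here in Python for comparison. The function name split_apply_combine and the sample data are hypothetical, chosen only for illustration. The loop still exists inside the helper; the point is that the caller no longer has to write it.

```python
from collections import defaultdict


def split_apply_combine(values, groups, func):
    """Split values by group key, apply func to each chunk, combine into a dict."""
    chunks = defaultdict(list)
    for value, key in zip(values, groups):   # split on the grouping vector
        chunks[key].append(value)
    # apply func to each chunk and combine the results by key
    return {key: func(chunk) for key, chunk in chunks.items()}


# Made-up data: a value column and a grouping column of the same length.
price = [10, 20, 30, 40, 50]
group = ["a", "b", "a", "b", "a"]

totals = split_apply_combine(price, group, sum)
largest = split_apply_combine(price, group, max)
```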



        • #5
          Niels, I completely agree that Mata could use more data structure classes. Sorely missing is a dynamic array: the fact that adding an element to an existing vector using the , and \ join operators creates an entirely new vector easily leads to performance problems, turning O(n) code into O(n^2).
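
          To make the O(n) versus O(n^2) point concrete, here is a small sketch in Python that counts how many existing elements get copied under each strategy. It is a cost model, not a measurement of Mata: the DynamicArray class and copies_with_join() are hypothetical names, and capacity doubling is one common growth policy, assumed here for illustration.

```python
class DynamicArray:
    """Growable array that doubles its capacity when full (illustrative only)."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._slots = [None] * self._capacity
        self.copies = 0  # existing elements moved during reallocations

    def append(self, value):
        if self._size == self._capacity:
            self._grow(2 * self._capacity)
        self._slots[self._size] = value
        self._size += 1

    def _grow(self, new_capacity):
        new_slots = [None] * new_capacity
        for i in range(self._size):
            new_slots[i] = self._slots[i]
            self.copies += 1
        self._slots = new_slots
        self._capacity = new_capacity

    def to_list(self):
        return self._slots[:self._size]


def copies_with_join(n):
    """Existing elements copied if each of n appends builds a brand-new vector."""
    return sum(range(n))  # 0 + 1 + ... + (n - 1)


n = 1024
arr = DynamicArray()
for i in range(n):
    arr.append(i)

doubling_copies = arr.copies        # grows linearly with n
join_copies = copies_with_join(n)   # grows quadratically with n
```

          With doubling, the total number of copies stays below n, whereas rebuilding the vector on every append copies about n^2/2 elements.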

          I also think that exposing the class underlying the asarray*() functions would lead to less verbose code. For those who don't want to delve into object-oriented programming, the asarray*() functions are convenient. But they shouldn't be the only way to access the class.

          On another note, it'd be nice to see an efficient implementation of the macro list operations (union, intersection, etc.) in Mata.
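
          As a rough sketch of what such operations compute, here they are in Python using hash-based sets. The function names are borrowed from the List prototype in the first post, but the implementation is an assumption made for illustration, not the code in List.do; returning the result sorted is also a choice made here.

```python
def union_unique(a, b):
    """Unique values found in either a or b, sorted."""
    return sorted(set(a) | set(b))


def intersection_unique(a, b):
    """Unique values found in both a and b, sorted."""
    return sorted(set(a) & set(b))


def less_unique(a, b):
    """Unique values found in a but not in b, sorted."""
    return sorted(set(a) - set(b))


# Made-up data with repeated values.
ratings = [3, 3, 4, 5, 1, 2, 4, 3]
other = [4, 4, 5, 6, 6]

u = union_unique(ratings, other)
i = intersection_unique(ratings, other)
d = less_unique(ratings, other)
```

          Because set membership tests are hash lookups, building each result avoids comparing every element of one list with every element of the other.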
