Understanding structs

Niels Henrik Bruun

Join Date: Aug 2014
Posts: 552


12 Apr 2015, 22:33

Hi
I was testing some sample code on structs:

Code:

:         mata clear

: 
:         struct twopart {
>                 real scalar n1, n2
>         }

:         
:         struct twopart scalar new_twopart(val)
>         {
>                 struct twopart scalar t
>                 t.n1 = 1
>                 t.n2 = val
>                 return(t)
>         }

: 
:         struct twopart scalar tg
nothing found where '(' expected
r(3000);

:         tg=new_twopart(5)

:         liststruct(tg)
1  structure of 2 elements
1.1  1 x 1 real = 1
1.2  1 x 1 real = 5

:         tg.n1
type mismatch:  exp.exp:  transmorphic found where struct expected
r(3000);

: end

Why do I get the error above when I try to refer to tg.n1?
As I read the documentation, it should be doable.
Using liststruct, I can see that the structure has values.

I've tried moving "struct twopart scalar tg" around in the code without luck. It seems to be unnecessary.

Kind regards

nhb


Matthew J. Baker

Join Date: Mar 2014

Posts: 126
#2

14 Apr 2015, 07:38

Niels --

I think you've run into one of the major differences between Mata and languages like Python, and I'm glad you asked this question because I'm curious if anyone has a better solution. My understanding (I could be wrong) is that components of structures can only be accessed inside of functions in Mata, so you can neither access them on the fly nor define them on the fly (as one can in Python). So, my understanding is that things like:

Code:

N1=t.n1

are only meaningful inside of functions in Mata.

My quick and dirty solution to this problem has been to just write a function that returns chosen values. Something like:

Code:

mata:
transmorphic twopart_return(struct twopart t, string scalar obj)
{
        if (obj=="n1") return(t.n1)
        if (obj=="n2") return(t.n2)
}
tpn1=twopart_return(t,"n1")
tpn2=twopart_return(t,"n2")

This has always seemed to me a somewhat clumsy workaround, and I was wondering if anyone has a better idea! Of course, it would be great if in the future Mata were to allow one to access features of a structure on the fly!

Matt

Niels Henrik Bruun

Join Date: Aug 2014
Posts: 552
#3

14 Apr 2015, 12:51

Hi Matthew
Thank you very much.
You're absolutely right:

Code:

:         struct twopart {
>                  real scalar n1, n2
>         }

:         struct twopart scalar new_twopart(val)
>         {
>                  struct twopart scalar t
>                  t.n1 = 1
>                  t.n2 = val
>                  return(t)
>         }

:         function struct_display(struct twopart scalar t)
>         {
>                 t.n1
>                 t.n2
>         }

:         tg=new_twopart(5)

:         liststruct(tg)
1  structure of 2 elements
1.1  1 x 1 real = 1
1.2  1 x 1 real = 5

:         struct_display(tg)
  1
  5

But if I instead use the class definition as a struct then the code works properly:

Code:

:         class twopart {
>                  real scalar n1, n2
>         }

:         class twopart scalar new_twopart(val)
>         {
>                  class twopart scalar t
>                  t.n1 = 1
>                  t.n2 = val
>                  return(t)
>         }

:         tg=new_twopart(5)

:         tg.n1
  1

:         tg.n2
  5

Which in the end leaves the questions: Why do we need structs? Why not just use classes?

Last edited by Niels Henrik Bruun; 14 Apr 2015, 12:54.

Kind regards

nhb


Bill Gould (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 15
#4

14 Apr 2015, 13:56

Mata is a compiled language and, because of that, structures simply don't work interactively.

First let me show that structures do work, and then I'll explain how the statement above predicts that structures do not work interactively.

Code:

: struct twopart {
>         real scalar n1, n2
> }

: 
: struct twopart scalar newtwopart(val)
> {
>         struct twopart scalar t
> 
>         t.n1 = 1
>         t.n2 = val
>         return(t)
> }

: 
: function proveit()
> {
>         struct twopart scalar myt
> 
>         myt = newtwopart(2)
>         printf("myt.n1 = %g\n", myt.n1)
>         printf("myt.n2 = %g\n", myt.n2)
> }

: 
: proveit()
myt.n1 = 1
myt.n2 = 2

And yet, if I try to do the same thing interactively, it does not work:

Code:

: myt = newtwopart(2)

: printf("myt.n1 = %g\n", myt.n1)
type mismatch:  exp.exp:  transmorphic found where struct expected
r(3000);

When you type an interactive statement such as

Code:

: myt = newtwopart(2)

Mata acts as if you typed

Code:

function secret()
{
        external myt

        myt=newtwopart(2)
}
secret()
mata drop secret()

And if you go back to the interactive log above, you'll find that worked fine.

Next we typed printf("myt.n1 = %g\n", myt.n1) interactively, and that became,

Code:

function secret()
{
        external myt

        printf("myt.n1 = %g\n", myt.n1)
}
secret()
mata drop secret()

This line did not work interactively because the line external myt needed to be external struct twopart scalar myt. That is, when the Mata compiler gets to compiling the line myt.n1, Mata does not access myt to find out what myt is.

Stata works differently. If you typed gen y = x[_n-1], Stata does access x and so discovers that x is, say, a numeric variable. Stata is an interpreter. Mata is a compiler.

Mata does not assume that any externals (globals) even exist at the time of compile and, even if they do exist, Mata doesn't look at them. Mata goes by what is declared and assumes anything that is not declared is a transmorphic matrix. In this case, myt is declared external, and that means external transmorphic matrix. Coding myt.n1 makes no sense with transmorphic matrices, external or otherwise. If myt had been declared external struct twopart scalar, however, then myt.n1 would have made sense.
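
A minimal sketch of that last point, assuming the struct twopart and newtwopart() definitions shown earlier. The helper name show_global() is hypothetical and only serves as an illustration: it declares the global myt with its full type, which is what the automatically generated secret() function does not do.

```mata
mata:
mata clear

struct twopart {
        real scalar n1, n2
}

struct twopart scalar newtwopart(real scalar val)
{
        struct twopart scalar t

        t.n1 = 1
        t.n2 = val
        return(t)
}

// Hypothetical helper (not from the posts above): the external is
// declared with its full type, so the compiler knows that myt.n1
// and myt.n2 refer to members of struct twopart.
function show_global()
{
        external struct twopart scalar myt

        printf("myt.n1 = %g\n", myt.n1)
        printf("myt.n2 = %g\n", myt.n2)
}

myt = newtwopart(2)     // global created interactively, as in the log above
show_global()
end
```

The only difference from the failing interactive line is the declaration: the member access sits inside a function whose external declaration names the structure type.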

Interpreters like Stata are more convenient than compilers like Mata, but interpreters execute more slowly.

Let's consider an interpreter looking at myt.n1:

1. I see myt.n1. Parse it. It becomes myt, <dot>, n1.

2. I see myt. Parse it. It's a name.

3. Look up the name. A search of memory occurs, and ultimately the interpreter learns that myt is a struct twopart scalar.

4. I see n1. Parse it. It's a name.

5. Okay. So I'm seeing struct twopart scalar myt, followed by a <dot>, followed by a name. That makes sense.

6. Does struct twopart contain an element named n1? Yes.

7. Where is n1 in struct twopart? It's at offset 0.

8. Find the struct twopart named myt. Good. Now add 0 to its address. Now copy 8 bytes.

In a compiler, the above becomes:

1. I see myt.n1. Parse it. It becomes myt, <dot>, n1.

2. I see myt. Parse it. It's a name.

3. Look up the name in the program's declarations. A search of memory occurs, and ultimately the compiler learns that myt is a struct twopart scalar.

4. I see n1. Parse it. It's a name.

5. Okay. So I'm seeing struct twopart scalar myt, followed by a <dot>, followed by a name. That makes sense.

6. Does struct twopart contain an element named n1? Yes.

7. Where is n1 in struct twopart? It's at offset 0.

8. Okay, record the following in the compiled code: Copy 8 bytes from address of myt plus 0.

The work performed by the interpreter and the compiler is nearly identical. In lines 1 to 7, the code works to "understand" the line. In line 8, the code does something with this understanding. The interpreter uses its understanding to carry out the fetching of the value of myt.n1. The compiler uses its understanding to record the code to fetch the value of myt.n1. Either way, the last step of fetching the value takes only a few machine cycles.

There's an advantage, however, to the compiler's approach. Pretend the line myt.n1 occurs inside a loop being executed 1,000 times. The interpreter repeats all 8 steps in each execution of the loop. The compiler, however, merely recorded what was to be done, and at execution time, only a few cycles are consumed fetching the value each trip through the loop. Now understand that this speed advantage accumulates for every line of the program!
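
The loop scenario can be sketched as follows. This is an illustration only: the function names sum_n1() and run_sum() are hypothetical, and the struct is the twopart definition used throughout this thread. Because myt is fully declared as an argument, the member access is compiled once, and only the recorded fetch is repeated on each trip through the loop.

```mata
mata:
mata clear

struct twopart {
        real scalar n1, n2
}

// Hypothetical example function: myt is fully declared, so the access
// myt.n1 is resolved when the function is compiled.
real scalar sum_n1(struct twopart scalar myt)
{
        real scalar i, total

        total = 0
        for (i = 1; i <= 1000; i++) {
                total = total + myt.n1  // executed 1,000 times, compiled once
        }
        return(total)
}

// Hypothetical driver: builds a struct inside a function and runs the loop.
real scalar run_sum()
{
        struct twopart scalar t

        t.n1 = 1
        t.n2 = 2
        return(sum_n1(t))
}

run_sum()
end
```

Since n1 is set to 1, run_sum() returns 1000.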

Separation of compilation from execution, however, means that the compiler cannot assume that the value of myt, even if it is defined, is what it will be in the future, when execution occurs. And because the default assumption of transmorphic matrix is not sufficient to access the elements of a structure, myt must be explicitly declared.

Last edited by Bill Gould (StataCorp); 14 Apr 2015, 14:10.
Niels Henrik Bruun

Join Date: Aug 2014

Posts: 552
#5

15 Apr 2015, 07:39

Hi Bill
Thank you very much for the explanation.
However I do not think that your first code example matches my first in #3.

You define myt inside the function proveit. And hence (if the scope rules are as I expect) myt dies with the function.

I, on the other hand, pass the struct from function new_twopart into a variable tg.
What is inside function new_twopart dies due to scope rules, but I should keep the struct in variable tg.
And it is there, because I can see it using liststruct (just as in #1) and because I can pass it as an argument to function struct_display.
Finally, inside function struct_display I can refer to t.n1 and t.n2.
But as shown in #1, I cannot refer to tg.n1 outside a function.

It might well be that this has to be due to design of the interpreter/compiler, but it isn't logical.

What is even more puzzling is that if I use a class definition to implement a struct then there is no problem as shown in #3, example 2.

Conceptually (to me anyway) one could call a class a struct with methods added, so I would expect them to behave alike.
So why not just stick to classes and drop structs?
Am I missing something?

Kind regards

nhb

Andrew Maurer

Join Date: Apr 2014
Posts: 28

15 Apr 2015, 17:15

Originally posted by Bill Gould (StataCorp)

...

Mata does not assume that any externals (globals) even exist at the time of compile and, even if they do exist, Mata doesn't look at them. Mata goes by what is declared and assumes anything that is not declared is a transmorphic matrix. In this case, myt is declared external, and that means external transmorphic matrix.

...

mata describe seems to indicate that myt is a struct scalar and not a transmorphic matrix. See below.

Code:

. mata
------------------------------------------------- mata (type end to exit) ----
: mata clear

: 
: struct twopart {
>         real scalar n1, n2
> }

: 
: struct twopart scalar new_twopart(val)
> {
>         struct twopart scalar t
>         
>         t.n1 = 1
>         t.n2 = val
>         return(t)
> }

: 
: myt = new_twopart(2)

: printf("myt.n1 = %g\n", myt.n1)
type mismatch:  exp.exp:  transmorphic found where struct expected
r(3000);

: end
------------------------------------------------------------------------------

. 
. mata mata describe

      # bytes   type                        name and extent
-------------------------------------------------------------------------------
          164   struct scalar               new_twopart()
          116   structdef scalar            twopart()
            8   struct scalar               myt
-------------------------------------------------------------------------------

. 
. exit


Niels Henrik Bruun

Join Date: Aug 2014

Posts: 552
#7

16 Apr 2015, 04:04

I'm liking mata describe more and more

Kind regards

nhb

Niels Henrik Bruun

Join Date: Aug 2014
Posts: 552

28 May 2015, 06:36

Hi
I've been reading, thinking, and testing. And it is (almost) all in the documentation and in Bill's comment #4.

Below is some code I use in my examples:

Code:

:         mata clear
:       
:         struct twopart_s
>         /* A simple struct definition */
>         {
>                 scalar n1, n2
>         }
:
:         struct twopart_s scalar newtwopart_s(val)
>         /* A function that construct and returns a struct */
>         {
>                 struct twopart_s scalar t
>
>                 t.n1 = 1
>                 t.n2 = val
>                 return(t)
>         }
:
:         function show_s(struct twopart_s scalar myt)
>         /* A function taking a struct as an argument */
>         {
>                 printf("show_s: myt.n1 = %g\n", myt.n1)
>                 printf("show_s: myt.n2 = %g\n", myt.n2)
>         }
:        
:         class twopart_c
>         /* A simple class definition, similar to the struct */
>         {
>                         real scalar n1, n2
>         }
:
:         class twopart_c scalar newtwopart_c(val)
>         /* A function that construct and returns a class */
>         {
>                         class twopart_c scalar t
>
>                         t.n1 = 1
>                         t.n2 = val
>                         return(t)
>         }
:
:         function show_c(class twopart_c scalar myt)
>         /* A function taking a class as an argument */
>         {
>                 printf("show_c: myt.n1 = %g\n", myt.n1)
>                 printf("show_c: myt.n2 = %g\n", myt.n2)
>         }

From [M-2] declarations:

The purpose of declarations
Declarations occur in three places: in front of function definitions, inside the parentheses defining the function's arguments, and at the top of the body of the function, defining private variables the function will use.

and

If eltype is not specified, transmorphic is assumed.
If orgtype is not specified, matrix is assumed.


This is the design.

So you cannot declare variables outside functions. All variables outside functions are implicitly declared as transmorphic matrices.
Hence no struct or class can be declared outside a function.
But they can be passed as output from a function (and then they might be referable and accessible).

Both structs and classes can be defined and referred to inside a function.
And they can be passed to a variable, which becomes a struct scalar or class scalar respectively.

Code:

:         s = newtwopart_s(2)
:         s
  0x9afbff0
:         eltype(s), orgtype(s)
            1        2
    +-------------------+
  1 |  struct   scalar  |
    +-------------------+

:         c = newtwopart_c(3)
:         c
  0x9afb9c0
:         eltype(c), orgtype(c)
            1        2
    +-------------------+
  1 |   class   scalar  |
    +-------------------+

: mata describe

      # bytes   type                        name and extent
-------------------------------------------------------------------------------
          168   class scalar                newtwopart_c()
          168   struct scalar               newtwopart_s()
          192   transmorphic matrix         show_c()
          192   transmorphic matrix         show_s()
          260   classdef scalar             twopart_c()
          116   structdef scalar            twopart_s()
            8   class scalar                c
            8   struct scalar               s
-------------------------------------------------------------------------------

:         liststruct(s)
1  structure of 2 elements
1.1  1 x 1 real = 1
1.2  1 x 1 real = 2

s and c above are probably a sort of pointer, since their eltype is by definition transmorphic.
Note that their eltypes are struct and class in general, not the specific ones defined and passed to them (in this case twopart_s and twopart_c).
Also note that -liststruct- accepts s as a struct argument.

The variables s and c can be used as argument for functions and used therein:

Code:

:         show_s(s)
show_s: myt.n1 = 1
show_s: myt.n2 = 2

:         show_c(c)
show_c: myt.n1 = 1
show_c: myt.n2 = 3

So classes and structs can be put in variables outside a function.
However, they cannot be looked into or modified outside a function:

Code:

. capture mata s.n1
. display _rc
3000

. capture mata c.n1
. display _rc
3000
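
Looking into and modifying a struct held in a global variable can still be routed through small functions. The sketch below is an illustration under assumptions: the helper names get_n1() and set_n1() are hypothetical, and the struct members are declared real scalar here rather than plain scalar. It relies on the fact that Mata passes arguments by address, so an assignment to a member inside the function changes the caller's variable.

```mata
mata:
mata clear

struct twopart_s {
        real scalar n1, n2
}

struct twopart_s scalar newtwopart_s(real scalar val)
{
        struct twopart_s scalar t

        t.n1 = 1
        t.n2 = val
        return(t)
}

// Hypothetical getter: the argument declaration tells the compiler
// that t is a struct twopart_s scalar, so t.n1 compiles.
real scalar get_n1(struct twopart_s scalar t)
{
        return(t.n1)
}

// Hypothetical setter: arguments are passed by address, so this
// assignment changes the struct held by the caller.
void set_n1(struct twopart_s scalar t, real scalar v)
{
        t.n1 = v
}

s = newtwopart_s(2)
set_n1(s, 12)
get_n1(s)
end
```

This is the same idea as show_s() above, extended with a function that writes to a member instead of only reading it.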

After instantiation, classes can be both seen and accessed outside a function, so they are not as limited in usage as structs:

Code:

:         myt = twopart_c()
:         myt.n1 = 12
:         myt.n2 = 42
:         printf("myt.n1 = %g\n", myt.n1)
myt.n1 = 12
:         printf("myt.n2 = %g\n", myt.n2)
myt.n2 = 42

So far so good. Everything is as Bill and the documentation say.

But here is the puzzle on which I based my comments #3 and #5:

Code:

:         mata clear
:
:                 class twopart_c {
>                         /* A simple class definition  similar to the struct */
>                                 real scalar n1, n2
>                 }
:
:                 class twopart_c scalar newtwopart_c(val)
>                 {
>                                 class twopart_c scalar t
>
>                                 t.n1 = 1
>                                 t.n2 = val
>                                 return(t)
>                 }
:
:         tg = newtwopart_c(5)
:         tg.n1
  1
:         tg.n2
  5

Strangely enough, if I only define a class, then I can refer to the content outside functions.
Somehow the instantiation is done when assigning the output of function newtwopart_c to variable tg.
And hence the content of tg is referable.

But this is very strange, because this is the exact same code as in the example above.
So when no structs are defined AND used (it is not enough just to define a struct), you are also able to refer to the content of a class outside a function.

Kind regards

nhb
