  • Problem with noninteger matrix subscripts

    I have a matrix called "sgrid" that is 6993 x 3.

    Within this matrix, I need to search for rows that correspond to a certain criterion. I have a function that spits out a vector of row indices called "tidx". Here is tidx:

    Code:
    : tidx
              1
        +--------+
      1 |   967  |
      2 |  1597  |
      3 |  2227  |
        +--------+
    Unfortunately, the elements of tidx are not exact integers. Worse, the matrix accepts tidx as a valid subscript vector but returns the wrong rows. Behold:

    Code:
    : sgrid[tidx,.]
             1     2     3
        +-------------------+
      1 |  .01     0    .1  |
      2 |  .02     0    .1  |
      3 |  .03     0    .1  |
        +-------------------+
    
    : sgrid[round(tidx),.]
             1     2     3
        +-------------------+
      1 |  .01     1     0  |
      2 |  .02     1     0  |
      3 |  .03     1     0  |
        +-------------------+
    To be precise, sgrid[tidx,.] returns rows 966, 1596 and 2226.
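
    A minimal Mata reproduction of the effect, as a sketch. `A` is a hypothetical stand-in for sgrid. `0.3/0.1` plays the role of one element of tidx: it is mathematically 3, but in double precision it evaluates to the number just below 3. The comment on the truncated subscript relies on the behaviour reported in this thread, not on a documented guarantee.

    ```mata
    mata:
    // Hypothetical 4 x 2 stand-in for sgrid: column 1 holds the row number
    A = (1, 10 \ 2, 20 \ 3, 30 \ 4, 40)

    // Mathematically 3, but stored as the double just below 3
    i = 0.3/0.1

    i                  // the default display format shows 3
    i == 3             // the exact comparison is false (0)
    A[i, .]            // truncated subscript: per this thread, selects row 2
    A[round(i), .]     // rounding first selects row 3, as intended
    end
    ```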

    I have wasted innumerable hours because of the cognitive dissonance created by this. I could blame myself for not forcing tidx to be integer, but how much should I really blame myself?
    1) Why should I care it's integer when it's right?
    2) Why doesn't the matrix tell me that tidx is not integer and just refuse to take it as a subscript?
    3) If the matrix has a good reason to take a noninteger subscript, why does it use "floor()" instead of "round()"?
    4) How was I even supposed to know about all this?

    I am not just upset. I really would like an answer to all four questions.

  • #2
    In your example tidx looks to me like 3 integers so it is hard to understand your (apparent) claim that it is not.

    But faced with any quantity offered as a subscript that is an integer plus a fraction, Stata and Mata round down as if floor() were in the code, which is exactly what you report. That is a long-standing rule.

    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . l mpg in 1/3
    
         +-----+
         | mpg |
         |-----|
      1. |  22 |
      2. |  17 |
      3. |  22 |
         +-----+
    
    . di mpg[0.999]
    .
    
    . di mpg[1.999]
    22
    I know this, and I presume it's documented somewhere (else how would I know it?), but at the moment I can't remember where I read it. The documentation runs to about 10,000 pages and is, from several points of view, incomplete. That is why it's a good tactic for programmers to carry out simple experiments like the one above to work out what is happening.

    Otherwise your questions are interesting, although I don't think you give enough explanation for us to understand 1).

    I can't explain (the history of) exactly why Stata indulges fractional subscripts in this way.

    I'd start with the idea that you clearly know that your subscripts aren't integers, so you should perhaps be surprised that Stata or Mata accepts them at all.


    Last edited by Nick Cox; 01 Nov 2015, 05:33.



    • #3
      Dear Nick,

      thanks for your reply. I'll try to explain better.

      I did not know my subscripts were not integers. I have concluded that they are nonintegers (even though they appear to be integers, as you point out) precisely because of what is happening here.

      The function that produces these row numbers is declared "real colvector", because there is no such thing as an "integer" type in Mata. However, the row numbers I get are supposed (ignoring numerical representation issues) to be integers. The actual numbers, as you can see, are so close to integers that Mata displays them on screen as integers. (I suppose there is a display tolerance of 10^-17 or so, and we are within it.) However, when used as matrix indices, the numbers turn out to be an epsilon smaller than the exact integer, and because the subscript is rounded down as if by floor(), I get the row above the one I want.
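
      One way to expose such a hidden error, sketched in Mata under the same assumption that `0.3/0.1` stands in for a computed row number: subtract the nearest integer, or display the value in Stata's `%21x` hexadecimal format, which shows the stored binary value exactly instead of a rounded decimal.

      ```mata
      mata:
      x = 0.3/0.1              // should be 3; stored slightly below 3
      x                        // default display: 3
      x - round(x)             // distance from the nearest integer: about -4.4e-16
      printf("%21x\n", x)      // exact stored value in hexadecimal
      printf("%21x\n", 3)      // compare with an exact 3
      end
      ```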

      Does this make sense? It's hard to know if I'm forgetting to say something that's obvious to me so please ask for clarifications if necessary.

      Some more detail about the function: these row numbers are obtained by
      a) Start from some three dimensional noninteger coordinates such as (-0.05, 1, 0.02)
      b) Using division and subtraction, turn those into three dimensional integer grid coordinates such as (1, 2, 7).
      c) Using addition and multiplication, turn the grid coordinates into a row number (each row of sgrid is a point on a three dimensional grid).

      I know that the division in (b) must yield integers because I start from a point that I know to be a grid node. Then if (b) has integers, (c) must too because I'm just doing addition and multiplication. As you can see, the outcome is indeed integer, up to numerical precision.
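
      Steps (a) to (c) can be sketched as below. The grid layout and the name `grid_row()` are hypothetical and chosen only for illustration. The layout assumes origin `lo`, spacing `h`, and `n` nodes per dimension, with the third coordinate varying fastest. The point is that round() is applied at step (b), where the division can leave a tiny error, so that step (c) works only with exact integers.

      ```mata
      mata:
      // Hypothetical layout: node (g1, g2, g3), 1-based, occupies row
      // (g1-1)*n[2]*n[3] + (g2-1)*n[3] + g3, so the third coordinate varies fastest.
      real scalar grid_row(real rowvector p, real rowvector lo,
                           real rowvector h, real rowvector n)
      {
          real rowvector g

          // Step (b): subtraction and division give coordinates that should be
          // integers; round() removes any tiny floating-point error here.
          g = round((p :- lo) :/ h) :+ 1

          // Step (c): only integer addition and multiplication from here on,
          // so the row number is exact (for grids far smaller than 2^53 rows).
          return((g[1] - 1)*n[2]*n[3] + (g[2] - 1)*n[3] + g[3])
      }

      // Example: a 4 x 4 x 4 grid starting at (0, 0, 0) with spacing 0.1
      grid_row((0.1, 0.2, 0.3), (0, 0, 0), (0.1, 0.1, 0.1), (4, 4, 4))
      end
      ```

      With these arguments the grid coordinates are (2, 3, 4), so the function returns (2-1)*16 + (3-1)*4 + 4 = 28, even though 0.3/0.1 falls just short of 3 before rounding.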

      Let me restate that this whole problem, once understood, has a quick fix: I just need to round() the output of my function before I return() it. However, this is one of those things that one can't possibly imagine... I am not even sure that I'll realize I'm having this problem the next time I run into it.
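
      The quick fix can also be made defensive. The helper below rounds, but it refuses values that are not close to an integer, so a genuine bug upstream still stops execution. This is a sketch. `as_index()` and its default tolerance of 1e-8 are assumptions, not anything Mata itself provides.

      ```mata
      mata:
      // Hypothetical helper: convert computed row numbers to safe subscripts.
      // Values within tol of an integer are rounded; anything else aborts.
      real colvector as_index(real colvector x, | real scalar tol)
      {
          real colvector r

          if (args() < 2) tol = 1e-8      // assumed default tolerance
          r = round(x)
          if (any(abs(x - r) :> tol)) {
              _error(3300, "subscript is not within tolerance of an integer")
          }
          return(r)
      }

      // Usage with the names from this thread: sgrid[as_index(tidx), .]
      end
      ```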

      How hard would it be to require integer subscripts in Mata?
      Last edited by Mattia Landoni; 01 Nov 2015, 11:55.



      • #4
        Say rather that (b) should yield integers.

        Code:
        help precision
        for sources with much more discussion.



        • #5
          Dear Nick,

          thank you again for your response.

          The problem is not the one described in "help precision", i.e., that Mata computes something that should be 2 as 1.999999999999, which is not equal to 2. The problem is that Mata displays "1.999999999999" as "2" on screen but treats it as "1" when it is used as a matrix subscript.

          Call me spoiled, but as a programmer, I don't want to deal with this. A good programmer will run little experiments to make sure everything behaves as intended, but a good programming language will behave as intended to the extent possible.

          I couldn't say it more clearly than this. I am still hoping someone here will explain to me why Mata must behave like this. If there is no reason for it, I hope someone from StataCorp will let me know that they recognize it's an issue and that they will do something about it. Ideally, they will change the interpreter so that it complains about noninteger subscripts and stops execution.



          • #6
            This still looks to me like a precision issue, insofar as you are getting non-integer results where you expect integers. That is the name of the problem; exactly how it arises is very much at issue.

            If there's another explanation, then it is a bug in code you don't show us, and we can't comment usefully.

            I think Mata does behave as "intended" by StataCorp. It's just that (a) decimal display formats for binary values and (b) rounding down of non-integer subscripts combine to give you puzzling results. The choice of what happens under (b) is arbitrary, but I think Stata has been consistent about it throughout its history.

            I agree that (unless you show more code) only an official word from StataCorp will push this further.

