  • Logic for the behavior of inrange(z,a,b)

    I have a question about the logic behind the behavior of inrange(z,a,b) when a, b, or z is missing. My question is just out of curiosity.

    The manual states the following:
    The following ordered rules apply:
    z > . returns 0.
    a > . and b = . returns 1.
    a > . returns 1 if z < b; otherwise, it returns 0.
    b > . returns 1 if a < z; otherwise, it returns 0.
    Otherwise, 1 is returned if a < z < b.
    If the arguments are strings, "." is interpreted as "".
    This seems like very unusual behavior. Why does it follow these rules? Is it a consequence of a simple or fast implementation, or was it designed to produce these outcomes for some common workflow? If it's the latter, what are those conditions or workflows?

  • #2
    Well, only the people at StataCorp know for sure. But here's my guess. As you know, missing values in Stata are treated as being greater than any non-missing value by the <, <=, >, >= operators and the -sort- command. But as a result, you can't test for whether a variable is, say, >= 2 by saying:
    Code:
    whatever if z >= 2
    if you don't want to include missing values of z. So, before -inrange()- came along, we always had to code this sort of thing as:

    Code:
    whatever if z >= 2 & !missing(z)
    

    or some equivalent to that. It's cumbersome because, in practice, when people want to act on values of z >= 2 they usually do not want to act on missing values of z as well. The logic of -inrange()- gives you the convenience of expressing the desired range of values without having to tack on an extra clause about missings. See:
    Code:
    . clear

    . input float z

    z
    1. 0
    2. 1
    3. 2
    4. 3
    5. 4
    6. .
    7. end

    . count if z >= 2
    4

    . count if inrange(z, 2, .)
    3
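    To check the equivalence directly, here is a small do-file sketch that rebuilds the same six values (0 to 4 plus one system missing) with -set obs- instead of -input- and asserts that the one-sided -inrange()- call matches the longhand condition. The variable names longhand and shorthand are made up for the illustration.

```stata
    clear
    set obs 6
    * Same values as the example: 0, 1, 2, 3, 4 and a system missing
    generate float z = _n - 1
    replace z = . in 6

    * Longhand: exclude missing values explicitly
    generate byte longhand  = z >= 2 & !missing(z)
    * Shorthand: a missing upper bound leaves the range open above
    generate byte shorthand = inrange(z, 2, .)
    assert longhand == shorthand

    * For an upper bound, z <= 2 already excludes missing z,
    * so a missing lower bound gives the same answer as the plain comparison
    assert inrange(z, ., 2) == (z <= 2)

    count if shorthand
```

    Both -assert- lines pass silently when the conditions hold, and -count if shorthand- should report 3, the same as -count if inrange(z, 2, .)-.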



    • #3
      Something was lost in the copying of the help inrange() entry. Here's what it should look like:

      inrange(z,a,b)
      Description: 1 if it is known that a ≤ z ≤ b; otherwise, 0

      The following ordered rules apply:
      z ≥ . returns 0.
      a ≥ . and b = . returns 1.
      a ≥ . returns 1 if z ≤ b; otherwise, it returns 0.
      b ≥ . returns 1 if a ≤ z; otherwise, it returns 0.
      Otherwise, 1 is returned if a ≤ z ≤ b.
      In terms of notation, . indicates a Stata system missing value. Stata has an additional 26 extended missing values ordered as:
      Code:
      all nonmissing numbers  <  .  <  .a  <  .b  < ...  <  .z
      so the help entry uses ≥. to indicate any missing value.

      The inrange() function follows the rules for real intervals specified by endpoints, and missing values allow for open bounds. The first ordered rule states that if z is missing, it is not a number and therefore the function will return false no matter the specified bounds. If z is not missing, the second rule states that if both bounds are open, the function returns true. The next two rules indicate what happens if the interval is left-open or right-open. The final rule restates what you would expect with a bounded interval.
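      As a quick illustration of the ordered rules, each -display- below exercises one rule in turn, and the comment on each line is the result the help entry's rules imply. The particular numbers are arbitrary choices for the sketch.

```stata
      * Rule 1: missing z (system or extended) gives 0 whatever the bounds
      display inrange(., 1, 5)      // 0
      display inrange(.a, ., .)     // 0
      * Rule 2: both bounds missing gives 1 for any nonmissing z
      display inrange(42, ., .)     // 1
      * Rule 3: missing lower bound, so only z <= b is checked
      display inrange(-1e6, ., 5)   // 1
      display inrange(6, ., 5)      // 0
      * Rule 4: missing upper bound, so only a <= z is checked
      display inrange(6, 5, .)      // 1
      display inrange(4, 5, .)      // 0
      * Rule 5: both bounds present, an ordinary closed interval
      display inrange(5, 1, 5)      // 1
      display inrange(0, 1, 5)      // 0
```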

      Careful readers will note that the second "ordered rule" should be a ≥ . and b ≥ . returns 1. Here's an example:
      Code:
      . dis inrange(1, .c, .b)
      1



      • #4
        Thanks Robert Picard. I found that helpful.



        • #5
          Yes, I was just bitten by this strange logic.
          I had simulated a bunch of data with confidence intervals and wanted to check coverage, so I applied
          covered = inrange(true value, lower CI, upper CI)
          It wasn't until some time later that I realised that one of the statistical methods wasn't converging all the time, so I had a few instances where the CIs were missing.
          Unfortunately for me, that meant that covered = 1 if there were no confidence intervals (rather than 0 as I would have expected).
          e.g. true = inrange(2,.,.)
          Won't make that mistake again!
          I still think that it's bizarre that it didn't give either a 0 or a missing for that situation!
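          A minimal sketch of a guard for this situation, assuming hypothetical variable names truth, ci_lower and ci_upper for the true value and the CI bounds; the third row mimics a replication that failed to converge. Which version fits depends on whether a failed CI should count as "not covered" or be left out of the coverage rate.

```stata
          clear
          input truth ci_lower ci_upper
          2 1.5 2.5
          2 2.1 2.9
          2 .   .
          end

          * inrange() alone returns 1 for the last row because both bounds are missing
          generate byte covered_naive = inrange(truth, ci_lower, ci_upper)

          * Option 1: treat a missing CI as not covered (0)
          generate byte covered0 = inrange(truth, ci_lower, ci_upper) & !missing(ci_lower, ci_upper)

          * Option 2: leave coverage missing when the CI is missing,
          * so those replications drop out of the coverage rate
          generate byte covered_mi = inrange(truth, ci_lower, ci_upper) if !missing(ci_lower, ci_upper)

          list
```

          With these made-up values, covered_naive is 1 in the failed row, covered0 is 0 there, and covered_mi is missing.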



          • #6
            The subtle logic here is that if -a- and -b- evaluate to missing (.) then the interpretation is to evaluate whether -z- is a real number.
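            A small check of that reading, using made-up values that include an extended missing value: with both bounds missing, -inrange(z, ., .)- should agree with -!missing(z)- on every observation.

```stata
            clear
            input z
            -3
            0
            2.5
            .
            .a
            end

            * Both bounds missing: inrange() reduces to a test that z is nonmissing
            generate byte open_both = inrange(z, ., .)
            assert open_both == !missing(z)
            list
```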
