  • &, +, and logical idiosyncrasies

    I don't really have a data question, but I'm quite fascinated by something I learned recently.
    Code:
    sysuse auto, clear
    
    
    cls
    
    { // Same
    br if rep + head ==4
    
    br if rep ==4 & head ==4
    }
    
    
    br if rep & head ==4 // Different
    The addition sign is a shorthand way of doing the second browse command, but when I use the & symbol, the command produces different results.


    I know it isn't a very big deal, but why is this? I thought this was the coolest thing since fire when I saw this for the first time. Anyone else code like this or have thoughts about this?

  • #2
    Good question. Stata documents the order of evaluation of operators at -help operators-. There we find that the logical operators & and | are evaluated last, after the arithmetic operators and after relational operators such as ==. To demonstrate that all three conditions are in fact different, I have created a toy example based on your conditions. Note that Stata treats any nonzero number, including missing, as "true" in a logical expression, which is the same as being evaluated as 1. So I use just three values (0, 1, and missing) for the example.

    Code:
    clear
    input byte(a b)
    0 0
    0 1
    1 0
    1 1
    . 0
    . 1
    0 .
    1 .
    end
    
    gen byte one = a + b ==1    // addition evaluated first, then the equality
    gen byte two = a ==1 & b ==1  // evaluated as a proper logical expression: A AND B
    gen byte three = a & b ==1  // nested evaluation as A & (B==1)
    Result

    Code:
         +---------------------------+
         | a   b   one   two   three |
         |---------------------------|
      1. | 0   0     0     0       0 |
      2. | 0   1     1     0       0 |
      3. | 1   0     1     0       0 |
      4. | 1   1     0     1       1 |
      5. | .   0     0     0       0 |
      6. | .   1     0     0       1 |
      7. | 0   .     0     0       0 |
      8. | 1   .     0     0       0 |
         +---------------------------+
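
    One way to check the grouping, as a sketch rather than anything definitive: add explicit parentheses and confirm that the parenthesized versions agree with the originals. The block repeats the toy data so that it runs on its own. The names one_p, two_p, three_p and three_alt are made up for illustration.

    ```stata
    clear
    input byte(a b)
    0 0
    0 1
    1 0
    1 1
    . 0
    . 1
    0 .
    1 .
    end

    * The three original expressions
    gen byte one   = a + b == 1
    gen byte two   = a == 1 & b == 1
    gen byte three = a & b == 1

    * Parenthesized versions (hypothetical names) spelling out the grouping
    gen byte one_p   = (a + b) == 1
    gen byte two_p   = (a == 1) & (b == 1)
    gen byte three_p = a & (b == 1)

    * Each should match its unparenthesized original in every observation
    assert one_p == one
    assert two_p == two
    assert three_p == three

    * Grouping the other way, (a & b) == 1, is a different condition
    gen byte three_alt = (a & b) == 1
    list a b three three_alt, sep(0)
    ```

    In this toy data, three_alt differs from three only in the last row (a = 1, b missing): 1 & . is true, whereas . == 1 is false.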


    • #3
      I wouldn't expect those first two to produce the same results. And on my setup, all three commands give different results. Rather than listing the details, I'll just show the counts:
      Code:
      . sysuse auto, clear
      (1978 automobile data)
      
      . summ rep head
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
             rep78 |         69    3.405797    .9899323          1          5
          headroom |         74    2.993243    .8459948        1.5          5
      
      . tab headroom
      
         Headroom |
            (in.) |      Freq.     Percent        Cum.
      ------------+-----------------------------------
              1.5 |          4        5.41        5.41
              2.0 |         13       17.57       22.97
              2.5 |         14       18.92       41.89
              3.0 |         13       17.57       59.46
              3.5 |         15       20.27       79.73
              4.0 |         10       13.51       93.24
              4.5 |          4        5.41       98.65
              5.0 |          1        1.35      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      . count if rep + head == 4
        3
      
      . count if rep == 4 & head == 4
        5
      
      . count if rep & head == 4
        10
      
      . tab rep head
      
          Repair |
          record |                                     Headroom (in.)
            1978 |       1.5        2.0        2.5        3.0        3.5        4.0        4.5        5.0 |     Total
      -----------+----------------------------------------------------------------------------------------+----------
               1 |         1          1          0          0          0          0          0          0 |         2
               2 |         0          3          0          0          1          2          1          1 |         8
               3 |         0          5          5          4         10          3          3          0 |        30
               4 |         2          1          5          3          2          5          0          0 |        18
               5 |         0          3          4          4          0          0          0          0 |        11
      -----------+----------------------------------------------------------------------------------------+----------
           Total |         3         13         14         11         13         10          4          1 |        69
      This accords with my expectations based on my understanding of logical operations in Stata. rep + head == 4 is a single logical expression because + has operator precedence over ==. So Stata looks for observations where the sum of rep and head equals 4. There are a few such observations because the values of head range between 1.5 and 5.0, and all are integral multiples of 0.5, so sometimes when added to a value of rep78, the value of 4 results.

      The second is two logical expressions conjoined to make an overarching logical expression. Observations are scanned to see if rep == 4 and if head == 4, and only those where both are true get counted. You can confirm that this is what's going on by running -tab rep head- and you will see that in the rep = 4 row and head = 4.0 cell, N == 5.
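
      As a rough sketch of how to see which cars each of the first two conditions picks out, using the full variable names rep78 and headroom rather than the abbreviations:

      ```stata
      sysuse auto, clear

      * Sum equals 4: in these data only rep78 = 2 with headroom = 2.0 qualifies
      count if rep78 + headroom == 4
      tabulate rep78 headroom if rep78 + headroom == 4

      * Both variables equal 4
      count if rep78 == 4 & headroom == 4

      * No car can satisfy both conditions at once, since 4 + 4 is 8
      count if (rep78 + headroom == 4) & (rep78 == 4 & headroom == 4)
      assert r(N) == 0
      ```

      The first two counts should reproduce the 3 and 5 shown in the output above.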

      The final version is the most complicated. == has operator precedence over &, so this is, like the second, the conjunction of two logical expressions into one overarching one. head == 4 is a logical expression, and it is connected by the & operator to rep. Now rep is a numeric variable, but as an operand of & it is treated as a logical expression by Stata's usual rule: 0 is false, anything else (including missing) is true. As it happens, rep never takes the value 0, so it is always true. And "true & whatever" is the same as "whatever". So this is the same as just head == 4, which the table again confirms to occur in 10 observations.
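
      The reduction to head == 4 can be checked directly. This is only a sketch; the local macro name n_and is my own choice.

      ```stata
      sysuse auto, clear

      * rep78 is never 0, so as an operand of & it is always true
      count if rep78 == 0
      assert r(N) == 0

      * Missing values are nonzero, so they count as true as well
      count if missing(rep78)

      * Hence the condition reduces to headroom == 4
      count if rep78 & headroom == 4
      local n_and = r(N)
      count if headroom == 4
      assert r(N) == `n_and'
      ```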

      Added: Crossed with #2.


      • #4
        To answer your question, the third command operates differently because it is equivalent to:

        Code:
        br if rep!=0 & head==4
        So it is showing all observations where rep is not 0 and head is 4, as opposed to all observations where both rep and head are 4.

        I'm not sure why the rep + head == 4 would have the same result as rep==4 & head ==4, because they are asking two different things.
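
        The auto data contain no zeros in rep, so the equivalence is easier to see on made-up data. A minimal sketch, with a toy dataset and the variable names implicit and explicit invented for illustration:

        ```stata
        clear
        input byte(rep head)
        0 4
        4 4
        . 4
        3 2
        end

        * Bare rep as an operand of & versus the spelled-out test rep != 0
        gen byte implicit = rep & head == 4
        gen byte explicit = rep != 0 & head == 4

        * The two agree in every observation, including the missing one
        assert implicit == explicit
        list, sep(0)
        ```

        The first row (rep = 0) is excluded by both versions, while the row with missing rep is included by both, because missing is not equal to 0.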

        Edit: crossed with 2 and 3, which provide more detail.
        Last edited by Ali Atia; 16 Mar 2022, 11:21.


        • #5
          x & y with & as operator and numeric x and numeric y as operands only ever yields 1 or 0, even if x or y is numeric missing. (Everything not 0 is true and everything 0 is false.)

          x + y with + as operator with numeric x and numeric y as operands yields, naturally, the sum x + y unless x or y is missing, in which case it yields missing.

          x + y with + as operator and string x and y as operands yields the concatenation of those strings, with cases such as "" + "frog" yielding "frog" and "toad" + "" yielding "toad".

          There can be fortuitous coincidences. 0 & 0 is 0 and so is 0 + 0.

          The result you cite for the auto data is just for those data: there are cases headroom == 2 and rep78 == 2 and cases headroom == 4 and rep78 == 4 but they are not the same cars.
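
          These rules can be tried out with display, which needs no data in memory. A small sketch, with the value each line should show noted in the comment above it:

          ```stata
          * & returns only 0 or 1; any nonzero operand, including missing, is true
          * shows 1
          display 2 & 3
          * shows 1
          display . & 1
          * shows 0
          display 0 & .

          * + returns the sum, or missing if either operand is missing
          * shows 5
          display 2 + 3
          * shows .
          display . + 1

          * + on strings concatenates
          * shows frog
          display "" + "frog"
          * shows toad
          display "toad" + ""

          * A coincidence: both show 0
          display 0 & 0
          display 0 + 0
          ```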
