  • finding a value in a variable that is based on a match between two other variables

    Hi All,

    In the example below, I want to find the value of "dat" in the observation where z1 == max1. Here, 10.759366 is the value I am looking for, because that is where z1 == max1 (2.848514).

    The easiest way is simply "list dat if z1==max1"; however, I need to create a scalar from that value, so "list" is not an option...

    Thanks in advance!

    Ariel

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(dat z1 max1)
     6.001519 .9686938 2.848514
     8.734053 2.048315 2.848514
    10.146043 2.606191 2.848514
    10.759366 2.848514 2.848514
    end
    Last edited by Ariel Linden; 04 Apr 2024, 16:41.

  • #2
    Code:
    qui levelsof dat if z1==max1, local(wanted)
    display `wanted'
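
    If a scalar rather than a local macro is needed, one possible sketch follows. It assumes exactly one observation satisfies z1==max1, and the scalar name wanted is only illustrative:

    ```stata
    * Sketch: put the matching value of dat into a scalar.
    * Assumes exactly one observation has z1 == max1.
    summarize dat if z1 == max1, meanonly
    assert r(N) == 1            // stop if there is no match or more than one
    scalar wanted = r(mean)     // with one observation, the mean is that value
    display wanted
    ```

    With a single matching observation, r(mean) is simply that observation's value of dat; the assert guards against silently averaging over several matches.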



    • #3
      Thanks, Andrew!!! You always have the simplest answer!!!



      • #4
        I hate to throw cold water on this, but it is a general problem in digital computing that conditioning on exact equality of floating point numbers is treacherous. Most numbers that have finite decimal representations do not have exact finite binary representations (just as 1/3 has no exact finite decimal representation). In addition, in the course of computing with floating point numbers, various rounding and truncation errors may further distort things, creating results that are incorrect when viewed from the perspective of real-number arithmetic.

        Code:
        . clear
        
        . set obs 1
        Number of observations (_N) was 0, now 1.
        
        . gen x = .1234567
        
        . gen y = .7654321
        
        . gen z = .8888888
        
        . list
        
             +--------------------------------+
             |        x          y          z |
             |--------------------------------|
          1. | .1234567   .7654321   .8888888 |
             +--------------------------------+
        
        . assert z == x + y // SURPRISE!
        assertion is false
        If z1 and max1 were both read in directly from sources with the same number of decimal places of precision, and then not modified, you will be OK. If one of the variables was created by something like -gen varA = varB if...-, and varA and varB are both float or both double, you will, again, have no problem. But if these numbers were arrived at in different ways, as in the example above, numbers that would be equal in real-number calculations can differ.

        So unless you are quite sure that z1 and max1 were produced only by copying variables of the same storage type, or by identical calculations, you are likely to miss some of what should be matches. Usually the safest thing to do is to pick some reasonably small threshold of acceptable difference and condition on equality to within that caliper.
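
        A minimal sketch of matching within a caliper, using the variable names from #1. The tolerance of 1e-6 is an assumed value chosen only for illustration; an appropriate threshold depends on the scale and precision of the data:

        ```stata
        * Sketch: treat z1 and max1 as equal when they differ by less than a tolerance.
        * The value 1e-6 is an assumption for illustration, not a recommendation.
        local tol = 1e-6
        count if abs(z1 - max1) < `tol'
        list dat z1 max1 if abs(z1 - max1) < `tol'
        ```

        Observations where either variable is missing are excluded automatically, because a missing difference is never less than the tolerance.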



        • #5
          Hi Clyde,

          Yes, I am aware of that problem, and that is why the variable max1 is generated using:

          Code:
          egen max1 =  max(z1)
          which ensures that the value(s) being compared are formatted the same and will have the same precision.

          To your point, this might not work if instead I used:

          Code:
          sum z1
          list dat if z1==r(max)
          because r(max) may be formatted differently than the variable z1 (on the sample data above this works, but it doesn't on the full dataset).

          As always, thanks for weighing in!

          Ariel



          • #6
            Originally posted by Ariel Linden
            Hi Clyde,

            Yes, I am aware of that problem, and that is why the variable max1 is generated using:

            Code:
            egen max1 = max(z1)
            which ensures that the value(s) being compared are formatted the same and will have the same precision.
            No, it does not. Watch:

            Code:
            . input double z1
            
                         z1
              1. 1.2
              2. 1
              3. end
            
            .
            . egen float max1 = max(z1)
            
            .
            . generate byte match = z1 == max1
            
            .
            . list
            
                 +--------------------+
                 |  z1   max1   match |
                 |--------------------|
              1. | 1.2    1.2       0 |
              2. |   1    1.2       0 |
                 +--------------------+
            You could compare numbers rounded to float precision:

            Code:
            . generate match_float = float(z1) == float(max1)
            
            . list
            
                 +-------------------------------+
                 |  z1   max1   match   match_~t |
                 |-------------------------------|
              1. | 1.2    1.2       0          1 |
              2. |   1    1.2       0          0 |
                 +-------------------------------+
            Also, watch out for more than one match.
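
            One way to check for that before extracting a value is to count the matches first. This is only a sketch, reusing the float() comparison from above:

            ```stata
            * Sketch: count how many observations match before using the result.
            count if float(z1) == float(max1)
            display "number of matches: " r(N)
            ```

            If r(N) is greater than 1, several observations share the maximum, and you need to decide which value of the other variable you want.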


            Last edited by daniel klein; 05 Apr 2024, 08:05.



            • #7
              Hi Daniel,

              I stand corrected! I assumed that -egen max()- copies the storage type of the variable it is working with (i.e. z1 in this case), but it appears to default to float.

              As you note, I'll have to ensure that both the variable under analysis (z1) and the "max" variable (max1) have the same storage type, or else it may not find the match.
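
              A rough sketch of one way to do that is to look up the storage type of z1 and pass it to -egen-. The macro name ztype is only illustrative, and this assumes max1 does not already exist:

              ```stata
              * Sketch: create max1 with the same storage type as z1.
              * Assumes z1 is numeric and max1 does not already exist.
              local ztype : type z1
              egen `ztype' max1 = max(z1)
              ```

              Because max1 then holds a value copied from z1 at the same precision, z1 == max1 compares like with like.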

              After some pondering, it seems to me that a far simpler approach is to sort in descending order of "z1" and take the value of "dat" in the first observation:

              Code:
              gsort -z1
              local dat_val = dat[1]
              
              di "`dat_val'"
              10.759366
              This method eliminates the need to generate the "max" variable.

              In any case, your point is well taken concerning multiple matches!
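
              With the sorting approach, ties for the maximum can be checked as in the sketch below. Comparing z1 with z1[1] should be safe because both sides come from the same variable; the scalar name dat_val is just an example:

              ```stata
              * Sketch: sort descending, warn about ties, then store dat[1] in a scalar.
              gsort -z1
              count if z1 == z1[1]      // same variable on both sides
              if r(N) > 1 {
                  display as error "warning: " r(N) " observations tie for the maximum of z1"
              }
              scalar dat_val = dat[1]
              display dat_val
              ```

              When there are ties, which of the tied observations ends up first after -gsort- is not something to rely on, so dat[1] may not be the only candidate.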

              Thanks!

              Ariel
              Last edited by Ariel Linden; 05 Apr 2024, 12:29. Reason: I specified z1[1] when it should have been dat[1]
