finding a value in a variable that is based on a match between two other variables

Ariel Linden

Join Date: Apr 2014

Posts: 153
#1


04 Apr 2024, 16:36

Hi All,

In the example below, I am looking to find the value in "dat" that corresponds to the case of when z1 == max1. In this case, the value 10.759366 is the value I am looking for when z1 == max1 (2.848514).

The easiest way is simply "list dat if z1==max1"; however, I need to create a scalar from that value, so "list" is not an option...

Thanks in advance!

Ariel

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(dat z1 max1)
 6.001519 .9686938 2.848514
 8.734053 2.048315 2.848514
10.146043 2.606191 2.848514
10.759366 2.848514 2.848514
end

Last edited by Ariel Linden; 04 Apr 2024, 16:41.
Andrew Musau

Join Date: Oct 2014

Posts: 9945
#2

04 Apr 2024, 16:45

Code:

qui levelsof dat if z1==max1, local(wanted)
display `wanted'
Ariel Linden

Join Date: Apr 2014

Posts: 153
#3

04 Apr 2024, 16:47

Thanks, Andrew!!! You always have the simplest answer!!!
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#4

04 Apr 2024, 18:07

I hate to throw cold water on this, but it is a general problem in digital computing that conditioning on exact equality of floating point numbers is treacherous. Most numbers that have finite decimal representations do not have exact finite binary representations (e.g. the way 1/3 has no exact finite decimal representation), and, in the course of computing with floating point numbers various rounding and truncation errors may further distort things, creating results that are incorrect when viewed from the perspective of real-number arithmetic.

Code:

. clear

. set obs 1
Number of observations (_N) was 0, now 1.

. gen x = .1234567

. gen y = .7654321

. gen z = .8888888

. list

     +--------------------------------+
     |        x          y          z |
     |--------------------------------|
  1. | .1234567   .7654321   .8888888 |
     +--------------------------------+

. assert z == x + y // SURPRISE! assertion is false

If z1 and max1 were both directly read in from sources that had the same number of decimal places in precision, and then not modified, you will be OK. If one of the variables was obtained as something like -gen varA = varB if...-, and assuming varB and varA are both float, or both double, you will, again, have no problem. But if these numbers were arrived at in different ways, as in the example above, numbers that would, in real-number calculations, be equal, can differ.

So unless you are quite sure that only copying variables of the same storage type, or identical calculations were used in calculating z1 and max1, you are likely to miss some of what should be matches. Usually the safest thing to do is to pick some reasonably small threshold of acceptable difference and condition on equality to within that caliper.
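The caliper idea above could be sketched as follows. This is only an illustration: the tolerance of 1e-6 is an assumed value rather than one recommended in the thread, and `wanted` is just an illustrative scalar name. It uses -summarize, meanonly- to pick up the matching value of dat so that the result can be stored in a scalar.

Code:

* Assumed tolerance; pick one that suits the scale of z1 in your data
local tol 1e-6

* Observations where z1 is within the caliper of max1
* (missing values never satisfy the condition)
quietly summarize dat if abs(z1 - max1) < `tol', meanonly

* With exactly one match, r(min) and r(max) coincide and either is the value
scalar wanted = r(min)
display wanted

If more than one observation falls inside the caliper, r(min) and r(max) will differ and r(N) will exceed 1, so it is worth inspecting those as well.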
Ariel Linden

Join Date: Apr 2014

Posts: 153
#5

04 Apr 2024, 23:19

Hi Clyde,

Yes, I am aware of that problem, and that is why the variable max1 is generated using:

Code:

egen max1 = max(z1)

which ensures that the value(s) being compared are formatted the same and will have the same precision.

To your point, this might not work if instead I used:

Code:

sum z1
list dat if z1==r(max)

because r(max) may be formatted differently than the variable z1 (on the sample data above this works, but it doesn't on the full dataset)

As always, thanks for weighing in!

Ariel

daniel klein

Join Date: Mar 2014
Posts: 3805

05 Apr 2024, 08:02

Originally posted by Ariel Linden

Hi Clyde,

Yes, I am aware of that problem, and that is why the variable max1 is generated using:

Code:

egen max1 = max(z1)

which ensures that the value(s) being compared are formatted the same and will have the same precision.

No, it does not. Watch:

Code:

. input double z1

             z1
  1. 1.2
  2. 1
  3. end

.
. egen float max1 = max(z1)

.
. generate byte match = z1 == max1

.
. list

     +--------------------+
     |  z1   max1   match |
     |--------------------|
  1. | 1.2    1.2       0 |
  2. |   1    1.2       0 |
     +--------------------+

You could compare numbers rounded to float precision:

Code:

. generate match_float = float(z1) == float(max1)

. list

     +-------------------------------+
     |  z1   max1   match   match_~t |
     |-------------------------------|
  1. | 1.2    1.2       0          1 |
  2. |   1    1.2       0          0 |
     +-------------------------------+

Also, watch out for more than one match.
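One possible way to check for multiple matches before relying on a single value is sketched below. It reuses the float() comparison shown above; the -assert- line is an optional guard added for illustration and is not part of the original suggestion.

Code:

* Count the observations that match at float precision
quietly count if float(z1) == float(max1)
display "matching observations: " r(N)

* Optional guard: stop with an error unless there is exactly one match
assert r(N) == 1

If the assertion fails, you need to decide which of the tied observations should supply the value of dat.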

Last edited by daniel klein; 05 Apr 2024, 08:05.


Ariel Linden

Join Date: Apr 2014

Posts: 153
#7

05 Apr 2024, 12:12

Hi Daniel,

I stand corrected! I assumed that -egen max()- copies the formatting from the variable it is working with (i.e. z1 in this case), but it appears to default to float.

As you note, I'll have to ensure that both the variable under analysis (z1) and the "max" variable (max1) have the same storage type, or else the match may not be found.
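A minimal sketch of one way to keep the two variables comparable, assuming z1 is stored as float or double: request double storage explicitly when creating the maximum, so that no precision is lost relative to z1. The variable name `match` follows the earlier example; the sketch assumes max1 does not already exist in the data.

Code:

* Assumes max1 does not already exist; run -capture drop max1- first if it does
egen double max1 = max(z1)

* A double holds any float or double value of z1 exactly, so equality is safe here
generate byte match = z1 == max1
list z1 max1 match

The storage type, not the display format, is what determines whether the comparison succeeds.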

After some pondering, it seems to me that a far simpler approach is to reverse sort "z1" and find the value in "dat" that is the first observation:

Code:

gsort -z1
local dat_val = dat[1]
di "`dat_val'"

10.759366

This method eliminates the need to generate the "max" variable.
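Since the original goal was a scalar, the same idea might be written as below. This is a sketch only: it stores the value in a scalar rather than a local macro and adds a tie count, which is an illustrative extra step. Comparing z1 with its own first observation involves a single variable, so no storage-type mismatch can arise.

Code:

* Sort so that the largest z1 comes first
gsort -z1

* Store the corresponding value of dat in a scalar
scalar dat_val = dat[1]

* Count how many observations share the maximum of z1
quietly count if z1 == z1[1]
display "dat at max z1: " dat_val ", ties at max: " r(N)

If the tie count is greater than 1, dat[1] is just one of several candidate values, and which one comes first after sorting is not guaranteed.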

In any case, your point is well taken concerning multiple matches!

Thanks!

Ariel

Last edited by Ariel Linden; 05 Apr 2024, 12:29. Reason: I specified z1[1] when it should have been dat[1]