Bug with strofreal()

Mael Astruc

Join Date: Nov 2023

Posts: 4
#1

07 Aug 2024, 12:32

Hello,

I have encountered an issue while trying to convert reals to strings.

Here is my initial issue: I have a program that stores different values of a variable, does some work in Mata, and then uses these values in a Stata command:
I have the exact value as a real in Mata.

I convert it to a string using the strofreal() function.

I build my Stata command by concatenating the pieces with the sprintf() function.

I know that floats have a precision issue due to the underlying base 2 of the computer. I tested and deduced that Stata/Mata considers two values equal if the first 15 decimals are the same.

Code:

1.0 + 10^-15 == 1.0    // false
1.0 + 10^-16 == 1.0    // true
0.1 + 10^-17 == 0.1    // false
0.1 + 10^-18 == 0.1    // true
0.01 + 10^-17 == 0.01  // false
0.01 + 10^-18 == 0.01  // true

When there is no leading digit, I deduced that the closest value to 0 is 10^-307

Code:

10^-307 == 0  // false
10^-308 == 0  // true

The next step was to check what format in strofreal() allowed me to get back the right value after the conversion.

Code:

strtoreal(strofreal(10^-307, "%22.0g")) == 10^-307  // false
strtoreal(strofreal(10^-307, "%23.0g")) == 10^-307  // true

But after brute forcing it, it seems that a width of 24 handles most of the cases

Code:

for (i = -308; i <= 308; i++) {
    x_num = 10^i
    x_str = strofreal(x_num, "%24.0g")
    x_back = strtoreal(x_str)
    are_same = (x_back == x_num)
    if (are_same == 0) {
        i
        x_num
        x_str
        x_back
        x_num - x_back
    }
}

But there is still a difference for 10^19 and 10^20, and no width seems to correct it.

Worse:
this does not happen on every run

other values do not work anymore with larger widths such as 10^21, 10^22, 10^23

When I look at the result for 10^19 for example, it does not make sense

Code:

strofreal(10^19, "%23.0g")                 // 1.0000000000000000e+19
strofreal(10^19, "%24.0g")                 // 1000000000000000000.\x17
strofreal(10^19, "%25.0g")                 // 1000000000000000000.\x17
strofreal(10^19, "%26.0g")                 // 1000000000000000000.\x17
sprintf("%s", strofreal(10^19, "%23.0g"))  // 1.0000000000000000e+19
sprintf("%s", strofreal(10^19, "%24.0g"))  // 1.00000000000000000e+19
sprintf("%s", strofreal(10^19, "%25.0g"))  // 1.000000000000000000e+19
sprintf("%s", strofreal(10^19, "%26.0g"))  // 1000000000000000000.8\x0c\x1c

Hence, I have a few questions:
Does my approach of injecting the value back into the Stata command make sense?

Is there a known width that would suffice?

What is the issue with 10^19 and 10^20?

How can I handle it?

Best regards,
Mael Astruc--Le Souder
daniel klein

Join Date: Mar 2014

Posts: 3747
#2

07 Aug 2024, 13:14

I have not read everything so I am not even sure why you need the conversion from numeric to string. You could probably store the values in a (temporary) scalar and pass that to Stata. Anyway, the solution to your problems is %21x (there is a second part to that blog entry).

Code:

for (i = -308; i <= 308; i++) {
    x_num = 10^i
    x_str = strofreal(x_num, "%21x") // <- use %21x format here
    x_back = strtoreal(x_str)
    are_same = (x_back == x_num)
    if (are_same == 0) {
        i
        x_num
        x_str
        x_back
        x_num - x_back
    }
}
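
The temporary-scalar route mentioned above could look roughly like the sketch below. It is only an illustration, not code from this thread: the variable names and the display command are placeholders for whatever Stata command actually consumes the value, and it assumes the value is held in a Mata scalar x.

```mata
mata:
x = 10^19                              // value computed in Mata
name = st_tempname()                   // temporary name (assumption: any unused scalar name would do)
st_numscalar(name, x)                  // store the double as a Stata scalar, with no string conversion
stata("display %21x " + name)          // placeholder command that refers to the scalar by name
end
```

Because the value never passes through a decimal string, no format width has to be chosen at all.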

Last edited by daniel klein; 07 Aug 2024, 13:19.
Mael Astruc

Join Date: Nov 2023

Posts: 4
#3

08 Aug 2024, 01:54

Thank you very much, it seems I skipped this part of the manual.

I tried with "%21x" and it works perfectly. The blog post was also very instructive, thanks for sharing it.

I still don't understand this weird formatting issue with 10^19, but if it works, that's good enough for me.
Mauricio Caceres

Join Date: Sep 2015

Posts: 130
#4

08 Aug 2024, 23:58

Mael Astruc The largest integer up to which a double can represent every integer exactly is 2^53, and you can see here:

Code:

strofreal(2^53, "%24.0g")      // correct
strofreal(2^53 + 1, "%24.0g")  // wrong
strofreal(2^53 + 2, "%24.0g")  // correct
strofreal(2^53 + 3, "%24.0g")  // wrong

I will grant you that it's odd that 10^19 and 10^20 in particular trigger this bug, so there's probably something off in strofreal() that's causing the problem. However, at those magnitudes, you shouldn't expect the conversion to work. %21x stores the underlying binary representation of the number, so it's lossless. Btw, when you noted

Originally posted by Mael Astruc:

I know that floats have a precision issue due to the underlying base 2 of the computer. I tested and deduced that Stata/Mata considers two values equal if the first 15 decimals are the same.

you're close to being right. The so-called "machine epsilon" for a double is 2^-53 (about 1.1e-16) relative to the magnitude of the number, so two numbers can be off by twice that and potentially be "the same" in the eyes of the computer. Twice that is 2^-52, or about 2.2e-16.
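
A quick way to check these thresholds in Mata is sketched below. It assumes standard IEEE 754 double arithmetic with round-to-nearest-even, and uses Mata's epsilon() function; the comparisons themselves are only illustrative.

```mata
mata:
epsilon(1) == 2^-52        // 1: epsilon(1) is the gap between 1 and the next larger double
1 + 2^-52 == 1             // 0: the sum is the next representable double after 1
1 + 2^-53 == 1             // 1: the sum falls halfway and rounds back to 1
2^53 + 1 == 2^53           // 1: above 2^53 not every integer has its own double
end
```

This matches the tests in post #1: 10^-16 is just below 2^-53, so adding it to 1.0 changes nothing, while 10^-15 is well above 2^-52.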
Mael Astruc

Join Date: Nov 2023

Posts: 4
#5

09 Aug 2024, 02:30

Mauricio Caceres thank you for your response. I saw that the precision was defined by the 53 bits, but I didn't connect the dots with the 1e-16; I thought 2^-53 was much larger. Thank you for highlighting it, everything makes much more sense now.

The "%g" format approach was flawed from the start, I will stick to the "%21x" for now. Ideally, I will rewrite the code using daniel klein's suggestion to create a temporary scalar, but right now my issue is that I have to re-inject an unknown number of scalars into the command, which is a bit tricky and will require a deeper rewrite using a vector and indices.
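
For the record, the vector-and-indices rewrite might be sketched as below. This is an untested outline under my own assumptions: vals stands in for the real values, and the display command is a placeholder for the actual Stata command.

```mata
mata:
vals  = (0.1, 10^19, 10^-307)          // hypothetical values to re-inject
names = J(1, cols(vals), "")           // one scalar name per value
for (j = 1; j <= cols(vals); j++) {
    names[j] = st_tempname()           // fresh temporary name for each value
    st_numscalar(names[j], vals[j])    // store the double without any string conversion
}
cmd = "display " + invtokens(names)    // placeholder: splice the names into the real command
stata(cmd)
end
```

The command string then contains only scalar names, so the number of values can vary without any formatting concerns.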