Round function not doing what I intend

Jose Vila

Join Date: Jun 2014

Posts: 22
#1

12 May 2022, 04:16

I know this question has been asked a thousand times, so sorry for asking again. I have read several posts asking similar questions, skimmed the 'search precision' notes and tried changing the storage type to float, with no success.

I am aware of the difference between display formats and storage precision. Here, I want to round a variable so that it contains just three decimal places, not just change how it looks.

Below is a working example showing that 'round' does nothing when I intend to round var1 to contain only three decimals.

What is the short answer to getting this to work? Thanks!

Code:

clear
insobs 1
gen float var1 = 48.3639984131
gen float var2 = round(var1, 0.001)
format %20.10f var1 var2
list
Nick Cox

Join Date: Mar 2014

Posts: 35696
#2

12 May 2022, 04:29

The short answer is that it is impossible for the reasons you allude to, all explained many times over.

Most multiples of 0.001 cannot be held exactly in binary: by my reckoning only those with decimal parts 0.000, 0.125, 0.250, 0.375, 0.500, 0.625, 0.750 and 0.875, that is, only 8 out of 1000 possibilities.

In your case round(x, 0.001) does nothing to that value because it is already as close as it can be to 48.364 given the number of bits in a float. You could get closer with a double, but no approximation can get you exactness if an exact binary equivalent does not exist.

What you can do is multiply by 1000 and ensure that all values are integers. Otherwise, in the vast majority of threads that I have read on this people don't really need or want exactness -- but if it's decimal arithmetic you need the answer is to work in integers.
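The integer approach described above can be sketched as follows. This is only an illustration under stated assumptions: it reuses the value from #1, and the variable name var1_thou is made up for the example.

```stata
clear
set obs 1
gen float var1 = 48.3639984131

* Hypothetical variable name: hold the value in thousandths as an integer.
* round() with a single argument rounds to the nearest integer.
gen long var1_thou = round(var1 * 1000)

* Integers of this size are held exactly, so == is safe here.
assert var1_thou == 48364
list
```

Two variables converted this way can then be compared with == without any precision surprises.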
Jose Vila

Join Date: Jun 2014

Posts: 22
#3

12 May 2022, 05:02

Thank you very much for the quick reply, Nick.

I think I get it now. "Most multiples of 0.001 can not be held exactly in binary". So no matter how a variable was imported or processed, as long as it has three or more decimal values this issue will be there. Crystal clear. I can finally move on and realize that the solution to my problem was elsewhere.

Specifically, I needed to do something like:

Code:

gen equal = var1 == var2

And I was trying to equalize the number of decimals before testing for equality, without success. Knowing now that this is not possible, this did the trick:

Code:

gen equal = round(var1, 0.001) == round(var2, 0.001)
daniel klein

Join Date: Mar 2014

Posts: 3848
#4

12 May 2022, 06:36

Also, see

Code:

help float()
help reldif()
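As a rough illustration of what those two functions do (a sketch, not part of the original post): float() rounds its argument to float precision, so a literal can be compared with a float variable, and reldif(x, y) returns |x - y| / (|y| + 1). The tolerance 1e-6 below is an arbitrary choice for this example.

```stata
clear
set obs 1
gen float x = 48.364

* The literal 48.364 is evaluated in double precision, so it does not
* match the float that is actually stored.
assert x != 48.364

* float() rounds the literal to float precision first, so this matches.
assert x == float(48.364)

* reldif(x, y) is |x - y| / (|y| + 1); the 1e-6 tolerance is an assumption.
assert reldif(x, 48.364) < 1e-6
```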
Nick Cox

Join Date: Mar 2014

Posts: 35696
#5

12 May 2022, 07:31

Thanks for the thanks, but I have to say that #3 is still missing some of the point.

Suppose I read somewhere that a variable has values like

48.364

and I enter that and other values in Stata and use a float because that's enough precision for most of my purposes. (I use doubles when I think they are needed.)

Suppose that somebody else types in the same data to another float, or I do it twice as a check, or even by accident.

These are situations in which we can agree that Stata should be holding exactly the same value internally -- because the same code should produce the same stored value, even if what is really being held is a binary approximation rather than the exact decimal value.

Then in principle

Code:

assert x == y

or

Code:

gen diff = x - y

are some of the ways of checking for equality.
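A minimal runnable version of these checks might look like the following. It assumes the value 48.364 is typed into two float variables; the names x and y follow the snippets above, and the rest is illustrative.

```stata
clear
set obs 1
gen float x = 48.364
gen float y = 48.364

* Both variables hold the same binary approximation, so they compare
* equal without any rounding.
assert x == y

gen diff = x - y
assert diff == 0
```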

But I don't see that comparing round(x, 0.001) and round(y, 0.001) is needed, or helps at all.

That seems to smack of an idea of "identical to 3 decimal places" which is carried over from school arithmetic.

Stata just doesn't work to so many decimal places! At machine level, Stata works in binary and (usually) presents in decimal. That's the point.

Round about 1964 one of my mathematics teachers took a break from the rather dry textbook he was using to teach us (12-year-olds, in my case) to explain about binary arithmetic and how it was used in computers. So that was, literally, my first lesson in or about computing, although it was not until 1970 that I first even saw one and 1973 that I first used one.

I don't know how anybody else is taught such things. I am happy (although not proud) to know almost nothing about computer hardware, but the fact that computers typically work in binary has remained important to know.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

12 May 2022, 10:20

Returning to post #3, if you want to know if two values differ by less than .001 then ... compare the difference to .001.

Code:

gen almost_equal = abs(x-y)<.001

Or if you want to know that two numbers will look the same when displayed with three places to the right of the decimal, then compare the displayed values.

Code:

gen displayed_same = strofreal(x,"%12.3f")==strofreal(y,"%12.3f")
Jose Vila

Join Date: Jun 2014

Posts: 22
#7

12 May 2022, 14:43

Thank you William, that's a nice and simple workaround.

@Nick: what I get is that there is no solution where I thought there obviously should be one. But you're right, I didn't think through why my workaround solved the issue for me in this case. I guess I thought that somehow, for Stata, the product of 'round' is consistent, as long as I don't store it in a variable. Which is quite a silly idea. In practice, this solution worked for me because previously I was applying 'round' to only one of the variables, thinking the other had exactly three decimals; applying 'round' to both variables fixed it for my case.

Why is it that the following example always works as I intend? 'Works' in that the equalities that I evaluate always equal 1. I get that I'm not really rounding to the third decimal, just to the closest Stata can deliver. But for practical purposes, I get what I intend, just as if Stata was effectively rounding to exactly the third decimal.

Code:

clear
insobs 1

* Create two variables containing a number that should equal 48.364 if rounded
* to the third decimal:
gen double var1 = runiform(48.36359,48.36444)
gen double var2 = runiform(48.36359,48.36444)

* Let's also attempt to store 48.364:
gen double var3 = 48.364

* Attempt to round them up to third decimal:
gen double var1short = round(var1, 0.001)
gen double var2short = round(var2, 0.001)
gen double var3short = round(var3, 0.001)

format %20.15f var1* var2* var3*

gen var1_equal_var2 = var1short == var2short
gen var2_equal_var3 = var2short == var3short

list

Last edited by Jose Vila; 12 May 2022, 14:48.

William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

12 May 2022, 15:29

Largely because you're saving your values in double rather than float, and your rounded numbers are 5 digits long, which is a substantial part of 7 digits of precision, but not of 15 digits of precision.

Code:

. clear

. set obs 1
Number of observations (_N) was 0, now 1.

. 
. * Create two variables containing numbers that should equal 48.364 if rounded
. * up (var1) or down (var2) to the third decimal:
. 
. gen double var1 = 48.3635

. gen double var2 = 48.3644

. 
. * Let's also attempt to store 48.364:
. gen double var3 = 48.364

. 
. * Attempt to round them up to third decimal:
. gen double var1short = round(var1, 0.001)

. gen double var2short = round(var2, 0.001)

. gen double var3short = round(var3, 0.001)

. 
. format %20.15f var1* var2* var3*

. 
. gen var1_equal_var2 = var1short == var2short

. gen var2_equal_var3 = var2short == var3short

. 
. list

     +-----------------------------------------------------------------------------------+
  1. |               var1 |               var2 |               var3 |          var1short |
     | 48.363500000000002 | 48.364400000000003 | 48.363999999999997 | 48.364000000000004 |
     |-----------------------------------------------------------------------------------|
     |            var2short    |            var3short    |   var1_e~2    |   var2_e~3    |
     |   48.364000000000004    |   48.364000000000004    |          1    |          1    |
     +-----------------------------------------------------------------------------------+

.

Code:

. clear

. set obs 1
Number of observations (_N) was 0, now 1.

. 
. * Create two variables containing numbers that should equal 48.364 if rounded
. * up (var1) or down (var2) to the third decimal:
. 
. gen float var1 = 48.3635

. gen float var2 = 48.3644

. 
. * Let's also attempt to store 48.364:
. gen float var3 = 48.364

. 
. * Attempt to round them to third decimal:
. gen float var1short = round(var1, 0.001)

. gen float var2short = round(var2, 0.001)

. gen float var3short = round(var3, 0.001)

. 
. format %20.15f var1* var2* var3*

. 
. gen var1_equal_var2 = var1short == var2short

. gen var2_equal_var3 = var2short == var3short

. 
. list

     +-----------------------------------------------------------------------------------+
  1. |               var1 |               var2 |               var3 |          var1short |
     | 48.363498687744141 | 48.364398956298828 | 48.363998413085938 | 48.362998962402344 |
     |-----------------------------------------------------------------------------------|
     |            var2short    |            var3short    |   var1_e~2    |   var2_e~3    |
     |   48.363998413085938    |   48.363998413085938    |          0    |          1    |
     +-----------------------------------------------------------------------------------+

.


Jose Vila

Join Date: Jun 2014

Posts: 22
#9

13 May 2022, 02:51

That makes sense, William. Apparently, my solution works well only as long as I'm comparing two numbers to which I apply round(X, 0.001) and as long as these numbers have a total length of 15 digits or fewer (if using double storage type). However, as soon as I do round(X, 0.0001) (i.e. attempt to compare up to the fourth decimal) this approach doesn't work, even when the integer part is just 1 digit long. Therefore your proposed solution of doing gen almost_equal = abs(x-y)<.001 is clearly better.

Code:

******* 3 decimals

clear
insobs 1

local number 400000000000 // 12-digit number (with 3 decimals it will be 15 digits)
*local number 4000000000000 // 13-digit number (with 3 decimals it will be 16 digits)

gen double var1 = runiform(`number'.3635, `number'.3644)
gen double var2 = runiform(`number'.3635, `number'.3644)
gen double var3 = `number'.364

* Attempt to round them up to third decimal:
gen double var1short = round(var1, 0.001)
gen double var2short = round(var2, 0.001)
gen double var3short = round(var3, 0.001)

format %20.15f var1* var2* var3*

gen var1_equal_var2 = var1short == var2short
gen var2_equal_var3 = var2short == var3short

list

Code:

******* 4 decimals

clear
insobs 1

local number 4

gen double var1 = runiform(`number'.3635, `number'.3644)
gen double var2 = runiform(`number'.3635, `number'.3644)
gen double var3 = `number'.364

* Attempt to round them to the fourth decimal:
gen double var1short = round(var1, 0.0001)
gen double var2short = round(var2, 0.0001)
gen double var3short = round(var3, 0.0001)

format %20.15f var1* var2* var3*

gen var1_equal_var2 = var1short == var2short
gen var2_equal_var3 = var2short == var3short

list


Nick Cox

Join Date: Mar 2014

Posts: 35696
#10

13 May 2022, 03:00

Again: note that comparisons with 0.001 are only exactly right for doubles.

Code:

. gen float thou = 0.001

. gen double THOU = 0.001

. di (THOU[1] == 0.001)
1

. di (thou[1] == 0.001)
0

True-or-false conditions yield 1 if true and 0 if false.
Nick Cox

Join Date: Mar 2014

Posts: 35696
#11

13 May 2022, 08:54

I can summarize my personal experience.

round(,) with second argument an integer is quite often useful, although floor() or ceil() is often even more useful. For example, 5 * floor(whatever/5) gives you bins that start at e.g. 0, 5, 10, ... while 5 * ceil(whatever/5) gives you bins that end at the same set of points.
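The binning idiom above can be tried on a few made-up values. This is a small sketch: the data and the names bin_start and bin_end are invented for the example, while whatever follows the wording above.

```stata
clear
input whatever
3
5
7.5
10
end

* Bins of width 5: lower and upper limit of each observation's bin
gen bin_start = 5 * floor(whatever/5)
gen bin_end   = 5 * ceil(whatever/5)

list
```

Here 7.5 gets bin_start 5 and bin_end 10, while an exact multiple such as 10 gets 10 for both.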

round(,) with second argument with a fractional part is usually a distraction and disappointing. Either people need something else -- notably to use a specific display format, if the issue is what you see -- or solutions are never exactly what you want, because what you want is based on a misunderstanding of how Stata holds numbers.

More at https://www.stata-journal.com/articl...article=dm0095
