Rounding using the tabstat command with formatting

Megan Moreton

Join Date: Jan 2020
Posts: 56

23 Jan 2025, 08:24

Hello

Hopefully just a quick one - I was comparing some results I produced using the "statsby" prefix with those when using "tabstat" & noticed a few minor differences that I can't work out.

I looked a little closer with different formatting in tabstat and I don't understand some (minor) differences when I format to a different number of decimal places.

E.g., in the example below, when I use tabstat & format it to 1 dp (%9.1f), can someone explain why the p50 for the "Intervention" group doesn't round to 5.6, when shown to 3 dp (%9.3f) it is 5.550? (highlighted in red). Using statsby, it rounds to 5.6 as I would expect.

Code:

. tabstat glucose, stats(n p50 p25 p75) by(randgroup) format(%9.3f)

Summary for variables: glucose
Group variable: randgroup (randgroup)

   randgroup |         N       p50       p25       p75
-------------+----------------------------------------
     Control |   278.000     5.700     5.000     6.700
Intervention |   280.000     5.550     4.800     6.800
-------------+----------------------------------------
       Total |   558.000     5.600     4.900     6.800
------------------------------------------------------

. tabstat glucose, stats(n p50 p25 p75) by(randgroup) format(%9.1f)

Summary for variables: glucose
Group variable: randgroup (randgroup)

   randgroup |         N       p50       p25       p75
-------------+----------------------------------------
     Control |     278.0       5.7       5.0       6.7
Intervention |     280.0       5.5       4.8       6.8
-------------+----------------------------------------
       Total |     558.0       5.6       4.9       6.8
------------------------------------------------------

Probably easily explainable, just wanted to understand why I am getting slightly different results. Thank you!

Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

23 Jan 2025, 09:00

Further rounding an already rounded number does not necessarily give the same result as simply doing the fuller rounding by itself. To address your specific situation, suppose that the true underlying value for the number you highlighted in red is 5.549999... Rounded to three decimal places this is 5.550. But rounded to 1 decimal place it is 5.5.
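A minimal sketch of that point, using a made-up value just below 5.55. The true underlying median is not shown in the thread, so 5.5499 is only an assumption for illustration:

```stata
* 5.5499 is a hypothetical stand-in for a value slightly below 5.55
display %9.3f 5.5499
display %9.1f 5.5499
```

The first command shows 5.550 and the second shows 5.5, because each format rounds the stored value directly rather than rounding the already rounded 5.550.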

By the way, this kind of difficulty can accumulate and add up to large differences over the course of a project. I recall something that a collaborator and I worked on a few years ago. My collaborator rounded results to 2 decimal places at every stage of the calculation. I carried as many decimal places as were available until the end, and then rounded the final result to 2 decimal places. The calculations were several hundred steps long. By the time we were done, our results didn't even agree to the nearest integer!

Moral of the story: round only once, at the end.
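The effect can be sketched with a toy calculation. Everything here is hypothetical (the factor 1.013, the 300 steps, and the scalar names s_full and s_step are made up for illustration and are not the calculation from the project described above):

```stata
* Hypothetical calculation: multiply by 1.013 three hundred times
scalar s_full = 1
scalar s_step = 1
forvalues i = 1/300 {
    * carry full precision
    scalar s_full = s_full * 1.013
    * round to 2 decimal places at every step
    scalar s_step = round(s_step * 1.013, 0.01)
}
* round only once, at the end, for display
display %12.2f s_full
display %12.2f s_step
```

Comparing the two displayed values shows how far step-by-step rounding can drift from rounding once at the end.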

All of that said, I have no explanation for why -statsby- gives you different results. I haven't used -statsby- in a very long time now, and I don't know enough about its inner workings to say what is going on there.
2 likes
Nick Cox

Join Date: Mar 2014

Posts: 35696
#3

23 Jan 2025, 10:25

The median of 280 values is predictably half way between ordered values 140 and 141. So, you have an independent check needing only an appropriate sort first.

From the rest of your output, glucose is perhaps reported to 1 decimal place. So, perhaps the median is the average of 5.5 and 5.6.
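One way that check might look, as a sketch only. It assumes the thread's variables glucose and randgroup are in memory and that randgroup carries a value label with the text "Intervention"; group_name is a hypothetical helper variable:

```stata
* Work on a copy so the original data are restored afterwards
preserve
* Turn the labelled group variable into a string (assumes a value label exists)
decode randgroup, generate(group_name)
keep if group_name == "Intervention" & !missing(glucose)
sort glucose
* With 280 values, the median is the mean of ordered values 140 and 141
display glucose[140] "  " glucose[141]
display %23.18f (glucose[140] + glucose[141]) / 2
restore
```

The last display uses a wide format so that any small departure from 5.55 is visible.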

Now things are still not as simple as you hope, as a decimal such as 5.6 can only be held approximately even if as a double. 5.5 is easier to hold in binary.

Here are some results. In essence, the answer likely should be 5.55 but even with a double Stata can't approximate that exactly, and in some circumstances the result will be reported as 5.5 to 1 d.p.

Code:

. clear

. set obs 2
Number of observations (_N) was 0, now 2.

. gen glucose_f = cond(_n == 1, 5.5, 5.6)

. gen double glucose_d = cond(_n == 1, 5.5, 5.6)

. su

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   glucose_f |          2        5.55    .0707106        5.5        5.6
   glucose_d |          2        5.55    .0707107        5.5        5.6

. su glucose_f, meanonly

. di %23.18f r(mean)
   5.549999952316284180

. su glucose_d, meanonly

. di %23.18f r(mean)
   5.549999999999999822
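The same point can be seen on the two constants themselves. This is a small sketch, not part of the original output; it relies only on display formats and the float() function:

```stata
* 5.5 is exactly representable in binary; 5.6 is not
display %23.18f 5.5
display %23.18f 5.6
* 5.6 as it would be stored in a float variable
display %23.18f float(5.6)
* Hexadecimal format shows the stored binary value directly
display %21x 5.5
display %21x 5.6
```

The 5.6 lines show a value slightly below 5.6, which is why the mean of 5.5 and 5.6 ends up slightly below 5.55.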

So, @Clyde Schechter's guess matches mine. This is all a side-effect of precision.
2 likes
Megan Moreton

Join Date: Jan 2020

Posts: 56
#4

23 Jan 2025, 11:03

OK, that's really helpful - thank you both for taking the time to explain. Will be sure to save all the rounding to the end!