  • Rounding using the tabstat command with formatting

    Hello

    Hopefully just a quick one - I was comparing some results I produced using the "statsby" prefix with those from "tabstat" and noticed a few minor differences that I can't work out.

    I looked a little closer using different formats in tabstat, and I don't understand some (minor) differences when I display a different number of decimal places.

    E.g., in the example below, when I use tabstat and format it to 1 dp (%9.1f), can someone explain why the p50 for the "Intervention" group doesn't round to 5.6, when at 3 dp (%9.3f) it is 5.550 (highlighted in red)? Using statsby, it rounds to 5.6 as I would expect.


    Code:
    . tabstat glucose, stats(n p50 p25 p75) by(randgroup) format(%9.3f)
    
    Summary for variables: glucose
    Group variable: randgroup (randgroup)
    
       randgroup |         N       p50       p25       p75
    -------------+----------------------------------------
         Control |   278.000     5.700     5.000     6.700
    Intervention |   280.000     5.550     4.800     6.800
    -------------+----------------------------------------
           Total |   558.000     5.600     4.900     6.800
    ------------------------------------------------------
    
    . tabstat glucose, stats(n p50 p25 p75) by(randgroup) format(%9.1f)
    
    Summary for variables: glucose
    Group variable: randgroup (randgroup)
    
       randgroup |         N       p50       p25       p75
    -------------+----------------------------------------
         Control |     278.0       5.7       5.0       6.7
    Intervention |     280.0       5.5       4.8       6.8
    -------------+----------------------------------------
           Total |     558.0       5.6       4.9       6.8
    ------------------------------------------------------
    Probably easily explainable, just wanted to understand why I am getting slightly different results. Thank you!

  • #2
    Further rounding an already rounded number does not necessarily give the same result as simply doing the fuller rounding by itself. To address your specific situation, suppose that the true underlying value for the number you highlighted in red is 5.549999... Rounded to three decimal places this is 5.550. But rounded to 1 decimal place it is 5.5.
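
    A minimal sketch of that point, written as do-file lines rather than a log. The literal 5.5499999 is an illustrative stand-in for the underlying value, not a number taken from the data. Formatted to three decimal places it should display as 5.550, and formatted to one decimal place as 5.5.

    Code:
    * 5.5499999 is an assumed stand-in for the true underlying median
    * same number, two display formats: the format rounds the full value once
    display %9.3f 5.5499999
    display %9.1f 5.5499999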

    By the way, this kind of difficulty can accumulate and add up to large differences over the course of a project. I recall something that a collaborator and I worked on a few years ago. My collaborator rounded results to 2 decimal places at every stage of the calculation. I carried as many decimal places as were available until the end, and then rounded the final result to 2 decimal places. The calculations were several hundred steps long. By the time we were done, our results didn't even agree to the nearest integer!
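
    A small sketch of how that drift arises, not a reconstruction of the project described above. The growth factor 1.013 and the 300 steps are arbitrary illustrative choices, and s_full and s_step are hypothetical scalar names. One scalar carries full precision and is rounded once at the end; the other is rounded to 2 decimal places at every step.

    Code:
    * assumed setup: repeated multiplication by an arbitrary factor
    scalar s_full = 1
    scalar s_step = 1
    forvalues i = 1/300 {
        * full precision carried forward
        scalar s_full = s_full * 1.013
        * rounded to 2 decimal places at every stage
        scalar s_step = round(s_step * 1.013, 0.01)
    }
    * round the full-precision result once, at the end, then compare
    display %12.2f round(s_full, 0.01)
    display %12.2f s_step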

    Moral of the story: round only once, at the end.

    All of that said, I have no explanation for why -statsby- gives you different results. I haven't used -statsby- in a very long time now, and I don't know enough about its inner workings to say what is going on there.


    • #3
      The median of 280 values is predictably halfway between ordered values 140 and 141. So, you have an independent check: sort the values for that group and average those two.
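
      That check might look like the sketch below. It assumes that randgroup == 1 is the code for the group with 280 observations; the actual coding is not shown in the thread, so confirm it with -label list- or -tabulate randgroup, nolabel- first.

      Code:
      * Assumption: randgroup == 1 codes the group of 280 (not confirmed in the thread)
      preserve
      keep if randgroup == 1 & !missing(glucose)
      sort glucose
      * should be 280 if the assumed code is right
      display _N
      * the median of 280 values averages ordered values 140 and 141
      display %23.18f (glucose[140] + glucose[141]) / 2
      restore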

      From the rest of your output, glucose is perhaps reported to 1 decimal place. So, perhaps the median is the average of 5.5 and 5.6.

      Now things are still not as simple as you hope, as a decimal such as 5.6 can only be held approximately in binary, even as a double. 5.5 can be held exactly.

      Here are some results. In essence, the answer likely should be 5.55, but even with a double Stata can't hold that exactly, and in some circumstances the result will be reported as 5.5 to 1 d.p.
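
      One way to look at the stored values directly is Stata's hexadecimal display format, %21x, alongside a long decimal expansion. This is only a sketch on literals, given as do-file lines rather than a log.

      Code:
      * hexadecimal view of the doubles nearest to 5.5 and 5.6
      display %21x 5.5
      display %21x 5.6
      * decimal expansion of the nearest double and the nearest float to 5.6
      display %23.18f 5.6
      display %23.18f float(5.6)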

      Code:
      . clear
      
      . set obs 2
      Number of observations (_N) was 0, now 2.
      
      . gen glucose_f = cond(_n == 1, 5.5, 5.6)
      
      . gen double glucose_d = cond(_n == 1, 5.5, 5.6)
      
      . su
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
         glucose_f |          2        5.55    .0707106        5.5        5.6
         glucose_d |          2        5.55    .0707107        5.5        5.6
      
      . su glucose_f, meanonly
      
      . di %23.18f r(mean)
         5.549999952316284180
      
      . su glucose_d, meanonly
      
      . di %23.18f r(mean)
         5.549999999999999822
      So, @Clyde Schechter's guess matches mine. This is all a side-effect of precision.


      • #4
        OK, that's really helpful - thank you both for taking the time to explain. Will be sure to save all the rounding to the end!
