Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Inaccuracies in CSV export

    Dear Stata Community,

    while exporting some data to csv I experienced a problem that also appears when rounding variables. I wouldn't wonder if the problem has been posted before and apologize in advance if it is the case.
    The problem appears when rounding a fairly large variable and then either exporting it into a csv file or e.g. calculating a mean and saving it in string variable.
    An example code provoking the error is:
    Code:
       clear all
    
       set obs 1
       gen double x = 6615151515.120
    
       gen double x_r = round(x,.2)
    
       sum x_r if _n == 1
       gen hv = "`r(mean)'"
    A similar problem appears when I try to export the data to csv. Interestingly, it does not work with the example above but with a larger dataset. Here is a sample of the resulting csv file( All numeric variables were in double before exporting and looked fine):

    ID;value;;;;
    1;64.40000000000001;;;;
    2;278.83;;;;
    3;15.22;;;;
    4;173.51;;;;
    5;90.10000000000001;;;;
    6;206.93;;;;
    7;6.41;;;;
    8;65.48999999999999;;;;
    9;124.02;;;;
    10;94.23999999999999;;;;
    11;68.94;;;;
    12;12.11;;;;

    Does anyone know what causes the error and perhaps a solution?

    Best,
    Dave





  • #2
    Your problem is one of precision. The output of the help precision command will give you some background. The fact is, there is no way to store 16.4 precisely using a base-2 (binary) system, any more than one can represent 1/3 precisely using base-10 (decimal) system.

    If your export to CSV was done using the export delimited command, the output of the help export delimited command will show you that the datafmt option will cause Stata to output your values less precisely using the display format that has been assigned to them.

    Comment


    • #3
      Hi William,

      thanks a lot for the quick response. I tried your suggestion using the datafmt while exporting. This solves most of the issues even though there are still three cases left with more decimal digits than I specified in the display format. However, I still don't understand completely why the variable is stored correctly with double precision but export delimited fails to copy the value. Could you explain that in more detail?

      Best,
      Dave

      Comment


      • #4
        No, the variable is not "stored correctly" in double. It is displayed as you expect by some display formats, but others show that what is stored is an approximation.

        In the example below, we see with %21x format that when 65.49 is stored in binary it is represented in base 16 as a repeating fraction that ends in f5c28 (in binary, 11110011110000100100) repeated forever, much like 5/3 is represented in base 10 as 1.666666... with the 6 repeating forever. So when the string "65.49" is converted to a number stored in binary, the stored result will be an approximation, as 1.666 or 1.666666 etc are all approximations to 5/3.

        Some choices of display format hide the approximate nature of the value, others make the approximation apparent.

        Code:
        . clear all
        
        . set obs 1
        number of observations (_N) was 0, now 1
        
        . gen double value = 65.49
        
        . format %21x value
        
        . list, clean noobs
        
                            value  
            +1.05f5c28f5c28fX+006  
        
        . format %20.10f value
        
        . list, clean noobs
        
                    value  
            65.4900000000  
        
        . format %20.14f value
        
        . list, clean noobs
        
                        value  
            65.48999999999999

        Comment


        • #5
          Ok, now I undestand. Thanks for the clarification.

          Comment

          Working...
          X