
  • tostring cannot be converted reversibly despite specifying format

    Hi,

    For various reasons I won't go into here (mainly to do with appending/merging data sets), we need to convert a variable from numeric to string. However, whenever we try with -tostring-, we get the "cannot be converted reversibly; no generate" error. The variable holds large numbers with many decimal places (see the example below), but specifying a format does not seem to fix this. We have a rather large data set (~17.2M observations); I'm not sure whether that factors in anywhere. Does anyone have any suggestions that might help?

    I'm running Stata/MP 15.1 on Windows Server 2016 Standard.

    Syntax:
    Code:
    tostring VisitsOid, generate(VisitsOid2) format(%13.8g)
    Example data:
    VisitsOid
    1. 2623.14166461
    2. 2623.14166462
    3. 2623.14166463
    4. 2623.14166464
    5. 2623.14166467
    6. 2623.14166468
    7. 2623.14166469
    8. 2623.14166473
    9. 2623.14166474
    10. 2623.14166475
    11. 2623.14166476
    12. 2623.14166477
    13. 2623.14166481
    14. 2623.14166482
    15. 2623.14166483
    16. 2623.14166484
    17. 2623.14166485
    18. 2623.14166486
    19. 2623.14166487
    20. 2623.14166488
    (Apologies for not using -dataex-: we're short of disk space at the moment, which is causing I/O errors.)

    Any help you can provide would be much appreciated.

    Cheers,

    Marissa

  • #2
    Use the -force- option.
    Stata is telling you that converting to string and back again would lose precision.
    Use it with caution, and check your results afterwards.
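The round trip described above can be sketched outside Stata. The Python below is only an illustration, not Stata code: it formats each value, parses the string back, and checks whether the exact number is recovered, which is roughly (as we understand it) the check behind the "cannot be converted reversibly" message. In Python's printf-style formatting, %.8g keeps 8 significant digits while %.8f keeps 8 digits after the decimal point; treating these as stand-ins for Stata's %13.8g and %13.8f is an assumption, not something verified against Stata.

```python
# Illustrative sketch only (Python, not Stata): mimic a numeric -> string -> numeric
# round trip and report whether the exact value survives.
# Assumption: Python's "%.8g" / "%.8f" stand in for Stata's %13.8g / %13.8f.

def round_trips(value: float, fmt: str) -> bool:
    """Format value with fmt, parse the text back, and compare with the original."""
    text = fmt % value
    return float(text) == value

# A few IDs from the example data in the question, held as doubles
# (a Python float is an IEEE 8-byte double).
ids = [2623.14166461, 2623.14166462, 2623.14166488]

for fmt in ("%.8g", "%.8f"):
    strings = [fmt % v for v in ids]
    ok = all(round_trips(v, fmt) for v in ids)
    print(fmt, strings, "reversible" if ok else "NOT reversible")
```

With 8 significant digits all three IDs print as 2623.1417 and the round trip fails; with 8 decimal places every ID comes back exactly, provided the values really are stored as doubles.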



    • #3
      I cannot reproduce your problem:

      Code:
      . * Example generated by -dataex-. To install: ssc install dataex
      . clear
      
      . input double var1
      
                 var1
        1. 2623.14166461
        2. 2623.14166462
        3. 2623.14166463
        4. 2623.14166464
        5. 2623.14166465
        6. end
      
      .
      .
      . des
      
      Contains data
        obs:             5                          
       vars:             1                          
       size:            40                          
      --------------------------------------------------------------------------------------------------------------------------------
                    storage   display    value
      variable name   type    format     label      variable label
      --------------------------------------------------------------------------------------------------------------------------------
      var1            double  %10.0g                
      --------------------------------------------------------------------------------------------------------------------------------
      Sorted by:
           Note: Dataset has changed since last saved.
      
      . list, noobs clean
      
               var1  
          2623.1417  
          2623.1417  
          2623.1417  
          2623.1417  
          2623.1417  
      
      .
      . tostring var1, replace format(%13.8f)
      var1 was double now str13
      
      . des
      
      Contains data
        obs:             5                          
       vars:             1                          
       size:            65                          
      --------------------------------------------------------------------------------------------------------------------------------
                    storage   display    value
      variable name   type    format     label      variable label
      --------------------------------------------------------------------------------------------------------------------------------
      var1            str13   %13s                  
      --------------------------------------------------------------------------------------------------------------------------------
      Sorted by:
           Note: Dataset has changed since last saved.
      
      . list, noobs clean
      
                   var1  
          2623.14166461  
          2623.14166462  
          2623.14166463  
          2623.14166464  
          2623.14166465
      It works just fine, as you can see.

      But I think I know why you have this problem, and the clue is that you are "having output problems" with -dataex-. You don't say what that means, but I'll bet it means that when you run -dataex- the numbers you get do not match the numbers you are showing here: they are close, but they are off in the last 1 or 2 decimal places. Am I right?

      The problem, then, is that your variable was created as a -float-. But a -float- cannot hold this many digits: it carries only about 7 significant decimal digits, while these IDs have 12. So the actual numbers in your data set are just approximations to the numbers you think you have, off in the final digits. Converting to string will not save you, even if you -force- it: the information is already lost.

      Numbers this long have to be stored as -double-s, not -float-s. And you have to do it that way from the start: -recast-ing the existing variables as double will not help you now: the information is already lost and -recast-ing will just pad the inaccurate numbers with zeroes on the right.
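The point about -float- storage can be checked numerically. The Python sketch below is an illustration, not Stata code; it assumes Stata's -float- is an IEEE 4-byte single-precision number (24-bit significand) and emulates it with the standard struct module by packing a value into 4 bytes and unpacking it again. Near 2623 the gap between adjacent single-precision values is 2^-12 (about 0.000244), far larger than the 0.00000001 steps between the posted IDs.

```python
import struct

def as_float32(x: float) -> float:
    """Round a double to 4-byte single precision (as if stored in a float) and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Some of the IDs posted in the question.
ids = [2623.14166461, 2623.14166462, 2623.14166463, 2623.14166488]

stored = [as_float32(v) for v in ids]

# All four IDs land on the same single-precision value.
print(sorted(set(stored)))  # [2623.1416015625]

# Widening back to double (the analogue of -recast double-) is exact,
# but it only reproduces the already-rounded value; the lost digits do not return.
print(["%.8f" % v for v in stored])
# ['2623.14160156', '2623.14160156', '2623.14160156', '2623.14160156']
```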

      So I think you have to regenerate the data file from its original source and make sure those numbers are created as doubles in the first place. (Actually, the name suggests that they are not really numeric variables but just IDs that happen to look like numbers; you presumably do not plan to use them in calculations. If that is the case, then rather than creating this variable as a -double- you might as well create it as a string variable in the first place.)

      Added: Crossed with #2. I strongly disagree with that recommendation. Nearly all of the variation in the numbers you show is in the lower-order decimal places: these are precisely the ones that are not accurately represented and are causing the problem. If you -force- the -tostring- operation, you will get strings with the wrong digits in those low-order places, and the variable will not be useful. In fairness, #2 does say you should check the results. I predict that when you do, you will see they are unsatisfactory. The -force- option is only suitable when the variation is in higher-order places and the low-order digits that stand to be lost or obfuscated are meaningless in any case.
      Last edited by Clyde Schechter; 16 Apr 2018, 22:04.
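The distinction drawn in the last paragraph of #3 can be illustrated with a small Python sketch (again not Stata code). It emulates float storage with struct, on the assumption that this matches Stata's 4-byte -float-, then "forces" a conversion to 8-decimal strings and counts how many distinct strings survive. The second list of IDs, which differ in their integer part, is hypothetical and chosen only for contrast.

```python
import struct

def as_float32(x: float) -> float:
    """Round a double to 4-byte single precision and back (emulated float storage)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def forced_strings(values: list[float]) -> list[str]:
    """Convert float-stored values to 8-decimal strings, accepting whatever digits result."""
    return ["%.8f" % as_float32(v) for v in values]

# IDs from the question: they differ only in the last few decimal places.
posted = [2623.14166461, 2623.14166462, 2623.14166463, 2623.14166488]
# Hypothetical IDs that differ in the integer part instead.
hypothetical = [2623.14166461, 2624.14166461, 2625.14166461]

for name, values in (("posted", posted), ("hypothetical", hypothetical)):
    strings = forced_strings(values)
    print(name, len(values), "IDs ->", len(set(strings)), "distinct strings")
```

For the posted IDs every forced string is identical, so they would be useless as merge keys; for the hypothetical IDs the low-order digits are still wrong, but the IDs stay distinct.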



      • #4
        Hi to you both and thanks for replying,

        Jorrit: we cannot use -force- for the reasons Clyde mentions. We need to keep precision to all 8 decimal places, because the lowest-order decimal places are exactly where our IDs differ from one another.

        Clyde - thank you for your thorough response! I can see that it is working for you so there is definitely something particular to our data set which is creating the issue. Some answers to your queries:
        • Apologies for not being more specific about the -dataex- problem. The I/O error we have been getting is: "I/O error writing .dta file. Usually such I/O errors are caused by the disk or file system being full." Essentially, we do not have enough space on the disk for Stata to run the command (this is disk storage, not RAM). Because of this error I get no output at all when running -dataex-. I'm working to free up more space on our disks.
        • The variable is currently a double and, as far as I can tell, was created that way. It is currently formatted as %13.8f (though I have tried wider formats with no success).
        • You are correct that this is an ID number: it will only be used as a unique ID to merge multiple data sets together, never for any kind of computation.
        • We would prefer not to regenerate the file from the data source, as it is rather large, so if we can transform the variable here, that would be preferable.
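Since the variable is reported to be a -double-, one quick diagnostic for the -float- hypothesis in #3 is to ask whether its values are exactly representable as 4-byte floats: a variable created as -float- and later recast to -double- would consist entirely of such values, whereas genuine 12-digit IDs stored as doubles are very unlikely to be. The Python sketch below only illustrates that logic on a posted value; the helper names are hypothetical, float storage is emulated with struct (assuming it matches Stata's -float-), and a value such as 2623.5 that happens to be float-representable would also be flagged, so this is a heuristic rather than proof.

```python
import struct

def as_float32(x: float) -> float:
    """Round a double to 4-byte single precision and back (emulated float storage)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def looks_float_truncated(x: float) -> bool:
    """Heuristic: True if x is exactly representable in single precision,
    as every value of a float variable recast to double would be."""
    return as_float32(x) == x

genuine_double = 2623.14166461                 # value typed with full precision
recast_from_float = as_float32(2623.14166461)  # what float storage + recast would leave

print(looks_float_truncated(genuine_double))     # False
print(looks_float_truncated(recast_from_float))  # True
```

If every value of VisitsOid passed an equivalent test, the digits were most likely lost before the variable became a double.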
        Thanks again for your help!

        Marissa
