Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Maintain exact formatting of data with import excel

    I am attempting to use Stata to manipulate Excel workbooks, with import (and export) excel. Importing changes the values of numbers (even though they are read in as strings since there are other non-numeric characters in the same row). For example, 0.814 in Excel becomes .8139999999999999 in Stata. I need the numbers to stay exactly as they appear in Excel, since the data later directly become tables in a publication via texsave. (Perhaps not the greatest workflow, but that's what I'm stuck with.)

    I assume this has to do with machine precision, but I thought the format would be preserved since it reads the numbers as strings anyway. The 'allstring' option doesn't help since they're already strings, and 'keepcellformat' is only for exporting. (Maybe I'll have to use that when exporting though.) If there is no way to enforce it in Stata I'd be curious if there are things I could do in Excel (formatting cells as text didn't seem to work.)

    A small excel file is attached.

    A code example would just be something like:
    Code:
    clear all
    import excel using ./Test.xlsx, clear //preferably with an option to keep exact formatting
    *Do other unrelated things
    export excel using ./Testoutput.xlsx, replace keepcellfmt

    Attached Files

  • #2
    You could save the Excel file as a CSV file and use import delimited. See

    Code:
    help import delimited

    Comment


    • #3
      The main issue is that Stata gets the exact value of each cell and ignores the display format that Excel uses to display it's own data. This is often the behaviour that is wanted, but if you must have exactly the data as it is shown, then you'll need to export it to a format that will do this as Andrew suggests.

      You can test this yourself with a toy example, as I have done and will describe below. Here is an Excel file with a single number presented to full accuracy and then rounded in display format only 1 decimal place.

      Click image for larger version

Name:	excel file.png
Views:	1
Size:	5.8 KB
ID:	1737562


      Code:

      Code:
      clear *
      
      import excel using test.xlsx, clear
      desc
      list, abbrev(20)
      
      import excel using test.xlsx, firstrow clear
      desc
      list, abbrev(20)
      
      import delimited using test.csv, varnames(1) asdouble clear
      desc
      list, abbrev(20)
      Result:

      Code:
      . import excel using test.xlsx, clear
      (2 vars, 2 obs)
      
      . desc
      
      Contains data
       Observations:             2                  
          Variables:             2                  
      ------------------------------------------------------------------------------
      Variable      Storage   Display    Value
          name         type    format    label      Variable label
      ------------------------------------------------------------------------------
      A               str17   %17s                  
      B               str17   %17s                  
      ------------------------------------------------------------------------------
      Sorted by:
           Note: Dataset has changed since last saved.
      
      . list, abbrev(20)
      
           +---------------------------------------+
           |                 A                   B |
           |---------------------------------------|
        1. |     no formatting          formatting |
        2. | 8.235239999999999   8.235239999999999 |
           +---------------------------------------+
      
      .
      . import excel using test.xlsx, firstrow clear
      (2 vars, 1 obs)
      
      . desc
      
      Contains data
       Observations:             1                  
          Variables:             2                  
      ------------------------------------------------------------------------------
      Variable      Storage   Display    Value
          name         type    format    label      Variable label
      ------------------------------------------------------------------------------
      noformatting    double  %10.0g                no formatting
      formatting      double  %10.0g                formatting
      ------------------------------------------------------------------------------
      Sorted by:
           Note: Dataset has changed since last saved.
      
      . list, abbrev(20)
      
           +---------------------------+
           | noformatting   formatting |
           |---------------------------|
        1. |      8.23524      8.23524 |
           +---------------------------+
      
      .
      . import delimited using test.csv, varnames(1) asdouble clear
      (encoding automatically selected: UTF-8)
      (2 vars, 1 obs)
      
      . desc
      
      Contains data
       Observations:             1                  
          Variables:             2                  
      ------------------------------------------------------------------------------
      Variable      Storage   Display    Value
          name         type    format    label      Variable label
      ------------------------------------------------------------------------------
      noformatting    double  %10.0g                no formatting
      formatting      double  %10.0g                
      ------------------------------------------------------------------------------
      Sorted by:
           Note: Dataset has changed since last saved.
      
      . list, abbrev(20)
      
           +---------------------------+
           | noformatting   formatting |
           |---------------------------|
        1. |      8.23524          8.2 |
           +---------------------------+

      Comment


      • #4
        Andrew Musau Yep, import delimited after saving as a .csv works. That means I have to save all the multiple sheets of my Excel workbook separately, making the workflow less automated, but maybe that's the best that can be done. Thanks!

        Leonardo Guizzetti I'm not so sure it has to do with Excel's formatting, since I didn't do any formatting in Excel. I typed exactly 0.814 (see my attached data set) and it reads in to Stata via import excel as 0.8139999999999. Of course, if I type '0.814 (adding the apostrophe before the number) it reads into Excel as desired. So it seems like a machine precision thing, and I'm still hoping there's some way to truly force Excel to treat is like a string rather than a number, but maybe there isn't.

        Comment


        • #5
          Originally posted by Garret Christensen View Post
          [USER="4687"]
          Leonardo Guizzetti I'm not so sure it has to do with Excel's formatting, since I didn't do any formatting in Excel. I typed exactly 0.814 (see my attached data set) and it reads in to Stata via import excel as 0.8139999999999. Of course, if I type '0.814 (adding the apostrophe before the number) it reads into Excel as desired. So it seems like a machine precision thing, and I'm still hoping there's some way to truly force Excel to treat is like a string rather than a number, but maybe there isn't.
          I agree it is a machine precision issue, but I think it is an interaction with how Excel stores the number more than how Stata reads it. I too typed exact 8.23524 (as a random number). When Stata reads in the number in the the first example, the value is represented within machine precision 8235239999999999. When Stata is allowed to read in the number with double precision, then the number displays as intended, as in the 2nd example. Prepending the single apostrophe as you have done signals to Excel to regard this data as a text string and that is either stored different in Excel or interpreted differently by Stata.

          Comment


          • #6
            Originally posted by Garret Christensen View Post
            That means I have to save all the multiple sheets of my Excel workbook separately, making the workflow less automated
            There should be some VBA code that automates that. This claims to do just that. Note that you can get information about all sheets in an Excel file and row ranges using the -describe- option of import excel.

            Code:
            clear
            qui import excel using myfile.xlsx, describe
            return list
            Last edited by Andrew Musau; 19 Dec 2023, 15:20.

            Comment

            Working...
            X