Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Export data as XLSX - File very big

    Hello,
    I am currently trying to export data from Stata to XLSX. However the resulting excel happens to be very big. The original excel is about 15mb. When I import that in Stata, change some (very few) variables, and export it again as excel, the file is ten times bigger. Does anyone know how I can avoid this problem and make sure that the export file is not too big?
    Thanks in advance!
    Best,
    Lukas

  • #2
    Wild guess - try saving as csv, then read the csv into Excel. Does that help?

    Comment


    • #3
      Uhhhhhh........ I happen to agree with Daniel, maybe try saving it as a text file first.


      Or, on second thought, maybe before you export it back to Excel, compress your dataset with the compress command and see if that affects anything.

      Comment


      • #4
        It is also possible to create a lot "bloat" in the Excel file by applying style elements (colors, font size, shading, bold, italics, etc) to many individual cells, especially when it's the same formatting being applied across big ranges of cells. In this way, Excel creates a duplicate style entry for each and every repetition (edit: this is what I recall). We can't really say more without more details of your situation.
        Last edited by Leonardo Guizzetti; 06 Apr 2022, 08:51.

        Comment


        • #5
          I tried a simple test, but could not replicate the type of issue you describe. In Excel, I used the rand() function to generate a random value for all cells in the range A1:AAA2000. Then to fix the values in place, I simply copied and pasted the entire range as values to overwrite the formula. This made a file of about 20 MB on my system. Then in Stata, I attempt a few different operations to see how they affect the file size. In all my attempts, the file size was somewhat reduced to about 18 MB. So at least for the limited set of experiments here, there wasn't any cause for increasing the file size, much less by a huge amount.

          Code:
          * import and directly export Excel sheet
          import excel using test, clear
          export excel using test2.xlsx, replace
          
          * modify one cell in the original sheet, outside of the original data range
          copy test.xlsx test3.xlsx, public replace
          putexcel set test3.xlsx, modify open
          putexcel A2001 = "Hello"
          putexcel save
          
          * add a repetitive style to the first column, as a range
          copy test.xlsx test4.xlsx, public replace
          putexcel set test4.xlsx, modify open
          putexcel A1:A2000, overwritefmt bold italic font(Arial, 11, red)
          putexcel save
          
          * add a repetitive style to the first column, cell by cell
          copy test.xlsx test5.xlsx, public replace
          putexcel set test5.xlsx, modify open
          foreach col in A B C D E F {
            forval i = 1/1000 {
              putexcel `col'`i', overwritefmt bold italic font(Arial, 11, red)
            }
          }
          putexcel save

          Comment

          Working...
          X