Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Import of a large Excel file in Stata

    Dear Statalist members,

    I am currently trying to import an xlsx. file into Stata 16, I am also using a Mac. I am getting the message that the Excel file exceeds 40MB and cannot be imported.

    I was looking for ways to get around the problem and saved my xlsx. file as xls. I managed to import the file, but then I realised that I only have 65530 rows, which is only part of my data. I have in total around 225000 rows, so a substantial amount of my data doesn't get imported. Not sure why this is the case, because the xls. file has all the data in it. The
    Code:
    cellrange
    option of import in Stata doesn't also accept if I specify the 225000 as last row.

    The other option I tried is
    Code:
    set excelxlsxlargefile on
    I am not sure how to use it, but I ran this line of code and then as next step I did the import. This doesn't work, but I am not sure if I should do it this way.

    Any ideas how can I import my Excel file?

    Albena

  • #2
    For the benefit of others, the workaround you show is given in

    https://www.statalist.org/forums/for...-large-dataset

    You tell us "This doesn't work" which doesn't help us address the question of if you should use this technique.
    • Just to be clear, after running set excelxlsxlargefile on you then returned to trying to import the xlsx file, correct?
    • What precisely happens when this "doesn't work" — is there an error message from Stata? Or does Stata crash? Please be as thorough and precise (copy-and-paste any error message, for example) as you can be.
    • Just how big is your Excel workbook?
    Others on Statalist have had success with exporting the worksheet as a CSV file and then using import delimited but that works best with fairly simple worksheets.

    Comment


    • #3
      Thank you very much for your reply! Yes, William, the reference to this similar Statalist post is correct. I saw the information about the workaround there.

      I indeed first ran the set excelxlsxlargefile on and then returned to the importing of the xlsx. file. I again get the message that the Excel file is bigger than 40MB, so the file cannot be imported.
      The file you are attempting to open is too large. The maximum size allowed for *.xlsx files is 40 megabytes.
      Nothing else happened. This is in fact the same message I got even without using the workaround.

      My file is 61,7 MB, which is truly beyond the 40MB, but should be still manageable to be imported in Stata.

      I tried also the csv. option, but it seems to be imprecise with some of the string variables. I see that for some observation the data shifts to the next column (or even further) from where it is supposed to be.

      Can I maybe try something else?

      Comment


      • #4
        It does not seem that the
        Code:
        set excelxlsxlargefile on
        took effect.

        One question: What version of Stata are you using?
        Code:
        . about
        
        Stata/SE 16.1 for Mac (Intel 64-bit)
        Revision 06 Apr 2021
        Copyright 1985-2019 StataCorp LLC
        
        Total physical memory: 8.01 GB
        
        [license information omitted]
        I suggest the following test.
        • Quit Stata and re-start it.
        • Immediately issue the set excelxlsxlargefile on command.
        • Then immediately issue the import excel command you are using.
        I'm not optimistic, however. If this does not help, you could ask of Stata Technical Services if anything has changed surrounding this workaround. The suggestion in the original reference was that it might be slow, but not that the same error message should occur if it were unable to work.

        Comment

        Working...
        X