Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Importing an Excel dataset with 12 sheets on a single Stata dataset

    Hello Stata people;

    Is it possible to import 12 sheets of an Excel file on a single unique Stata base? Also, the variables' names exist only in the first sheet of the Excel file, it is a big dataset, that's why Excel had to divide it on 12 sheets.

    Is it possible to import all of those 12 sheets that have the same variables as the first sheet of Excel on a single unique Stata dataset? Just for information, the dataset has more than 700 000 observations. I do want to merge them in a vertical way, like, put each one beneath the previous on on Excel (like, sheet number 10 should come just after sheet number 9). Again, all the sheet have the same number of variables and the same size.

    With many thanks!

  • #2
    Yes, it is possible:
    Code:
    clear*
    tempfile building
    save `building', emptyok
    
    import excel using my_multisheet_excel_file.xlsx, describe
    local n_sheets `r(N_worksheet)'
    forvalues i = 1/`n_sheets' {
        local sheet`i' `r(worksheet_`i')'
    }
    
    forvalues i = 1/`n_sheets' {
        if `i' == 1 { // FIRST SHEET HAS VARIABLE NAMES
            import excel using my_multisheet_excel_file.xlsx, clear ///
                firstrow
            unab vbles: _all
        }
        else { // THE OTHERS DON'T
            import excel using my_multisheet_excel_file.xlsx, clear
            unab col_heads: _all
            rename (`col_heads') (`vbles')
        }
    
        append using `building'
        save `"`building'"', replace
    }
    
    save my_Stata_file_with_all_the_sheets, replace
    Note: The above is untested as I do not have a suitable Excel file available to try it out, so beware of typos or other error. But this is the gist of it and it is at least substantially correct.

    That said, I strongly recommend against doing this. There is a reasonable probability, even if the spreadsheet comes from a source that usually curates data sets carefully, that the 12 sheets will not all be as compatible as you say they are. I would instead import each sheet into a separate Stata data file. Then I would use Mark Chatfield's -precombine- program (available from Stata Journal dm0081) to find any incompatibilities among them. (The most common problem would be incompatible data storage types for the "same" variable in different sheets.) Then fix any incompatibilities and just run a simple program to append all the reconciled data files together. In the end, this will be easier and quicker than ending up with a mangled omnibus data set that has missing data or has "the same" variable in different places depending on which sheet it came from, and then going through contortions to fix this easily preventable problem.
    Last edited by Clyde Schechter; 18 Dec 2024, 16:02.

    Comment


    • #3
      Clyde Schechter Thank you! I guess your last advice was the most suitable one for my case, I just feared that I could lose some observations or have a problem in the process of what I was suggesting in the first place, so I ended up importing each Excel sheet in a separate Stata base and then I've appended the lot of them 12 sheets one by one, I guess in that way, the process is a bit longer but I can make sure that I didn't lose any information or observations. Still, I do appreciate the help and the advice Thanks again!

      Comment

      Working...
      X