Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • import excel and absorb the sheet name in the process

    Hello, I have several excel files that I need to import, where each sheet is named after the variable it contains data for. This variable name does not appear anywhere else in the sheet. Here is a simplified example of what it looks like:

    Click image for larger version

Name:	mwe16.9.17.PNG
Views:	1
Size:	24.6 KB
ID:	1410698


    I finally need to convert this into a panel of the following form:
    Code:
    Country   Year        GDP        Inflation      Unemployment
    Albania   1984        2.1        3.3            0,00
              1985        2.2        4.3            0,00
              1986        2.3        5.0            0,00
              1987        2.4        6.0            1.83
              1988        2.6        2.0            0,00
    Algeria   1984        1.1        1.0            4,00
              1985        1.3        0.8            4,00
              1986        1.2        1.5            4,00
              1987        1.0        1.3            6,29
              1988        1.0        1.0            6,50
    I now need to extract the sheet name during the import excel process, and store the sheet name as a macro, and then call it when I do reshape long to put it on top of the column. Finally, I do the same for all other sheets in this and other excel files (each excel file has different number of sheets); and combine the files using e.g. joinby.

    I don't know how to write the loop that extracts the sheet name from each excel sheet, when the sheet names (i.e. the variable names) are not given beforehand. I came up with this:
    Code:
    import excel using $import_file , describe
    
    scalar N = r(N_worksheet)
    
    *loop through sheets:
    forval i = 2/`=scalar(N)' {
    
    *import sheet i and save as file
    import excel using $import_file , sheet(i) firstrow clear
    
    *somehow extract and save the sheetname and use in reshape command
    *...
    
    save project_file_`i'dta, replace
    Bt I need an idea on the sheet name extraction, and how to import the ith sheet instead of the sheet with name i. Because for example

    Code:
    import excel using $import_file , sheet(2)
    is giving an error instead of importing the second sheet.

  • #2
    Before you import any of the sheets, run
    Code:
    import excel using $import_file, describe
    Stata will return r(N_worksheet) with the number of sheets in the file. You can then loop over those:
    Code:
    local n_sheets `r(N_worksheet)'
    forvalues i = 1/`n_sheets' {
        local sheet`i' `r(worksheet_`i')
    }
    Then when you can do another loop over the worksheets:

    Code:
    forvalues i = 1/`n_sheets' {
        import excel using $import_file, sheet(`sheet`i'')
       // DO STUFF TO CLEAN THE DATA AND RESHAPE IT
       // ETC.
       rename whatever `sheet`i''
    }
    Note: Not tested, so beware of typos, unbalanced braces, parens, quotes, etc. But this is the gist of it.

    A couple of additional comments:

    Using a global macro to store the name of the file to be imported is not a good idea. Local macros are safer.

    From the example you show, I don't understand why you would put the files together using -joinby-. This seems much more like a -merge 1:1 country year- operation to me. Perhaps there is more to it that you haven't shown here.

    The example of the panel data you want to create is dangerous and will likely land you in a lot of trouble. You need to have the country specified in every observation. Leaving it blank and "inferring" it from the nearest non-blank observation above it works fine for human eyes. But in a statistical package things will quickly go bad. There are many Stata commands that change the order of the data. Once that data is out of its original order, you have no idea which country most observations apply to and any results you got are almost sure to be incorrect, perhaps in bizarre ways.

    In working with Stata, you have to try to put spreadsheet-based ideas on data handling out of your mind. They almost never help and they often create serious errors. Every night before you go to bed, repeat 10 times: Stata is not a spreadsheet.
    Last edited by Clyde Schechter; 16 Sep 2017, 14:19.

    Comment


    • #3
      I'd suggest dropping down into Mata for this. You can get the worksheet names and read the data from all the files a bit more seamlessly. You can find more by reading the help for the xl() function.

      Comment


      • #4
        Thank you for the detailed reply, Clyde. Yes, I will have the country in each observation, that was just an example I created on the fly.
        Thank you also, wbuchanan. I don't have any Mata experience yet, and I think I got it working quite well without Mata for now.

        However, I am now stuck just before the finish line. My code looks like this:
        Code:
        clear
        cls
        set more off
        
        local impfiles "Table_3B.xls"
        * Table_4B.xls Table_5B.xls"
        tokenize `impfiles'
        
        foreach f in `impfiles' {
        
        import excel using `f', describe
        local n_shts `r(N_worksheet)'
        
        forvalues i = 2/`n_shts' {
            local shtname`i' `r(worksheet_`i')'
            macro list
            *di `shtname`i''
        }
        
        
        forvalues i = 2/`n_shts' {
        import excel using `f', sheet(`shtname`i'') cellrange(A8) firstrow clear
        
        *destring and rename year "variables"
        ds Country, not
        foreach v of var `r(varlist)' {
            destring `v', replace
            local l : variable label `v'
            rename `v' year`l'
            }
        
        reshape long year, i(Country) j(Year)
        
        local shtname`i' subinstr(shtname`i', " ", "", .)
        local shtname`i' subinstr(shtname`i', "-", "_", .)
        macro list
        
        rename year `shtname`i''
        }
            
        }
        In the last lines, I am trying to name the variable year that is left after the reshape command after the name of the excel sheet:
        Click image for larger version

Name:	mwe17.9..PNG
Views:	1
Size:	9.5 KB
ID:	1410736



        The local macro shtname`i' contains the sheet name, but not in a format that is compatible with Stata's variable name conventions. It contains spaces and hyphens, such as "B-Bureaucracy Stability".

        Therefore, I try to use subinstr twice to clear the macro of these characters, and subsequently
        Code:
        rename year `shtname`i''
        However, I must be getting the quotes wrong, because neither of the following work (and using double quotes just stores the literal subinstr(...) as the local macro):
        Code:
        local shtname`i' subinstr(shtname`i', " ", "", .)
        local shtname`i' subinstr(shtname`i', "-", "_", .)
        
        local shtname`i' `subinstr(shtname`i', " ", "", .)'
        local shtname`i' `subinstr(shtname`i', "-", "_", .)'
        so, how can I replace the local macro with a subinstr function of itself?

        Also, how can I display it, because while macro list lists all the sheet names correctly (with spaces and hyphens), di `shtname`i'' gives an error.
        Last edited by Max Piper; 17 Sep 2017, 02:31.

        Comment


        • #5
          Your commands in the final block are not what you think they are. When you write
          Code:
          local shtname`i' subinstr(shtname`i', " ", "", .)
          you are changing shtname`i' to the literal string "subinstr(shtname`i', " ", "", .)" which is not what you want. You want to actually evaluate subinstr(...) and replace shtname`i' with the result. TO do that you need an equals sign between them:

          Code:
          local shtname`i' = subinstr(shtname`i', " ", "", .)
          But, in any case, you don't need to do these multiple replacements. There is a function, strtoname() that will take an arbitrary string and convert it to the closest possible legal variable name in one fell swoop.:

          Code:
          local shtname`i' = strtoname(`"`shtname`i''"')



          Comment

          Working...
          X