Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Capturing Excel sheet names and variable names in a loop

    Hello,

    I have a .xlsx file that I want to import into Stata. The file has 50 sheets for 50 U.S. states. The first column for all the 50 sheets is a list of variables and the other columns (after dropping some rows like the current # 2, 5, 6, and 7) are the annual values for the variables. Please find the attached screenshot.



    The annual data available varies at the state level. Some states have data from 1949 onward while others start from 1952. There are also missing years for some.

    What I want to do:
    1. Import all 50 sheets into Stata, maybe as tempfiles.
    2. Make current row # 4 as variable names representing each year, for example, as _1949, _1950, _1951, etc.
    3. Reshape from wide to long. Before that, I will replace the values in row # 1 to fit as variable names.
    4. Merge the 50 temp/files.
    I especially need help with steps 1 and 2. Specifically,
    1. Is there a way to create a loop such that I can import all the sheets without having to type the sheet names in the loop heading? Is there a way to automatically capture the sheet names? I will be doing this to multiple Excel files, some of which may not have all the U.S. states, so having a way to capture only the available sheets will be helpful.
    2. As said, for each state/sheet, within the loop, I want to set the values in row # 4 as variable names (as _1949, _1950, _1951, etc.). For this, I found the following loop on this website. But with varying years for many states and with missing years for some states, how can I capture only the available column names and automate it without having to specify the column names?
    Code:
    foreach j in B-BX {
       local try = strtoname(`j'[1])
       capture rename _var`j'  `try'
    }
    Thanks so much!
    Last edited by Sam Bennett; 05 Feb 2024, 12:27.

  • #2
    What are the names of the sheets? If state abbr (or names), then you could loop through them, but will have to list them out. Something along these lines. The append makes it a long format. Variable names must be the same.

    Code:
    foreach st in al ak az ar ca co ct de dc fl ga hi id il in ia ks ky la me md ma mi mn ms mo mt ne nv nh nj nm ny nc nd oh ok or pa pr ri sc sd tn tx ut vt va vi wa wv wi wy {
       import excel using XXXX, sheet(`st') firstrow clear  [YOU CAN SET THE RANGE USING CELLRANGE]
       save `st'_data, replace
    }
    
    clear 
    set obs 0
    foreach st in al ak az ar ca co ct de dc fl ga hi id il in ia ks ky la me md ma mi mn ms mo mt ne nv nh nj nm ny nc nd oh ok or pa pr ri sc sd tn tx ut vt va vi wa wv wi wy {
       append using `st'_data, force
    }
    save datafinal, replace

    Comment


    • #3
      Originally posted by George Ford View Post
      What are the names of the sheets? If state abbr (or names), then you could loop through them, but will have to list them out. Something along these lines. The append makes it a long format. Variable names must be the same.

      Code:
      foreach st in al ak az ar ca co ct de dc fl ga hi id il in ia ks ky la me md ma mi mn ms mo mt ne nv nh nj nm ny nc nd oh ok or pa pr ri sc sd tn tx ut vt va vi wa wv wi wy {
      import excel using XXXX, sheet(`st') firstrow clear [YOU CAN SET THE RANGE USING CELLRANGE]
      save `st'_data, replace
      }
      
      clear
      set obs 0
      foreach st in al ak az ar ca co ct de dc fl ga hi id il in ia ks ky la me md ma mi mn ms mo mt ne nv nh nj nm ny nc nd oh ok or pa pr ri sc sd tn tx ut vt va vi wa wv wi wy {
      append using `st'_data, force
      }
      save datafinal, replace
      Thanks, I'm aware of this method, but I was wondering if there's a way to automatically capture the available sheets using a standardized line of code so that I don't have to carefully check the states/sheet names available for different Excel files and change them accordingly each time. Thanks!

      Comment


      • #4
        It will be a mess, but you can add "describe" option to import excel and will store the file names in r().

        HTML Code:
        https://www.statalist.org/forums/forum/general-stata-discussion/general/1410697-import-excel-and-absorb-the-sheet-name-in-the-process

        Comment

        Working...
        X