Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Loop over different stata files in a folder

    Loop over different data files in a folder

    Today
    Hi Guys!

    I am trying to create a merged file for some selected variables in a panel survey. Some people might be familiar with this data set. The data are recorded with different prefix as a b c d .... representing different wave (years). For instance, the variables are recorded thus according to the different year of collection:

    ahid axhwght alewght alrwght aregion aregion2 ............. in separate file called ahhsamp
    bhid bxhwght blewght blrwght bregion bregion2 ..............in separate file called bhhsamp
    chid cxhwght clewght clrwght cregion cregion2 ...............in separate file called chhsamp
    dhid dxhwght dlewght dlrwght dregion dregion2...............in separate file called dhhsamp

    and so on till it reaches r

    I want to loop over these different files in the same folder and all the files are containing the variables I need but are bearing different prefix as shown. I tried this code:

    local files : dir "C\Users\Desktop\stata8\BHPS" files "*hhsamp.dta"
    global dir1 "C:\Users\Ebenezer\Desktop\stata8\new merge"

    foreach w in a d e f g h i j k l m n o p q r {
    use `w'hid `w'xhwght `w'lewght `w'lrwght`w'region `w'region2 using file in `files', clear
    renpfix `w'
    gen wave = strpos("abcdefghijklmnopqr","`w'")
    sort hid wave
    lab dat "hhsamp1-18, long format"
    save $dir1/hhsamp_junk1-18.dta", replace
    }

    I might not be entirely clear to some extent but I think you might understand what I am trying to do here. I will appreciate your help.

    Thanks in advance.

  • #2
    Please explain what your problem is. Your code is not reproducible so we are forced to guess. Section 12 in the FAQ contains helpful advice on the kind of information we need to help. Thank you.

    Comment


    • #3
      Several things:

      1. Your loop, even if it were otherwise correct, never actually carries out a merge (or, really it's an -append- that you want).
      2. The -in `files'- clause in your -use- command is a syntax error. -in- in most Stata commands designates a range of observation numbers, nothing else.
      3. Did you intentionally omit b and c from the list of potential values for `w' in your -foreach- statement?
      4. Are you actually using Stata version 8?

      Try something like this:
      Code:
      local files: dir "C:\Users\Desktop\stata8\BHPS" files "*hhsamp.dta" // NOTE : after C
      local dir1 "C:\Users\Ebenezer\Desktop\stata8\new merge" // LOCALS ARE SAFER THAN GLOBALS
      
      tempfile building
      save `building', emptyok
      
      foreach f of local files {
           local w = substr(`"`f'"', 1, 1)
           use `"`f'"', clear
           keep `w'hid `w'xhwght `w'lewght `w'lrwght `w'region `w'region2
           rename `w'* * // REQUIRES RECENT STATA; IF USING VERSION 8, -renpfix `w'-
           gen wave = `"`w'"'
           sort hid wave
           append using `building'
           save `"`building'"', replace
      }
      
      label data "hhsamp1-18, long format"
      save `"`dir1'/hhsamp_junk1-18.dta"', replace
      NOTES:
      If "C:\Users\Desktop\stata8\BHPS" is your working directory, then you can replace that by just "." in your -local files:- command.
      If that isn't your working directory, then you need to specify this directory as a full path name before `f' in the -use- command. -use- looks for files only in the working directory.
      When doing bulk combining of files like this, it is not uncommon to find that the coding of variables is not consistent from one file to the next, or that what is a numeric variable in one file is a string variable in another. -append- will complain in the latter situation, but in the former you may not find out about it until much later when some analysis produces inexplicable results. So before you run this, you should review the documentation for each of the files and verify that each variable is really recorded in the same way in each wave.
      Last edited by Clyde Schechter; 04 Dec 2015, 16:02.

      Comment


      • #4
        Thanks Clyde the explanations are very helpful.
        I am using stata 13. the stata8 in my directory can be ignored as it is just a file name and nothing more.

        Also I did not mean to omit b and c from the values of `w'. This must have been a typo when i was copying it onto this forum. I believe so much in this code. However, it tells me that 'file ahhsamp.dta not found' but i have it in this working directory folder specified as: "C:\Users\Ebenezer\Desktop\stata8\BHPS_origina l". I think i am not clear with the replacement with "." in my -local files-: command. I will appreciate your further clarification on this.

        Many thanks.

        Comment


        • #5
          I'm not entirely clear on how your files are organized. The directory you specify in the -local files:...- command has to be the same as the directory in which the files are located. Moreover, in the use command, you either need to specify a full pathname for the file, or the the files must be in the current working directory. -use- only searches the current working directory for files unless you specify a full pathname.

          It seems to me you have at least two directories in play here (the one with the *hhsamp.dta files, and the target directory where you want to save the result), and possibly a third--your working directory may be different. If you run the -cd- command Stata will tell you what the working directory is. If it's not "C:\Users\Desktop\stata8\BHPS", then that explains why -use- is complaining that it can't find the file. To make this entirely bullet-proof, so the code does not depend on what your working directory is, I would do this:

          Code:
          local originals "C:\UsersDesktop\stata8\BHPS" // OR WHATEVER THE RIGHT DIRECTORY IS.
          local files: dir "`originals'" files "*hhsamp.dta" // NOTE : after C
          local dir1 "C:\Users\Ebenezer\Desktop\stata8\new merge" // LOCALS ARE SAFER THAN GLOBALS
          
          tempfile building
          save `building', emptyok
          
          foreach f of local files {
               local w = substr(`"`f'"', 1, 1)
               use `"`originals'/`f'"', clear // NOTE USE OF /, NOT \, AS SEPARATOR!!!
               keep `w'hid `w'xhwght `w'lewght `w'lrwght `w'region `w'region2
               rename `w'* * // REQUIRES RECENT STATA; IF USING VERSION 8, -renpfix `w'-
               gen wave = `"`w'"'
               sort hid wave
               append using `building'
               save `"`building'"', replace
          }
          
          label data "hhsamp1-18, long format"
          save `"`dir1'/hhsamp_junk1-18.dta"', replace
          By storing the correct directory in the local macro originals, and then referencing that in both the -local files- command and the -use- command, we can assure that Stata looks for the files in the same directory in both situations.

          Comment


          • #6
            I am assuming what i should have is:

            local files: dir "C:\Users\Desktop\stata8\BHPS" files "*hhsamp.dta" // MY WORKING DIRECTORY, AND I USE STATA 13
            local dir1 "C:\Users\Desktop\stata8\new merge"

            tempfile building
            save `building', emptyok

            foreach f of local files {
            local w = substr(`"`f' " ', 1, 1)
            use `"`f' " ', clear
            keep `w'hid `w'xhwght `w'lewght `w'lrwght`w'region `w'region2
            rename `w' * *
            gen wave = `"`w' " '
            sort hid wave
            append using `building'
            save `"`building' " ', replace
            }

            label data "hhsamp1-18, long format"
            save `"`dir1'/hhsamp_junk1-18.dta" ', replace


            But i feel my -local files:- might be where i have to properly define in this situation?

            Comment


            • #7
              Well, you are not using compound double quotes correctly. On the right hand end, you have put a blank space between " and ' -- that is wrong. The " and ' should be immediately adjacent.

              It matters. Try this:
              Code:
              . local junk x
              
              . display `"`junk'"' // CORRECT
              x
              
              . display `"`junk'" ' // INCORRECT SPACE BETWEEN " and '
              x" '
              The extra punctuation marks at the end would pass a non-existent filename to the -use- command, and could account for your problem.

              Comment


              • #8
                You are a genius Clyde. I really appreciate your explanations. This worked fine. I didn't notice that the directory was not well defined. Both codes worked fine by making sure it is the right working directory.

                Many thanks

                Comment


                • #9
                  I am working on a similar issue. Thank you both for the illuminating conversation above.

                  As you might imagine, I have attempted to run the code, with modifications, and encountered errors. Specifically, I am running:

                  Code:
                  cd "Q:\mydir\"
                  
                  local files :dir . files "*"
                  display `files'
                  
                  foreach f of local files {
                      import excel using "`f'", sheet("Sheet1") firstrow
                      save `f'.dta
                      clear
                  }
                  Last edited by Moses Pounds; 08 Dec 2015, 12:41.

                  Comment


                  • #10
                    The flavor (SE) should make no difference in the code: flavor affects only things like size limitations for memory, etc., and SE is just as "large" as MP, differing only in the use of parallel processing. So no issues there.

                    I'm pretty sure that all of the syntax in #6 will still work the same way back in version 12.1. The only one I question is -rename `w' * * -, but according to the -help whatsnew- file, group rename was already introduced in version 12. So I think you will not have a problem with version compatibility.

                    Comment


                    • #11
                      Thanks! Glad to hear that 12 is able to support this.

                      I'm continuing to troubleshoot, and I think I've determined that the problem is that
                      Code:
                      local blah : dir . files "*"
                      is putting all of the file names into a single concatenated item, not a list over which I can loop. Any insight you might have would be appreciated!
                      Last edited by Moses Pounds; 08 Dec 2015, 13:15.

                      Comment


                      • #12
                        I think you need to show exactly the commands you used and exactly how Stata responded. DO NOT RETYPE THEM to do this. Copy them directly from Stata's Results window or your log file and past them into a code block in this Forum. Also, showing how Stata responds after -display `"`blah'"' would be helpful.

                        Comment


                        • #13
                          The commands are as follows:

                          Code:
                          cd "Q:\mydir\"
                          
                          local files : dir . files "*"
                          display `files'
                          
                          foreach f of local files {
                              import excel using "`f'", sheet("Sheet1") firstrow
                              save "`f'.dta"
                              clear
                          }
                          The output of the -display- command is
                          Code:
                          . display `files'
                          file1.xlsxfile2.xlsxfile3.xlsxfile4.xlsxfile5.xlsxfile6.xlsxfile7.xlsxfile8.xlsx
                          > file9.xlsxfile10.xlsxfile11.xlsxfile12.xlsxfile13.xlsxfile14.xlsxfile15.xlsx
                          [. . .]


                          The result of the code overall is not an error, but simply
                          Code:
                          .
                          end of do-file
                          
                          .
                          with no resulting .dta files saved in the specified directory.

                          Comment


                          • #14
                            The results of -display `files'- are indeed confusing, but arise because -display- gets confused b its contents. I get similar results in my setup.

                            Her is what's going on. The results of macro extended function -dir-, to quote from the help file:

                            The returned string will contain each of the names, separated one from the other by spaces and each enclosed in double quotes. If mname is subsequently used in a quoted context, it must be enclosed in compound double quotes: `"`mname'"'. [emphasis added]
                            So to see the content of macro files correctly you need

                            Code:
                            display `"`files'"' // NOTE USE OF COMPOUND  DOUBLE QUOTES
                            Also, to get the -import excel- command to run correctly, you need, once again, to use compound double quotes, since the value of `f' itself already contains quotation marks.

                            Code:
                            import excel using `"`f'"', sheet("Sheet1") firstrow
                            Read more about these in -help quotes-. In particular, it is good practice when programming in Stata to always use compound double quotes to delimit any string that can contain ordinary double quotes. If you are unsure whether a string might contain quotes or not, err on the side of surrounding it in compound double quotes. Using compound double quotes will never cause an error; failing to do so often does. The only downside of using them is that they create sequences of characters that are difficult for human eyes to parse.

                            Comment


                            • #15
                              Ah, that makes sense; thank you! I will certainly incorporate compound double quotes in the future.

                              I'm now able to import a file and manipulate it within the loop, but the code is hanging on the -save- command (please see above). I've tried several levels of quotation marks, to no avail. I suspect that the problem is that, as you write above
                              the value of `f' itself already contains quotation marks
                              and these are not permitted in file names, whether .dta or otherwise. (I'm running Windows 7.)

                              Comment

                              Working...
                              X