  • Loop over different files in a folder, modify them, and append the resulting files.

    Hello,

    I have several .dta files in the folder "C:\Users\Desktop\Stata\Datos". I would like to loop over all the files in that folder, keep some variables, and append the resulting files.

    In the loop below, I try to open every file in the folder, keep the variables "var1", "var2", and "var3", save the reduced file under a new name (the original file name plus an "a"), and append the resulting files into a new dataset called List1. It does not give any error, but it does not produce anything either. Any ideas?

    Code:
    clear all
    cd "C:\Users\Desktop\Stata\Datos"
    
    local files: dir "C:\Users\Desktop\Stata\Datos" files "*"
    
    foreach file in `files' {
        use `file'.dta, clear
        keep var1 var2 var3
        save `file'a.dta
        append using `file'a
        save List1.dta, replace
        erase `file'a.dta
    }
    Last edited by Diego Malo; 02 Nov 2022, 09:56.

  • #2
    I don't have the file structure to test this, but what you want is
    Code:
    clear *
    
    cd "C:\Users\Desktop\Stata\Datos"
    
    local files: dir ".\Datos" files "*"
    
    foreach file of loc files {
    
    
        use `file', clear
    
    
        keep var1 var2 var3
    
    
        sa `file', replace
    
    
        ap using `file'
    
    
        sa List1.dta, replace
    
    
        erase `file'
    
    }
    Last edited by Jared Greathouse; 02 Nov 2022, 10:07.


    • #3
      Thanks for your answer, Jared Greathouse. The only difference between your code and mine is that you wrote "of loc" where I wrote "in". What is the difference? Also, I want to save the modified files under a different name, not the same one, which is why I add an "a" in my code.

      When I run your code, it gives me the following error: "_"ben_r3.dta invalid name". ben_r3 is one of the .dta files in my folder.
      Last edited by Diego Malo; 02 Nov 2022, 10:13.


      • #4

        Code:
        local files: dir "C:\Users\Desktop\Stata\Datos" files "*"
        collects the names of all files in that folder, whatever their extension, and each name is returned with its extension. So you may need to (1) ask for .dta files only and (2) not spell out the .dta extension again in your loop.


        • #5
          Thanks for your answer, Nick Cox.

          Following your answer, I have tried:

          1) I collect all the files in my folder and do not spell out .dta in my loop:

          Code:
          local files: dir "C:\Users\Desktop\Stata\Datos" files "*."
          
          foreach file in `files' {
              use `file', clear
              keep var1 var2 var3
              save `file'a, replace
              append using `file'a
              save List1.dta, replace
              erase `file'a
          }
          It gives me the following error: "_"ben_r3.dta invalid name". ben_r3 is the first of the files in my folder.

          2) I specify .dta files:

          Code:
          cd "C:\Users\Desktop\Stata\Datos"
          
          local files: dir "C:\Users\Desktop\Stata\Datos" files "*.dta"
          
          foreach file in `files' {
              use `file'.dta, clear
              keep var1 var2 var3
              save `file'a.dta, replace
              append using `file'a
              save List1.dta, replace
              erase `file'a.dta
          }
          It's still giving me the same error.
          Last edited by Diego Malo; 02 Nov 2022, 11:13.


          • #6
            I think it's the same confusion, at least to start with. I use fs from SSC, which is just a wrapper for the syntax you're using. (Strictly, it can do more, namely call that syntax repeatedly.)

            In one of my directories when I ask for *.dta I get all the names, including the extensions.

            The syntax you're using acts in the same way.

            You're writing as if the extension were stripped and you have to add it back within your loop. Not so.

            Code:
            . fs *.dta
            arab.dta            banana.dta          fingado.dta         jaccard.dta         workwiththis.dta
            autism.dta          banana_results.dta  genome.dta          original_data.dta
            
            . ret li
            
            macros:
                          r(files) : ""arab.dta" "autism.dta" "banana.dta" "banana_results.dta" "fingado.dta" "genome..."
            
            . local files : dir "." files "*.dta"
            
            . di `"`files'"'
            "arab.dta" "autism.dta" "banana.dta" "banana_results.dta" "fingado.dta" "genome.dta" "jaccard.dta" "original_d
            > ata.dta" "workwiththis.dta"
            With fs you can see immediately what you're getting, as well as having it in a returned result.
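
            A minimal sketch of what this implies for the original goal of saving each reduced file under a new name: since every element of the list already ends in .dta, the extension has to be removed before a suffix such as "a" is added. The sketch assumes the working directory contains the files; the macro name stub is hypothetical, and nothing is saved, the new names are only displayed.

            Code:
            local files : dir "." files "*.dta"

            foreach file of local files {
                * each element already carries its extension, so strip it first
                local stub : subinstr local file ".dta" ""
                * show the name the reduced file would receive
                display `"`file' would be saved as `stub'a.dta"'
            }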


            • #7
              Even after you sort out the current error you are receiving, you will not get the result you are hoping for, because in each iteration of your loop, you currently save the file you read, then append it to itself, and save that as List1.dta -- with the result that at the end of it all, you will have a file which will simply contain two copies of the data in the last file that the loop processes.

              Also, as a minor simplification, since you are not actually saving each of your files (you are just erasing them immediately after appending), you can even give them the same name.

              So I would suggest the following code:

              Code:
              clear
              save List1, emptyok replace
              
              local files: dir "C:\Users\Desktop\Stata\Datos" files "*.dta"
              
              foreach file of local files {
                  use "`file'", clear
                  keep var1 var2 var3
                  tempfile myfile
                  save `myfile'
                  use List1, clear
                  append using `myfile'
                  save List1, replace
                  erase `myfile'
              }
              Last edited by Hemanshu Kumar; 02 Nov 2022, 21:29.
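
              A possible simplification, offered as a sketch rather than as part of the suggestion above: append accepts a keep() option that restricts which variables are taken from the using dataset, so the temporary file can be skipped. The sketch assumes the working directory is not the Datos folder, so that List1.dta is not picked up by the *.dta pattern on a later run; the macro name folder is hypothetical.

              Code:
              clear
              local folder "C:/Users/Desktop/Stata/Datos"
              local files : dir "`folder'" files "*.dta"

              foreach file of local files {
                  * add only the three variables from each file to the data in memory
                  append using `"`folder'/`file'"', keep(var1 var2 var3)
              }

              * written to the working directory, assumed to differ from the data folder
              save List1, replace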


              • #8
                Thank you both for your answers.

                Nick Cox When I run fs * and fs *.dta I obtain the same files, so I do not see the difference between adding .dta or not (sorry). I have tried the code both ways (removing and adding .dta) but I still get the same error: "_"ben_r3.dta invalid name"

                Code:
                . fs *
                ben_r3.dta  ken_r3.dta  moz_r3.dta  saf_r3.dta  tan_r3.dta
                gha_r3.dta  mad_r3.dta  nig_r3.dta  sen_r3.dta
                
                . fs *.dta
                ben_r3.dta  ken_r3.dta  moz_r3.dta  saf_r3.dta  tan_r3.dta
                gha_r3.dta  mad_r3.dta  nig_r3.dta  sen_r3.dta
                Hemanshu Kumar Thank you for the code. You are right, my code was wrong. I am running your code but I get the same error: "_"ben_r3.dta invalid name"


                • #9
                  Can you run this and let me know what happens?

                  Code:
                  clear
                  save List1, emptyok replace
                  
                  local files: dir "C:/Users/Desktop/Stata/Datos" files "*.dta"
                  
                  foreach file of local files {
                      use `"`file'"', clear
                      keep var1 var2 var3
                      tempfile myfile
                      save `myfile'
                      use List1, clear
                      append using `myfile'
                      save List1, replace
                      erase `myfile'
                  }


                  • #10
                    Hemanshu Kumar when I run your new code it says the following:

                    Code:
                    file ben_r3.dta" "gha_r3.dta" "ken_r3.dta" "list1.dta" "mad_r3.dta" "moz_r3.dta" "nig_r3.dta"
                        "saf_r3.dta" "sen_r3.dta" "tan_r3.dta" not found
                    When I add `' around files after the foreach in your code, it gives the usual error: "_"ben_r3.dta invalid name".


                    • #11
                      Are you sure you are doing

                      Code:
                      foreach file of local files {
                      and not something like

                      Code:
                      foreach file in `"`files'"' {
                      The second version will create this error, but the first should not.

                      Also, change the use command to:
                      Code:
                      use `"C:/Users/Desktop/Stata/Datos/`file'"', clear
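
                      The difference between the two forms, which was also asked about in #3, can be seen in a small sketch. The list below is made up for illustration and is quoted the way the dir extended function quotes its results; the counters n1 and n2 are hypothetical names.

                      Code:
                      local files `""a.dta" "b.dta" "c.dta""'

                      * of local: foreach reads the macro itself and loops over its elements
                      local n1 = 0
                      foreach f of local files {
                          local ++n1
                      }

                      * in: whatever follows is taken literally as the list, and compound
                      * quotes bind the whole expanded macro into a single element
                      local n2 = 0
                      foreach f in `"`files'"' {
                          local ++n2
                      }

                      display "of local: `n1' iterations; in with compound quotes: `n2'"

                      The first loop should run three times and the second only once, with the entire list as its single element. That matches the symptom in #10, where the "not found" message names all the files at once.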


                      • #12
                        Hemanshu Kumar changing the use command makes the code work! Thank you!

                        Could you explain a bit why the code works now? I am lost with so many " ``, etc. I do not understand the purpose of all the `"`.
                        Last edited by Diego Malo; 03 Nov 2022, 13:33.
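
                        A likely explanation, assuming Stata's working directory was not the Datos folder when the loop ran: the dir extended function returns bare file names without the folder, so a use command given only the file name looks for it in the working directory. Prefixing the folder removes that dependence. The compound double quotes, opened with `" and closed with "', simply delimit a string that may itself contain ordinary double quotes. A small sketch (the macro name folder is hypothetical):

                        Code:
                        local folder "C:/Users/Desktop/Stata/Datos"
                        local files : dir "`folder'" files "*.dta"

                        * the working directory, where a bare file name would be looked up
                        display c(pwd)

                        foreach file of local files {
                            * the full path, wrapped in compound double quotes
                            display `"`folder'/`file'"'
                        }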


                        • #13
                          #8 is no surprise. The wildcards * (any file name) and *.dta (any file name that has extension .dta) will in practice yield the same set of files whenever .dta is the only extension present, in the same way that "all animals in the room" and "all cats in the room" have the same answer if all the animals in the room happen to be cats.

                          My point was quite different and my mention of fs was to underline that it shows you its results and so makes clearer what is happening. Regardless of how you do it, either command includes the extensions in what is returned.

                          On the other points, Hemanshu Kumar seems to be helping you make good progress.
