  • Loops and saving files

    Dear fellow Stata users,

    I am trying to apply the same commands to different datasets. I am currently working on 4 quarterly datasets, using:

    Code:
    cd "yourfolder"
    local files: dir . files "*dta"
    foreach file of local files {
        use "`file'", clear
        //do something
    }
    However, when I press saving after this, Stata only saves the results from the last quarter. I can see from the deleted observations, real changes, etc. that the commands are applied to all 4 quarters, but I am not sure how to save them all. What should I add to my code?

  • #2
    Your description isn't entirely clear to me. But I think I know what you are trying to do.

    Here's what you've actually done: You have read in all the *dta files in "yourfolder", done some calculations with them, and then, in every case but the last, you have overwritten all of that work by reading in the next file. Only the last file does not get overwritten, so its results are subsequently saved when you "press saving."

    What I think you want to do is build up a file that retains all the results along the way. Now, it is not entirely clear whether you want to build that file up "side by side" or "vertically." I'm going to assume you want to stack the results from the four files "on top of each other."

    Code:
    cd "yourfolder"
    local files: dir "." files "*dta"
    tempfile building
    clear
    save `building', emptyok
    foreach file of local files {
        use `"`file'"', clear
        // do something
        append using `building'
        save `"`building'"', replace
    }
    At the end of this code you will have in memory the results for all of the files you read in, stacked "vertically." You can then do further operations on this data, or you can now save it with a real filename. (Do not now save it as `building', because `building' will automatically self-destruct after the do-file finishes running. Give it a real, permanent, file name.)

    Note: Not tested, beware of typos or other errors.

    If you need to build this file "side by side" instead of vertically stacked, the code is different. It will involve using -merge- instead of -append-, but it also will require some other modifications because -appending- an empty file is legal in Stata, but -merge-ing an empty file is impossible. If you are in this position and can't figure out the modifications needed, post back and I'll be happy to show you.
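
    A minimal sketch of the "side by side" variant, not tested against real data. It assumes every file shares a hypothetical key variable called id that uniquely identifies observations, and that the other variable names differ across files; the two toy tempfiles and the names sales_q1 and sales_q2 are made up for illustration. The first file is simply saved, and -merge- is only used from the second file on, which avoids merging with an empty file.

    ```stata
    * Sketch only. -id-, sales_q1 and sales_q2 are hypothetical names,
    * and the two toy files stand in for the real quarterly datasets.
    clear
    tempfile q1 q2 building
    set obs 3
    gen id = _n
    gen sales_q1 = 10 * _n
    save `q1'
    clear
    set obs 3
    gen id = _n
    gen sales_q2 = 20 * _n
    save `q2'

    local first = 1
    foreach file in "`q1'" "`q2'" {
        use "`file'", clear
        // do something
        if !`first' {
            // from the second file on, there is something to merge with
            merge 1:1 id using `building', nogenerate
        }
        save `building', replace
        local first = 0
    }
    ```

    With real files, the -foreach- line would instead loop over the list built by -local files: dir "." files "*dta"-, and the key variable would be whatever identifies observations in your data.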
    Last edited by Clyde Schechter; 29 Jan 2021, 13:09.


    • #3
      Note that this topic was previously posted and responded to at

      https://www.statalist.org/forums/for...tiple-datasets

      which is where the code in post #1 originated.


      • #4
        Thank you, Clyde, your code worked perfectly!


        • #5
          Hello,

          I am new to Stata (especially loops) and I am hoping someone could help me understand what I am doing incorrectly and provide some suggestions.

          I work with very large datasets that require sampling by year to build a master dataset. So far, I have been doing this manually (sampling by year over 18+ years, saving, then appending, which is very time consuming). I want to learn how to use loops to do this for me: I want Stata to access each year of a dataset, sample 20%, keep a list of variables, then save it as "sample 20". Then, if possible, I would also like to have the datasets appended together after they have been saved separately. I tried to work with the code provided above, but I keep getting errors ('no variables defined').

          Code:

          cd "C:\Users\Elena\Desktop\practice for rdc\New folder"
          local files: dir "." files "*dta"
          tempfile sample
          clear
          save `sample'
          foreach file of local files {
          use `"`fullsamp*'"', clear
          sample 20
          keep personid year prov marst
          append using `fullsamp*'
          save `"`master'"', replace
          }


          Thank you in advance for the help!


          • #6
            Well, the error message you are getting comes from the -save `sample'- command. There is nothing in active memory at the time this command is reached, and Stata does not permit you to save an empty data file unless you specify the -emptyok- option. So if you add that to that command, this error message will go away.

            But there are other errors further along in your code that will trip you up as well. -use `"`fullsamp*'"', clear- will give you an error message: you can only use one file at a time, not a whole list of files abbreviated by wildcards. Also, at least in the segment of code you show, local macro fullsamp is never defined. Similarly, -save `"`master'"', replace- is a problem because you have never defined local macro master. However, this time you will not get an error message. Instead, you will overwrite the last previously loaded file--which is not what you intend and will cause you to lose the data that was in it.
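
            To see why the undefined macro causes no error message, here is a small sketch of how Stata expands a local macro that was never defined: it silently becomes an empty string. The guard at the end is only a hypothetical way of catching this before a -save-, not part of the original code.

            ```stata
            * make sure -master- is empty/undefined, as in the code above
            local master
            display `"[`master']"'

            * hypothetical guard: complain before an unintended save
            if `"`master'"' == "" {
                display as error "local macro master is empty; not saving"
            }
            ```

            The -display- line shows just the two brackets with nothing between them, which is exactly what -save- receives as its filename.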

            Also, your problem is ill posed. You cannot save each 20% sample separately under the same filename "sample 20." If you want each sample saved separately, you must give each one a separate filename.

            Code:
            cd "C:\Users\Elena\Desktop\practice for rdc\New folder"
            local files: dir "." files "*dta"
            tempfile combined
            clear
            save `combined', emptyok
            foreach f of local files {
                use `"`f'"', clear
                sample 20, by(year)
                keep personid year prov marst
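                // `f' already ends in .dta, so strip it to get xyz_20.dta rather than xyz.dta_20
                local stem : subinstr local f ".dta" ""
                save `"`stem'_20"', replace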
                gen source_file = `"`f'"'
                append using `combined'
                save `"`combined'"', replace
            }
            save combined_20_pct_samples, replace
            Note: If you want this code to give reproducible results every time it is run, you should include a -set seed 1234- (or whatever seed number you like) at the top.
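
            A small sketch of what the seed buys you, using made-up toy data (the variables id and year and the 100 observations are assumptions for illustration). The same seed applied to the same data in the same order draws the same sample, so -cf- finds no differences between the two draws.

            ```stata
            * Toy data: 100 hypothetical persons spread over two years
            clear
            set obs 100
            gen id = _n
            gen year = 2000 + mod(_n, 2)
            tempfile full first_draw
            save `full'

            * first draw
            set seed 1234
            sample 20, by(year)
            sort id
            save `first_draw'

            * second draw from the same data with the same seed
            use `full', clear
            set seed 1234
            sample 20, by(year)
            sort id

            * -cf- exits with an error if any variable differs
            cf _all using `first_draw'
            ```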


            • #7
              Hi Clyde,

              Thank you so much for your answer! Everything worked, except the saving each year of the sample 20. You mention this above and state that setting a seed will ensure a reproducible result instead of saving each file? As in, if I needed to recreate the dataset for whatever reason, I would just re-run the code instead of appending the years together.

              If I was to save each file, would that require a second loop inside the current one?

              Thank you again!


              • #8
                Also, does the line of code 'sample 20, by(year)' refer to Stata accessing each dataset in the specified file location, which is split by year? Or is it sampling by year within a dataset?

                Sorry for all the questions, I am very new at this!
                Last edited by Elena Draghici; 29 Jan 2021, 16:58.


                • #9
                  "Thank you so much for your answer! Everything worked, except the saving each year of the sample 20."
                  I do not see why that didn't work. If you originally had a file called xyz.dta, after the code runs there should be a new file, xyz_20.dta, which contains the 20% sample (and xyz.dta will still be there, too). Are there no such files after you ran the code?

                  "If I was to save each file, would that require a second loop inside the current one?"
                  No, the code is already in that loop, and I can't see any reason it wouldn't have done the job. See response just above.

                  "As in, if I needed to recreate the dataset for whatever reason, I would just re-run the code instead of appending the years together."
                  Yes, that's correct.

                  "Also, does the line of code 'sample 20, by(year)' refer to Stata accessing each dataset in the specified file location, which is split by year? Or is it sampling by year within a dataset?"
                  It's sampling by year within a data set. Once Stata is inside that loop, all the commands apply only to the one data set that is currently being processed at the time.
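
                  A toy illustration of that point, with invented data: one dataset in memory holds three hypothetical years of different sizes, and -sample 20, by(year)- draws 20% within each year separately, so the years keep their relative proportions.

                  ```stata
                  * Toy data: 100, 150 and 50 rows for three hypothetical years
                  clear
                  set seed 1234
                  set obs 300
                  gen year = 2001 + (_n > 100) + (_n > 250)
                  tabulate year          // before sampling: group sizes differ
                  sample 20, by(year)
                  tabulate year          // after sampling: 20% of each year's rows remain
                  ```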


                  • #10
                    My bad, you are correct. All the sampled years are there, along with the appended file.

                    Thanks so much, Clyde!


                    • #11
                      Originally posted by Clyde Schechter (post #2):
                      Is it possible to do multiple folder operations inside a loop, like below:

                      Code:
                      foreach file of local files {
                          use `"`file'"', clear
                          // do something1
                          save `"`building'"', replace
                          // do something2
                          save `"`building'"', replace
                      }
                      I have a lot of datasets for which first I have to remove some variables, and then rename, and so on.




                      • #12
                        I don't understand the question. The code you show has nothing to do with multiple folders. You can, of course, do multiple things inside the loop, and if there is some reason to save the results part way through and then later overwrite those results after you complete the tasks, well, yes, you can do that just as you show.
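
                        As a rough sketch of several steps inside one loop, with an intermediate save that is later overwritten: the two toy tempfiles and the variable names x, junk and value are invented for illustration and stand in for the real datasets and variables.

                        ```stata
                        * Toy inputs: two identical hypothetical files
                        clear
                        tempfile a b
                        set obs 2
                        gen x = _n
                        gen junk = 0
                        save `a'
                        save `b'

                        foreach file in "`a'" "`b'" {
                            use "`file'", clear
                            drop junk                  // step 1: remove a variable
                            save "`file'", replace     // optional checkpoint
                            rename x value             // step 2: rename a variable
                            save "`file'", replace     // final version overwrites the checkpoint
                        }
                        ```

                        With real data it is usually safer to save the cleaned version under a new filename rather than overwrite the original file.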
