  • Tempfiles are not being deleted automatically

    I am using STATA 13.1. I have been using tempfiles in order to merge 5 variables from dataset2 into dataset1. However when I end the do-file the tempfiles remain and are not deleted automatically and I cannot work out why.

    Here is my do-file code....


    use dataset1
    gen var...
    drop if ....
    tempfile dataset1_temp
    save " 'dataset1_temp' "

    use dataset2
    merge 1:m userid using " 'dataset1_temp' ", keepusing (var 1-5)
    tempfile dataset2_temp
    save " 'dataset2_temp' "
    end


    If anyone has any ideas why these tempfiles are not automatically deleted after closing the do-file I would very much like to hear them.

    Many thanks in advance.

    Claire.

  • #2
    (This should have been posted under General.)

    You saved your data files. Stata won't delete such files. If you don't want temporary files to be deleted at the end of the program, omit the save statements (which is the usual procedure).

    In merge processes, I think it is general (and more importantly better) practice to save important datasets in permanent files. If all goes well, you can delete files you no longer want when all is done. Temporary files aren't used here.

    (For "STATA" read "Stata"; read the Advice guide to the very end.)
    Last edited by Nick Cox; 10 Apr 2014, 09:19.


    • #3
      Shouldn't this be on the "General" forum?
      -- Stas Kolenikov || http://stas.kolenikov.name
      -- Principal Survey Scientist, Abt SRBI
      -- Opinions stated in this post are mine only


      • #4
        So what's etiquette on responding to posts that are in the wrong place? Respond as usual? Request that the original poster re-post in the correct place?

        I have some follow-up questions and comments about Nick's answer. I'll post them in this thread, with the recognition that continuing the conversation in the misplaced forum is a little weird.

        I don't understand how you could work with tempfiles without using save. My reading of the examples on pages 271-272 of the programming manual is that tempfile followed by save, as in Claire's example, should create a saved file that Stata deletes upon completion of the dofile. I have no thoughts on why Stata is not deleting the files as expected.

        My question had to do with why it's best practice to use permanent files for merge processes. I can see the argument for files that you'll need for other purposes, but when doing data management tasks that involve intermediate data files that aren't set up in a way that's useful for anything other than putting together a new file through append or merge, what's the argument against using temporary files? I frequently find myself doing this for two reasons.

        1) My permanent data gets written to a network drive that has space constraints (and a backup system that does not immediately free up space upon deleting files). Using tempfiles allows me to save fragments that I don't need to keep permanently once they've been recombined in useful ways, without having to keep track of which things go in my permanent network directory and which things need to go on a local drive and be deleted upon completion. As long as Stata exits normally, my tempfiles seem to just get cleaned up with no additional action on my part.

        2) Using tempfiles means I don't have to come up with unique names for intermediate data I don't intend to keep. I worked with another analyst once who wrote dofiles that would write and delete permanent files with names like "temp" and "temp1". It turns out that this strategy works poorly if you're working in a shared environment or if you have the habit of running small chunks of dofiles without confirming that you're running the chunk that creates (or deletes) the right file. The first time we accidentally merged the wrong "temp" dataset as part of an import procedure, I switched to using tempfile much more heavily. There are obviously ways to get around these sorts of issues with agreed-upon naming conventions for permanent datasets. For data that's basically throw-away data once you've created the true final data set, though, it just seems easier to use tempfiles to ensure that the intermediate pieces get named uniquely and aren't available to accidentally be used by a different dofile.


        • #5
          1) and 2) are good stories, each with a lesson. They could easily be the basis of a publishable Tip in the Stata Journal.

          It's very easy to use tempfile without save: I do it often in programs. For example, I produce a series of graph files and then combine them. The intermediate files don't matter.

          I was pontificating a little about "best practice". But imagine this. You are in the middle of a complicated merge and something crashes. How do you find the tempfile later?

          In essence, a merge is something important that I want to do just once and I want to be very careful about my data files. Naturally, there are other good ways of being careful about data files.

          More generally, I am happy if this exposes the fact that I only use tempfiles in certain ways.


          • #6
            Originally posted by Sarah Edgington
            So what's etiquette on responding to posts that are in the wrong place? Respond as usual? Request that the original poster re-post in the correct place?
            On most forums, the etiquette on this would be that a moderator would move the post to the correct place, perhaps with a 'soft redirect' in place which left a link to the post on the original (wrong) forum but moved the thread and its replies to the correct forum. I think that would work well here too.


            • #7
              Sorry, but what I said earlier was nonsense, as I realised walking home from work.

              "It's very easy to use tempfile without save: I do it often in programs. For example, I produce a series of graph files and then combine them. The intermediate files don't matter."
              A tempfile declaration is useless unless something is saved to that file. Temporary graph files are manifestly not an exception.

              On the original question of why a temporary file was not deleted: I offer a question of whether the program was interrupted before it finished.


              • #8
                My philosophy generally is that if I'm in the middle of anything and something crashes, I want to start over at the beginning and work from the raw data anyway to make sure that the crash didn't cause some other unknown problem. I'll admit that there are times when using tempfiles is inconvenient and requires rerunning things I might otherwise not have to. On the other hand, in general I'm fairly inclined to go ahead and waste computing time in the interest of making absolutely sure a dofile runs from start to finish (and can be rerun to replicate results) so that's a trade-off that tends to bother me less.

                I still don't understand how to use a tempfile command without a matching save command, though. From a dofile, at least, the use of tempfile without save doesn't appear to do anything aside from create a macro with a filename.

                Take this as an example:
                Code:
                sysuse auto
                
                tempfile tempauto
                
                sysuse bplong
                
                append using `tempauto'
                If I run that, I get the error message "file C:\DOCUME~1\SEDGIN~1\LOCALS~1\Temp\ST_00000001.tmp not found"
                It appears that no actual tempfile is created unless I include the line save `tempauto' after the line where I declare the tempfile.
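For comparison, a minimal sketch of the same example with the save line added. It assumes the auto and bplong datasets that ship with Stata; the clear options and the n_before local are additions for illustration so the sketch can be rerun in one session.

```stata
sysuse auto, clear
tempfile tempauto              // only defines local macro tempauto (a file path)
save "`tempauto'"              // now the file actually exists on disk

sysuse bplong, clear
local n_before = _N            // observations before appending
append using "`tempauto'"      // succeeds because the file was saved
display _N - `n_before'        // number of observations appended from auto
```

Note that the macro reference opens with a backtick and closes with an apostrophe; with two straight apostrophes Stata does not expand the macro.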

                Of course none of this gets at Claire's original question about why the tempfiles aren't being deleted. My experience is that they do generally get erased immediately, but when I looked at my temp directory there were still some old files that appeared to be Stata tempfiles hanging around. So clearly sometimes that isn't true. My best advice, if it's important that the data be erased (for security or space reasons), is to end your code with a line like capture erase `dataset1_temp'. You'd need to repeat that for each tempfile created. Of course, at the point where you have to manually delete tempfiles, you might as well just be writing permanent files.
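A minimal sketch of that explicit clean-up, assuming the shipped auto dataset stands in for the real data; the confirm file checks are only there to show the file appearing and disappearing.

```stata
sysuse auto, clear
tempfile tempauto
save "`tempauto'"
confirm file "`tempauto'"          // no error: the file exists

* ... work that uses the temporary file ...

capture erase "`tempauto'"         // capture: no error even if the file is already gone
capture confirm file "`tempauto'"
display _rc                        // nonzero once the file no longer exists
```

The erase has to run in the same do-file or program that declared the tempfile, because the local macro holding the path is not visible elsewhere.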


                • #9
                  Thank you all for your input. Your suggestions are very helpful. I apologise for posting in the wrong place.


                  • #10
                    Originally posted by Sarah Edgington
                    My best advice, if it's important that the data be erased (for security or space reasons), is to end your code with a line like capture erase `dataset1_temp'. You'd need to repeat that for each tempfile created. Of course, at the point where you have to manually delete tempfiles, you might as well just be writing permanent files.
                    Another alternative is to set the location of your temporary files to an encrypted disk or partition. I have the following line in my .bash_profile:
                    alias secure-stata='env TMPDIR=/Volumes/scratch/tmp open -n /Applications/Stata/StataSE.app'
                    where scratch is an encrypted disk image, which makes it easy to obtain a "secure Stata session" when I need it (see this FAQ for Windows instructions).

                    On OS X, you can use Disk Utility to create an encrypted disk image. On other platforms, you can use TrueCrypt or any of a variety of other similar utilities.
                    Last edited by Phil Schumm; 11 Apr 2014, 04:04.


                    • #11
                      I muddied the waters here with various earlier posts. Let me try to tidy up (and correct some sloppy or incorrect statements made earlier).

                      tempfile by itself does nothing, I conjecture, except identify for you a legal filename in a directory or folder in which you have write access.

                      Code:
                      di "`c(tmpdir)'"
                      shows where that is. A tempfile declaration in itself is harmless, but the file itself won't exist (at least won't exist usefully) until you write something to it. Usually that means a save, but that is not the only way you can populate a file. For example, after a tempfile declaration you can copy a text file to a temporary file. Another example is saving a set of label definitions to a do-file which you run after a data restructure. Another example is a graph file. Naturally, that last implies a saving() option, so the distinction between save (specific command) and save (general word for putting stuff into a file) is vital here.
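A rough sketch of two of these non-save routes, assuming the shipped auto dataset and its origin value label; the macro names labdefs, notes, notes_copy and fh are made up for illustration.

```stata
sysuse auto, clear

* 1. Label definitions written to a temporary do-file
tempfile labdefs
label save origin using "`labdefs'"
confirm file "`labdefs'"               // the file exists without any save command

* 2. Plain text written with file write, then copied to a second temporary file
tempfile notes notes_copy
tempname fh
file open `fh' using "`notes'", write text
file write `fh' "intermediate notes" _n
file close `fh'
copy "`notes'" "`notes_copy'"
confirm file "`notes_copy'"
```

In both cases the tempfile declaration only supplies the path; the file comes into existence when label save, file write or copy puts something there.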
