  • Saving a dataset under the same name as the data in memory

    Hi,

    I want to read a dataset into Stata, perform a few steps on it, and save it again under the same name. (Actually, I want to do the same on several datasets using a foreach loop, but that is not important or the problem I am facing, so I am not bringing that issue here.)

    The following code works fine [Windows 7, SP1, 4 GB memory]:
    Code:
    version 12.1
    drop _all
    set obs 10
    gen x=1
    save x, replace // up to here, this just creates the example dataset
    use x, clear
    gen y=2
    save, replace

    This saves (replaces) the dataset x.dta in the working directory. But I run into a problem when I try to use a subdirectory within the working directory, with the following code. The subdirectory is named 'data' and sits directly within the working directory.

    Code:
    version 12.1
    drop _all
    set obs 10
    gen x=1
    save data/x, replace // up to here, this just creates the example dataset
    use data/x, clear
    gen y=2
    save data/, replace

    What happens is that the last save command saves a new dataset named ".dta" (whereas I want it simply to replace "x.dta"). Is there any way to solve this problem without specifying the filename in the last save command?

    Thank you.

  • #2
    To answer your question directly: if you just do -save, replace- without specifying the directory, you will get what you want. When the -save- command is issued without a filename, Stata saves the data currently in memory under exactly the filename that was specified in the preceding -use- command, including the full path.
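A minimal sketch of the behavior described above, reusing the data/x example from the original post. The `display` line prints `c(filename)`, Stata's record of the file most recently used or saved, which is what a bare `save, replace` writes back to; the `capture mkdir data` line is an addition so the sketch runs on its own.

```stata
version 12.1
clear
capture mkdir data          // make sure the subdirectory exists
set obs 10
gen x = 1
save data/x, replace        // create the example dataset

use data/x, clear
display "`c(filename)'"     // the file that -use- just read
gen y = 2
save, replace               // no path and no filename: overwrites data/x.dta
```

Note that the fix is to drop the path as well as the filename; `save data/, replace` is treated as a (nameless) filename, not as "same file, this directory".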

    That said, what you're trying to do is a terrible idea. Your work becomes irreproducible because you have destroyed the data you started from. It would be a much better idea to either save the original data sets under different but related names first, or save the result data sets under different but related names. Never clobber your original data. (Even if these data sets are not the original source data, but some intermediate point in a chain of data management steps, it is still better to preserve every step of the chain, along with the do-files that created them, so that at the end you have a complete audit trail of your analyses.)
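A minimal sketch of that advice, assuming a hypothetical `_step1` suffix as the "different but related name": the derived data go to a new file, and data/x.dta stays exactly as it was.

```stata
version 12.1
clear
capture mkdir data
set obs 10
gen x = 1
save data/x, replace          // stands in for the original dataset

use data/x, clear
gen y = 2
save data/x_step1, replace    // hypothetical suffix: the original is not clobbered
```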



    • #3
      Dr. Schechter,

      Thank you. But does that (referring to the last sentence of your first paragraph) mean that when a path but no file name is specified in the -save- command, Stata will 'always' save the file without a filename?

      I am all for audit trails and reproducibility. But for me, reproducibility means keeping the original data (or the first .dta file created from that original data), the datasets created at major junctures, and "all" do-files. This particular command (or rather its actual loop version) prepares a number of smaller "derived" intermediate datasets (one variable per dataset) for appending at a later step. I am not certain that I would benefit in any way from keeping or creating a second version of each of these datasets.



      • #4
        Krishanu: I'm not sure you can accomplish what you want to do, and I personally wouldn't do it 'your way', as I don't think it's sufficiently transparent. If you're doing this many times in a loop, why not, for example, put the name of the directory where you want to save your temporary working file into one local macro, put the temporary working file's name into another, and then refer to both?
        Code:
        local datadir data
        local dname x
        
        use `datadir'/`dname', clear
        gen y=2
        save `datadir'/`dname', replace

        The same thing would work using real names rather than macros, but macros may help if you're looping.
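A sketch of the looped version, assuming a hypothetical list of dataset names (a b c) kept in the subdirectory data. The first loop only creates stand-in files so the example runs on its own; the second is the per-file work, with both the path and the name coming from macros.

```stata
version 12.1
clear
capture mkdir data
local datadir data
local names a b c                   // hypothetical list of dataset names

* create stand-in datasets so the sketch is self-contained
foreach dname of local names {
    clear
    set obs 10
    gen x = 1
    save `datadir'/`dname', replace
}

* the per-file work: path and file name both spelled out via macros
foreach dname of local names {
    use `datadir'/`dname', clear
    gen y = 2
    save `datadir'/`dname', replace
}
```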



        • #5
          Dr. Jenkins,

          I am doing exactly what you describe, up to and including the use of both local macros you specified (I did not mention the macros because that part was working fine). It was only in the last -save- command that I did not want to type the file-name macro; I just wanted to know whether -save- would use the filename from the -use- command even when a directory path is specified (via a macro or otherwise). It looks like it won't, so I will now type the file-name macro in the last -save- command (as in your code).

          Thank you.
          Last edited by Krishanu Karmakar; 13 Jul 2014, 11:01.



          • #6
            If you issue a -save- command with a path specified but no filename, you will get just what you got: a file saved under the name .dta. If your purpose is to overwrite the file you previously -use-d, then just specify -save, replace- with no filename or path information and you will get what you want. Once you specify something in the filename position in the command, Stata assumes that you have given the complete name and does exactly what you said, which is not, it seems, what you meant.
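If the goal is to name the target explicitly in the code without retyping the file name, one option is to save to `c(filename)`, which holds the name of the file last specified with -use- or -save-. A minimal sketch, assuming the data/x example from the original post; the local macro name `lastfile` is hypothetical.

```stata
version 12.1
clear
capture mkdir data
set obs 10
gen x = 1
save data/x, replace               // example dataset, as in the original post

use data/x, clear
gen y = 2                          // -gen- does not change c(filename)
local lastfile "`c(filename)'"     // file name (with path) recorded by -use-
display "`lastfile'"
save "`lastfile'", replace         // explicit target, no retyping of the name
```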

            Regarding the files being only intermediate files of a temporary nature, not worth saving, it might make more sense to use a tempfile, but not having seen the larger context of your project and code, I can't assert this strongly.
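A minimal sketch of the tempfile idea, assuming a hypothetical list of names (a b c) with one variable per file. Each `tempfile` call defines a local macro holding a unique temporary path; `` `f_`n'' `` is a nested macro that expands to that path. Stata deletes the temporary files itself when the do-file ends, so no naming scheme or -erase- step is needed.

```stata
version 12.1
clear
local names a b c                 // hypothetical list of names

foreach n of local names {
    tempfile f_`n'                // one temporary file per name
    clear
    set obs 10
    gen `n' = 1
    save `f_`n''                  // nested macro: the tempfile's path
}

* append the temporary files; Stata removes them when the do-file ends
clear
foreach n of local names {
    append using `f_`n''
}
```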



            • #7
              Oops. Hit "Post Reply" when I meant to hit Preview.

              To continue my last thought, I guess it seems paradoxical that you care so much about the name under which these files are saved if their content is so ephemeral and disposable.



              • #8
                Dr. Schechter,

                In reference to your post #6: your reply (and Dr. Jenkins's) confirmed my hunch about the behavior of the -save- command. I have read a little about tempfiles, but I have never tried to understand them in depth or used them until now. I will have to look into this.

                Let me try to explain why I am doing things this way. I keep the file names unchanged because of the local list of names that I create in the first step of my do-file (not detailed here). I reuse that list to append the files at a later step and then to erase these intermediate single-variable files. This way I can use the same local list to:
                1. create these files from the original dataset,
                2. merge them into dataset X,
                3. change the name of the variable within each of the individual files from step 1 to (say) "abc", and then
                4. append the individual datasets into dataset Y, all with the same local list and without creating a new local list every time.

                Each of the four tasks above is done using -foreach- loops. I could of course (as you suggested) give these intermediate files new names like `list'_new.dta, where `list' refers to the items of the local list, and then keep referring to them as `list'_new.dta in subsequent steps (instead of keeping the `list'.dta names). But the main reason I did not bother to give them new names is that I am erasing all of them anyway.
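The workflow above can be sketched as follows, under several assumptions: a hypothetical local list (a b c), a stand-in original dataset data/original.dta, and a hypothetical output name data/Y.dta. Step 2 (the merge into dataset X) is omitted to keep the sketch short. Step 3 relies on the bare `save, replace` discussed earlier in the thread to overwrite each file under its own name.

```stata
version 12.1
clear
capture mkdir data
local names a b c                 // hypothetical list, reused in every step

* stand-in "original" dataset with one variable per name
set obs 5
foreach n of local names {
    gen `n' = _n * 10
}
save data/original, replace

* step 1: one single-variable file per name
foreach n of local names {
    use `n' using data/original, clear
    save data/`n', replace
}

* step 3: rename each file's variable to abc, keeping the file name
foreach n of local names {
    use data/`n', clear
    rename `n' abc
    gen source = "`n'"            // records which file the rows came from
    save, replace                 // no filename: overwrites data/`n'.dta
}

* step 4: append into one dataset, then erase the intermediate files
clear
foreach n of local names {
    append using data/`n'
}
save data/Y, replace              // hypothetical name for dataset Y
foreach n of local names {
    erase data/`n'.dta
}
```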

                Anyway, I have to look into -tempfile- and how it can aid my work. Thank you for mentioning it.

