
  • #16
    This is the point: Yes, but (only) by save. (It would be interesting to be wrong on this.)



    • #17
      I'd be very glad if you turned out to be wrong. My nice ado file would be just perfect with this last little problem solved.



      • #18
        I am just sitting here waiting for refutations, but not expecting any.



        • #19
          I think your code snippet can be retrofitted to make use of preserve/restore and reduces to
          Code:
          * about to change the data in memory
          preserve
          
          keep `gr' `varlist'
          if strpos("`md'", "m") == 0 {
              bysort `gr' `varlist': keep if _n == 1    // drop duplicates
          }
          isid `gr' `varlist', sort
          
          * .... (Program syntax)
          
          * save work
          tempfile work
          qui save "`work'"
          
          * restore original data
          restore
          
          merge m:1 `gr' `varlist' using "`work'", assert(match) nogen
          The many to one merge is required in the case where duplicates were removed. If the program stops at
          Code:
          isid `gr' `varlist', sort
          then that means that there is a logical error in your program. Your strategy of using
          Code:
          by `gr' `varlist': gen byte `temp' = _n
          [...]
          merge 1:1 `gr' `varlist' `temp' using "`orgfile'"
          will just mask the problem discussed in "Why does my do-file or ado-file produce different results every time I run it?", as the order of observations will be random within `gr' `varlist' groups (when you generate `temp').
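
          A minimal sketch of the ordering issue, using Stata's shipped auto.dta rather than the data from this thread (the tempvar names are made up for illustration). Numbering observations within groups is only reproducible when the sort order within each group is fully determined:
          Code:
          sysuse auto, clear
          isid make    // make uniquely identifies observations in auto.dta
          tempvar seq1 seq2

          * ties within foreign are broken arbitrarily, so seq1 can differ between runs
          bysort foreign: gen long `seq1' = _n

          * sorting on make within foreign pins the order down, so seq2 is reproducible
          bysort foreign (make): gen long `seq2' = _n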



          • #20
            Hi Robert,

            there is no problem with the data, there is only a problem with the NAME of the data set in memory. Using preserve/restore just to solve this has a high price in terms of runtime.
            So I think it is a smaller problem to tell the user that he has to save the active file with an appropriate name in an appropriate directory than to cause a considerable increase in runtime (the program already takes about 40 minutes with my data).
            It's just that I don't like it!
            Therefore I hope Stata will provide a way to rename data sets without actually saving them in one of the next Stata updates.

            Greetings, Klaudia



            • #21
              The answer to your question is: if the user has done changes to the original file (e.g. created a new variable with generate) these would be lost if I merge my files to the original file.
              I do not think this is necessarily so and Robert shows the basic setup. I still have doubts about the time penalty. Have you actually tried this even once? Also this reasoning seems questionable regarding the statement

              Without a preserve statement, I risk to lose the active dataset in the memory, that is true. But the dataset on disk will still be available
              This risk seems very inconvenient for any user to take, who will in that case have to reload her original (very large) dataset, and additionally redo all the changes that were not previously saved - which were your concern in the first place. It will also take much (much ...) more time to repeat the steps described before and re-run the entire program.

              Bottom line, it seems to me that the preserve/restore approach will be both safer and (therefore) also faster in the long run or under more conditions.

              Best
              Daniel
              Last edited by daniel klein; 02 Mar 2015, 11:41.



              • #22
                Klaudia,
                what if your data in memory was never saved? So that the c(filename) was empty to begin with?
                Perhaps it is simpler if your command works not on data in memory, but on a file, such as:

                Code:
                process using "filename.dta", replace
                Setting the c(filename) from Stata syntax is possible. The help file tells you how.
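
                The file-based interface suggested above could be sketched roughly as follows. The command name process comes from the example above; the program body is a placeholder and purely an assumption, since only the using/replace handling is the point here:
                Code:
                program define process
                    version 13
                    syntax using/ [, replace]

                    * leave whatever the user has in memory untouched
                    preserve
                    use `"`using'"', clear

                    * ... (processing steps would go here)

                    * overwriting the input file fails unless replace was specified
                    save `"`using'"', `replace'
                    restore
                end
                With this design the data in memory, and therefore c(filename), are the same before and after the call.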

                Regards, Sergiy Radyakin.




                • #23
                  Originally posted by Klaudia Erhardt
                  there is no problem with the data, there is only a problem with the NAME of the data set in memory. Using preserve/restore just to solve this has a high price in terms of runtime.
                  I did not say that there was a problem with your data, I said there may be a problem with your code. If execution time is such a concern, why are you dropping variables and perhaps observations, saving a copy of the data in memory and going through the expense of a merge when you could just create your two variables with the data in memory without dropping variables or observations?
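
                  A minimal sketch of what creating group-level variables directly in memory can look like, with no keep, save, or merge. It uses auto.dta, and the two variables (a group size and a group identifier) are hypothetical stand-ins, since the thread does not say which two variables the program creates:
                  Code:
                  sysuse auto, clear

                  * number of observations in each foreign/rep78 group
                  bysort foreign rep78: gen long grp_n = _N

                  * running identifier for each foreign/rep78 group
                  egen long grp_id = group(foreign rep78)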



                  • #24
                    @ Sergiy Radyakin.

                    Hello Sergiy,

                    this sounds interesting:

                    Originally posted by Sergiy Radyakin
                    Setting the c(filename) from Stata syntax is possible. The help file tells you how.
                    But neither c(filename) nor c(return) nor help filename nor help set nor help set filename leads me to the answer.
                    Could you please give me a hint on how to set c(filename) from Stata syntax, or where exactly to find information on it?

                    Thanks and greetings, Klaudia



                    • #25
                      Maybe Sergiy thinks it is a bad idea to change this global (and I would be careful playing around with this, too), but the help file for creturn states that

                      c(filename) is equal to $S_FN
                      and you know how to define global macros.
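
                      A quick way to see the equivalence stated in the help file, sketched with auto.dta (any dataset loaded with use would do); both lines should display the same path:
                      Code:
                      sysuse auto, clear
                      display `"`c(filename)'"'
                      display `"$S_FN"'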

                      As a last statement to this thread from my side, you as the author of your command are responsible for the way it is set up and how it works, but I would seriously reconsider some of the advice given here by arguably experienced Stata programmers.

                      Best
                      Daniel
                      Last edited by daniel klein; 03 Mar 2015, 04:53.



                      • #26
                        Hurrah, that was the solution!
                        The solution was to include
                        Code:
                        global S_FN = <the name of the file at the start of my program>
                        at the end of my syntax.
                        The only fly in the ointment: if a data window was already open before executing the program, its caption still shows the tmp name. But my main concern was that the user might not notice that the file is a temporary one, save the result thinking she was saving under the old name, and never find the result file again. This is solved now.
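
                        A minimal sketch of this approach, using auto.dta and a tempfile as a stand-in for the program's own steps (the local name orig_fn is made up for illustration). Whether overwriting S_FN has side effects elsewhere in Stata is not documented, so this is an illustration under that assumption only:
                        Code:
                        sysuse auto, clear

                        * remember the dataset name at the start; empty if the data were never saved
                        local orig_fn `"`c(filename)'"'

                        * stand-in for the program body: saving changes c(filename) to the tempfile name
                        tempfile work
                        quietly save "`work'"

                        * put the original name back and check what c(filename) now reports
                        global S_FN `"`orig_fn'"'
                        display `"`c(filename)'"'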

                        Regarding the advice given on other points of my code, like using preserve/restore or what Sergiy said - what if the data in memory was never saved? - I will consider everything thoroughly. These were additional issues to my original question, which is why I did not give them as much attention as I ought to have. But now I will do so.

                        Many thanks to every one of you for spending time and brains on my question!

                        Greetings, Klaudia



                        • #27
                          Sorry to be negative again, but although you appear satisfied, I remain totally puzzled at what you are trying to do here. It is a complete mystery to me why you should want to change that global. Either that is pointless or it is dangerous; I can't see a third possibility, namely that it is useful. If and only if you understand every use that Stata's official code can make of that global in its ados and in its compiled code could it possibly be a sensible idea to tinker with it.

                          StataCorp have remained quiet here but at a minimum you could ask tech-support as a check.

                          So, I can't let you sign off on this thread without flagging that I am not convinced that you have a good idea and I fear that you have a very bad one. Daniel's last comment in #25 is very carefully worded and well worth pondering.



                          • #28
                            Well, Nick, I will ask tech-support about the dangers of changing the filename of the data in memory by changing $S_FN. There might be some if certain instances or features of Stata keep the previous temp file name instead of updating it to the newly assigned name.
                            As to Daniel's comment in #25 - and those of other experts in this thread - I had hoped to have answered it in #26.

                            Greetings, Klaudia



                            • #29
                              Just to let you know how this issue continued:

                              While waiting for an answer from tech support, I wrote a new version of my program, with preserve and restore, and it seems to do exactly what I want. The runtime is about 15% longer though, so I'm thinking about including a 'fast' option without preserve.
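
                              A 'fast' option of this kind could be sketched as follows. The command name mycmd and the placeholder body are hypothetical. The sketch follows the pattern in which preserve only acts as a safety net against errors or Break and is cancelled with restore, not on success, which is not necessarily how the program in this thread is organised:
                              Code:
                              program define mycmd
                                  version 13
                                  syntax varlist [, fast]

                                  * without fast, keep a copy so the data come back if the program fails
                                  if "`fast'" == "" {
                                      preserve
                                  }

                                  * ... (steps that change the data in memory)

                                  * on success, cancel the automatic restore and keep the changed data
                                  if "`fast'" == "" {
                                      restore, not
                                  }
                              end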

                              Thanks to all of you for your critical comments, which led me to modify my program to comply better with data security concerns!

                              Apart from this, I'd appreciate it if Stata offered a safe way to set the name of the dataset in memory by syntax.

                              Greetings, Klaudia


