  • making a do-file run interactively

    Hello,

    I wrote a do-file that needs to be applied to many different data-sets, with the risk that some things are not done properly for some data-sets. To make sure that things do run properly, my idea was to display a set of information created by the do-file and pause the run until someone reads that information and hits the "Y" or "N" key to continue or stop the run. More precisely (this is a simplified example), I need to make sure that two variables, say a1 and a2 sum to 1, and my do-file displays whether or not this holds for every observation. In some data-sets these variables are named differently, so my do-file reads from an Excel file to know which two variables need to sum to 1. However, I need to be sure that the appropriate excel files are read, and that the excel files are appropriately formatted etc. So my solution is to display the names of the two variables that are supposed to add to 1 in that particular data-set. At this point, I would like my do-file to wait for confirmation or rejection. If the two variables are indeed the correct ones, I would hit "Y" upon reading that display, at which point the do-file would continue its run.

    Is this possible? If so, how?

    Please let me know if the above does not make sense.
    Thank you!
    Best wishes,
    Nona

  • #2
    You can check -help pause- and -help exit-.
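For reference, a minimal sketch of both routes, assuming the names of the two variables have already been read into a local macro (sharevars is a hypothetical name). -pause- suspends the do-file until you type q to continue or BREAK to stop, and it only does something after -pause on-. The second option uses the _request() directive of -display- to read a typed answer into a global macro (answer, also hypothetical) and -exit- with a nonzero return code to stop the run. Both are meant for an interactive session.

```stata
* Hypothetical: in the real do-file these names would come from the Excel file
local sharevars "a1 a2"

* Option 1: -pause- (ignored unless -pause on- has been set)
pause on
display as text "Variables that should sum to 1: `sharevars'"
pause Check the list above; type q to continue or BREAK to stop

* Option 2: explicit Y/N; _request() stores the typed reply in the global macro answer
display as text "Are `sharevars' the right variables? (Y/N) " _request(answer)
if upper(trim("$answer")) != "Y" {
    display as error "Stopped by the user"
    exit 1
}
display as text "Confirmed; continuing"
```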

    But why not include the condition within an -if- (see -help ifcmd-) and have it stop when the condition is not fulfilled? (Do you really want to lose time clicking "yes" even though things are fine?) You can also use -assert-. Another approach is to let it all run and, at the end, have some report, variable, etc. say something about the datasets that were problematic. There are probably many other ways to go. Users with more experience with this will chip in, for sure.
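To make these concrete, here is a minimal sketch of the three routes on a small made-up dataset. a1 and a2 stand in for the real variable names, and the 1e-6 tolerance is an assumption: shares stored as float rarely add up to exactly 1 (0.1 + 0.9 does not), so comparing with == 1 could fail on good data.

```stata
* Made-up example data
clear
input a1 a2
0.25 0.75
0.5  0.5
0.1  0.9
end

* Route 1: -assert- stops the do-file with r(9) as soon as the check fails
assert abs(a1 + a2 - 1) < 1e-6

* Route 2: -if- with a custom message before stopping (9 is the code assert uses)
quietly count if abs(a1 + a2 - 1) >= 1e-6
if r(N) > 0 {
    display as error "a1 + a2 differs from 1 in `r(N)' observation(s)"
    exit 9
}

* Route 3: record the outcome and keep going, to report at the end of the run
capture assert abs(a1 + a2 - 1) < 1e-6
local share_ok = (_rc == 0)
display as text "Shares sum to 1: " cond(`share_ok', "yes", "no")
```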
    You should:

    1. Read the FAQ carefully.

    2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

    3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

    4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

    • #3
      Many thanks Roberto. All are great ideas!

      • #4
        You could do it in a fancy way and let the do-file pause. However, what would the user do when (s)he finds an error? I would usually break off the run and fix the problem. So, in practice I usually just use assert, which exits the entire run when an error is found, and precede it with a display that gives identifying information on where a potential error is (e.g. which file or which variable is being asserted).
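A minimal sketch of this display-then-assert pattern, looping over datasets. The two datasets are made up and saved to temporary files only so that the example runs on its own; in practice the loop would go over the real files.

```stata
* Create two small made-up datasets so the sketch runs on its own
clear
input a1 a2
0.4 0.6
0.2 0.8
end
tempfile data1 data2
save "`data1'"
replace a1 = 0.3 in 1
replace a2 = 0.7 in 1
save "`data2'"

* Say which dataset is being checked before asserting, so that if the run
* stops with "assertion is false" the log shows where it happened
foreach f in data1 data2 {
    use "``f''", clear
    display as text "Checking that a1 + a2 == 1 in `f'"
    assert abs(a1 + a2 - 1) < 1e-6
}
```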
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        • #5
          Thanks Maarten! A really useful idea!

          • #6
            Nona, when you need to make sure that something is true, use assert as suggested by Roberto. However, some things are not easily 'assertable'. Some of us might remember prompts like "Insert next disk, press any key when ready", while others may still remember "Rewind the tape to start, press any key when ready" (substitute punch cards if applicable). Such messages may still be necessary in some environments. Usually you want to wait until the user confirms that it's OK to continue, or that the run needs to be aborted. Here is how you can code this in Stata:
            Code:
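            * Yes returns 0 and the run continues; No returns 1, which stops the do-file
            window stopbox rusure "Do you want to continue?`=char(13)'Yes=continue; No=stop here."
            * Reached only if the user clicked Yes
            window stopbox note "Good choice!"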
            Whether you should go interactive or assertive depends on how easy it is to formalize your 'appropriately formatted' requirement for input files.
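Applied to the OP's case, the prompt can show the variable names that were actually read, and -capture- lets the do-file decide what to do with the answer instead of simply stopping. A minimal sketch; the local macros var1 and var2 are hypothetical stand-ins for whatever the do-file reads from the Excel file, and it relies on -window stopbox rusure- returning 0 for Yes and 1 for No.

```stata
* Hypothetical: in practice these names would be read from the Excel file
local var1 "a1"
local var2 "a2"

* Each quoted string is shown as a separate line of the dialog;
* -capture- keeps the return code (0 = Yes, 1 = No) in _rc
capture window stopbox rusure "Variables that should sum to 1: `var1' and `var2'" "Are these the right variables?"
if _rc != 0 {
    display as error "Variable list rejected by the user; stopping"
    exit 1
}
display as text "Variable list confirmed; continuing"
```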

            Never use this for numeric lists, though. I just can't add 37 floating-point numbers mentally by looking at them on the screen. If the sum has to be 1.000, add them up and ask: "The sum of whatever was supposed to be 1.000 but is actually 0.997. Do you want to continue?" Or "Convergence achieved only with precision 0.0002, acceptable y/n?" Or "Cubic term not significant, do you want to fit a quadratic model instead?"
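Following that advice, the check below computes the sums itself and puts only the result in the prompt. A minimal sketch on made-up data (one row adds up to 0.997); the 1e-6 tolerance and the %6.4f display format are arbitrary choices.

```stata
* Made-up data in which the first row adds up to 0.997 instead of 1
clear
input a1 a2
0.5 0.497
0.3 0.7
end

* Compute the sums instead of asking the user to add numbers by eye
generate double total = a1 + a2
summarize total, meanonly
local lowest  = r(min)
local highest = r(max)

* Prompt only if some row is off by more than the (assumed) tolerance;
* clicking No stops the do-file
if abs(`lowest' - 1) > 1e-6 | abs(`highest' - 1) > 1e-6 {
    local lo : display %6.4f `lowest'
    local hi : display %6.4f `highest'
    window stopbox rusure "a1 + a2 was supposed to be 1.000 but ranges from `lo' to `hi'." "Do you want to continue?"
}
```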

            For the fans of assert, try to turn the following question to the user into an assert statement:
            "are all the labels on the graph nicely positioned and don't overlap, or should the graph be rebuilt with a smaller font?"

            In some cases you may want to do data collection with Stata, like "How many cows do you see in this picture?"
            http://www.moillusions.com/wp-conten...usion-Cows.jpg
            or the better-known Rorschach test (it's OK to see cows there as well, AFAIK).

            If you do data collection, you really do need the user's response, not the assert-able truth (because "Everybody lies" (C) House MD)
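For the data-collection case, the _request() directive of -display- reads whatever the user types into a global macro. A minimal sketch; the macro name ncows is hypothetical, and the reply is checked with real() because it arrives as text.

```stata
* _request() stores the typed reply, as text, in the global macro ncows
display as text "How many cows do you see in this picture? " _request(ncows)

* Convert the reply to a number and refuse anything that is not one
local n = real("$ncows")
if missing(`n') {
    display as error `"Expected a number but got: $ncows"'
    exit 198
}
display as result "Recorded answer: `n' cows"
```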

            Best, Sergiy Radyakin

            • #7
              Originally posted by Sergiy Radyakin
              For the fans of assert, try to turn the following question to the user into an assert statement:
              "are all the labels on the graph nicely positioned and don't overlap, or should the graph be rebuilt with a smaller font?"
              How about

              Code:
              assert "all the labels on the graph" == "nicely positioned"
              assert "should the graph" == "be rebuilt with a smaller font?"
              and you interactively comment/remove the assert when you are satisfied. Seriously, the OP is not doing data collection but rather data input, from Excel to Stata, so there's no need to query the user that I can see.

              In terms of Stata workflow, if you run the same do-file more than once, it should give the same results. That's how you show that you can reproduce your results. Normally, you would not need assert statements in a do-file since, as you write it and the work progresses, you would check that it is doing what you think. If an Excel file is imported, you would check that the import is complete and correct before writing additional code in the do-file. However, the OP says that a

              do-file that needs to be applied to many different data-sets
              Presumably, this means that the do-file runs on data currently in memory (really bad idea in terms of reproducibility) AND/OR that arguments are passed to the do-file (either in the do statement or using globals). While it is possible to write a good do-file that repeatedly calls a do-file/ado in a reproducible way, I suspect that this is not the case here. The OP says

              However, I need to be sure that the appropriate excel files are read, and that the excel files are appropriately formatted etc.
              which suggests that the motivation is to avoid re-running upstream code if all went well up to that point. I would recommend that the OP (and anyone concerned about reproducibility) check out project, available from SSC. To install project, type

              Code:
              ssc install project
              I wrote project to manage my workflow in Stata, and I have been using it for everything I do for more than 5 years; it's rock solid. It encourages a modular workflow in Stata and addresses the concern of not wanting to re-run upstream code. As far as I know, it is the only tool currently available in Stata that can actually check that results are reproducible.

              • #8
                Originally posted by Robert Picard

                How about

                Code:
                assert "all the labels on the graph" == "nicely positioned"
                assert "should the graph" == "be rebuilt with a smaller font?"
                and you interactively comment/remove the assert when you are satisfied.
                The whole point of assert is that it must be executed and must hold true under the desired behavior. Otherwise you are running the code without an assertion. Compare with this:

                Code:
                capture program drop foobar
                program define foobar
                  syntax anything
                
                  assert `anything'*3==99
                  display "The product is ninety nine!"
                end
                
                foobar 11
                We get an error from assert, since 11*3 is 33, not 99. Sure, we meant instead that the multiplier must be 9, not 3:

                Code:
                capture program drop foobar
                program define foobar
                  syntax anything
                
                  assert `anything'*9==99
                  display "The product is ninety nine!"
                end
                Now we delete the assert:
                Code:
                capture program drop foobar
                program define foobar
                  syntax anything
                  display "The product is ninety nine!"
                end
                and execute it as
                Code:
                foobar 12345
                Would the original assert hold? No. You can only assert what is certain and known. Unless you know the argument is always going to be 11, you can't possibly assert that the product is going to be 99. Anything coming from outside, whether a file, an internet resource, the real-time clock, installed software, etc., is essentially environment. That's why an assert should not be removed from the code when such uncertainty exists. Ask StataCorp: despite many versions, many developers, and rigorous testing, every version of Stata that I've seen ships with code that can stop at a certain moment with a message similar to "Assertion is false, contact the developers and let them know what you were doing". The messages of course vary, depending on which features were added (if you do check on me, ask "what is DSA having?"; I am dying to know). If the code were assert-able as you insist, there would be no need to leave these asserts in the production code. On the contrary, the message "something happened that should not" is evidence that not everything is foreseeable, or worth investigating if the probability of it happening is low (or unknown but deemed low).

                Deleting an assert that can still fail is not a good idea.
                The line assert "all the labels on the graph" == "nicely positioned" doesn't make sense to me. It will always fail. Always. Even if the labels are nicely positioned. Just replace it with a stop statement then.

                In Nona's case:
                I need to make sure that two variables, say a1 and a2 sum to 1
                it is clearly a job for assert. No doubts and no argument here.
                However,
                the excel files are appropriately formatted etc
                is not so obvious. You can assert the number of sheets, rows, and columns, the emptiness of a cell, or the presence of a header. But unless "appropriately" is defined, you can't assert it.
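Several of those structural checks can be written down directly. A minimal sketch, assuming a specification sheet with one column for the dataset name and one for each of the two variable names (the layout, the file name shares_spec.xlsx, and the sheet name spec are all hypothetical). The workbook is created first so that the example runs on its own; note that this writes the file to the current working directory.

```stata
* Build a small hypothetical specification workbook so the sketch runs on its own
clear
input str8 dataset str8 var1 str8 var2
"country1" "a1" "a2"
end
export excel using "shares_spec.xlsx", sheet("spec") firstrow(variables) replace

* Number and names of sheets
import excel using "shares_spec.xlsx", describe
local nsheets = r(N_worksheet)
local sheet1 "`r(worksheet_1)'"
assert `nsheets' == 1
assert "`sheet1'" == "spec"

* Header, number of columns and rows, empty cells
import excel using "shares_spec.xlsx", sheet("spec") firstrow clear
confirm variable dataset var1 var2   // header row has the expected names
assert c(k) == 3                     // exactly three columns
assert _N >= 1                       // at least one data row
assert !missing(var1) & !missing(var2)
```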

                I am not using -project- myself, but I imagine it's something like a home-made ReSharper for Stata, with the difference that it relies on the user's descriptions (inputs and outputs of each logical block). If -project- can determine whether a particular variable is used by a certain piece of code without a user's hint, let me know, as I requested this -vartouch- function from StataCorp 173 years ago (similar to the 'dataset changed' flag, but on the variable, and not necessarily changed: values read or written). It seems that this is not possible in principle unless the whole code is documented and all users follow the same convention of declaring all this in every ado-file they write. (Mata is a bit stricter than Stata in this respect, since .do/.ado files can be generated on the fly; but generating .mlibs on the fly is also possible, so it is not going to happen.)

                Best, Sergiy

                • #9
                  Robert, Sergiy,

                  Thanks for very interesting discussion.


                  do-file that needs to be applied to many different data-sets



                  Presumably, this means that the do-file runs on data currently in memory (really bad idea in terms of reproducibility) AND/OR that arguments are passed to the do-file (either in the do statement or using globals). While it is possible to write a good do-file that repeatedly calls a do-file/ado in a reproducible way, I suspect that this is not the case here. The OP says
                  I did not refer to the data in memory. I agree that reproducibility is key, so the do-file starts off with the raw data, and the arguments are passed using globals as well as through some reports about the data. These reports, however, may be written in different formats, hence the worry that Stata's reading of them might sometimes be off. Not because Stata has a problem, but because the authors of those reports are not necessarily concerned about how my do-file will be able to understand what they wrote.
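Related to Robert's point about globals: the variable names can also be passed explicitly as arguments. A minimal sketch with a hypothetical program check_shares and made-up variable names; the same -args- line works at the top of a do-file called as do check_shares a1 a2.

```stata
* Hypothetical checking routine that receives the two variable names as arguments
capture program drop check_shares
program define check_shares
    args var1 var2
    display as text "Checking that `var1' + `var2' == 1"
    assert abs(`var1' + `var2' - 1) < 1e-6
end

* Made-up data with differently named share variables
clear
input s_male s_female
0.48 0.52
0.51 0.49
end

* The names are passed explicitly instead of through globals
check_shares s_male s_female
```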

                  Once again, thanks for your inputs. I really learned a lot from this thread!

                  Best wishes,
                  Nona
