Version control of user-written ados

diana gold

Join Date: Jun 2019

Posts: 23
#1

06 Nov 2019, 07:27

Hi,

In a project with multiple collaborators using various computers, I'm having trouble with version control of a user-written ado command - let's call this command userwrittenado - available in ssc. In the master run do file of the project, all required ado commands from ssc are installed through this loop:

Code:

* Loop over all the commands to test if they are already installed, if not, then install
foreach command in userwrittenado otherado1 otherado2 otherado3 {
    cap which `command'
    if _rc == 111 ssc install `command'
}

For the collaborators with version 1.0 of userwrittenado, the project works fine. However, collaborators with version 2.0 of userwrittenado run into errors later on. Since anyone who didn't have the older version installed gets the newer version from ssc in the loop above, this is bound to be trouble.

The userwrittenado is neatly written and it supports its old syntax. I intended to use this, but can't figure out how to store the version that is displayed after which into a macro. Here is a pseudo-code of what I would ideally want to do:

Code:

which userwrittenado
* HELP: how can I retrieve what is displayed by which???
local displayed_by_which
if `displayed_by_which' == "*! userwrittenado 1.0 1jan2001" {
    * old command, can just use old syntax
    userwrittenado, manyoptions
}
else if `displayed_by_which' == "*! userwrittenado 2.0 1jan2002" {
    * new command, must adapt the syntax by adding old as an option
    userwrittenado, manyoptions old
}

Is it possible to fix the above pseudo-code so that I can store what is displayed by which into a macro?

I understand I could just hammer into my project the old version of the ado. But would prefer to do this more neatly.

More generally, any advice on how to manage versions of userwritten ados for replicability is very much appreciated!

Thank you for your time,

Diana
Nick Cox

Join Date: Mar 2014

Posts: 35403
#2

06 Nov 2019, 07:52

You're coy about the details but diplomacy does not help here. The details could be integral to a solution.

I would just say: rewrite the command in question so that all your collaborators can use the same version. Or distribute copies of version 1.0 to all your collaborators, so that everyone is using the same stuff. (It would be prudent to use a new name to cut down on clashes.) (Why install the code afresh each time anyway?)

As an occasional contributor to SSC, I am very happy if people want to take my code and hack and share it around privately between consenting adults. I can't see that any way.

What gets me annoyed are things you are not proposing to do, namely if and when people borrow my code and then claim in public that they wrote the command themselves. I imagine that's a widespread attitude to such borrowing, which is naturally plagiarism.
Comment
daniel klein

Join Date: Mar 2014

Posts: 3818
#3

06 Nov 2019, 07:57

Diana might be interested in one or both of these commands.

Edit:

In my opinion, Nick points the only way to really ensure reproducible results: copy the respective version and share it with your collaborators. I tend to set up a main folder for a project, then create a subfolder, say, ados, into which I install all community-contributed commands that I use in the project.
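A minimal sketch of that layout, assuming Stata's working directory is the project's main folder and using the placeholder name userwrittenado from #1. Pointing PLUS at the subfolder is one way (not the only one) to make ssc install write there:

```stata
* Assumption: the current working directory is the project's main folder
local adodir "`c(pwd)'/ados"

* Create the subfolder; capture ignores the error if it already exists
capture mkdir "`adodir'"

* Packages installed from now on go into the project subfolder
* instead of the usual PLUS directory
sysdir set PLUS "`adodir'"

* userwrittenado is the placeholder name from #1
ssc install userwrittenado
```

The subfolder can then be shared with the project, so that every collaborator runs the same copy of the ado-files.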

I say that copying the version of userwrittenado that you use is the only way in the long run because the authors of userwrittenado might well decide to release version 3.0, which no longer supports old syntax. The downside is that you will not automatically benefit from any bug fixes that version 3.0 might include.

Best
Daniel

Last edited by daniel klein; 06 Nov 2019, 08:07.
Comment
diana gold

Join Date: Jun 2019

Posts: 23
#4

06 Nov 2019, 08:27

Thank you, Nick, for the suggestion and for clarifying the concerns around borrowing vs plagiarism.

Thank you so much, Daniel - those two commands are exactly what I need! Thank you for making the time to neatly document and share this work in your original post, it is super useful!!!

Best,
Diana
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4934
#5

06 Nov 2019, 08:33

I have incorporated large segments of other people's code into my own programs (including some of Nick's). I always ask for permission (almost always given) and provide acknowledgment in the code itself and/or help file.

I do not like being reliant on others' user-written programs. The authors might make some change without even realizing that they are zapping my programs. Further, I can customize the code to optimize it for my purposes.

Sometimes SSC authors who update their programs give the old version a new name and put it on SSC, e.g. I have gologit29 for those who are condemned to using old versions of Stata. If that didn't happen in this case, you can take the old version and rename it userwrittenado1, making tweaks to the code as necessary. Or maybe write the authors with your problem and see if they can come up with an easy fix -- they may have unintentionally zapped part of their code. For example I've occasionally tweaked my code using commands that I didn't realize required Stata 15, and it was usually easy enough to rewrite them using code that does work in earlier versions of Stata.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Comment
daniel klein

Join Date: Mar 2014

Posts: 3818
#6

06 Nov 2019, 10:12

Originally posted by Richard Williams

Sometimes SSC authors who update their programs give the old version a new name and put it on SSC, e.g. I have gologit29 for those who are condemned to using old versions of Stata.

From a user perspective, I appreciate authors keeping older versions and making them available. However, the outlined approach is not as convenient as it could be. Not only do I have to determine and download the appropriate version, but I also need to change the calls to the respective command in all of my old do-files because the command now has a new name. Therefore, I would appreciate even more an updated command that automatically calls the appropriate older version for me. If the appropriate version of the command depends on the version of Stata that I am running (or have set via version), this is easy to implement. For example, tuples (SSC) does this and so does ivreg2 (SSC). Because in most cases it really does not require a lot of effort, I would like to see more program authors adopt this way of version control.
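A minimal sketch of such a dispatcher, built on Stata's _caller() function, which returns the version the caller is running or has set via version. The names userwrittenado_v1 and userwrittenado_v2 and the threshold 13 are hypothetical and for illustration only; this is not how tuples or ivreg2 are necessarily organized internally:

```stata
* Hypothetical dispatcher: userwrittenado_v1 and userwrittenado_v2 are assumed
* to be the old and the new implementation, shipped in the same package
program userwrittenado
    version 10
    if _caller() < 13 {
        * the caller runs, or has set via -version-, an older Stata:
        * hand everything the user typed to the old implementation
        userwrittenado_v1 `0'
    }
    else {
        * otherwise use the current implementation
        userwrittenado_v2 `0'
    }
end
```

With something like this, an old do-file that starts with version 12 would keep getting the old behaviour without any change to its calls.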

Best
Daniel
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4934
#7

06 Nov 2019, 16:25

Stata is pretty good about supporting version control. Many/most user-written commands are not.

It might not be that hard to call, say, the Stata 10 version of a command. What might be a bigger hassle, though, is getting your help file to behave properly. I've never tried it, but is there an easy way for the help file to change depending on the version of Stata that is used?

It would be nice if more authors did as Daniel suggests. On the other hand, most authors are not professional programmers, nor do they get paid for this, so they are not going to put as much effort into their commands.

Roger Newson is one person who does go to heroic lengths to try to accommodate users of his programs who have different versions of Stata. See the text after "Downloading Roger Newson's packages from this website" at

http://www.rogernewsonresources.org.uk/stata.htm

Comment
Richard Williams

Join Date: Apr 2014

Posts: 4934
#8

06 Nov 2019, 17:38

To be clear, official Stata commands like logit have had their syntax and ereturned results change across time. But, if you type something like

version 9
logit y x1 ...

you can be fairly sure the Stata 9 syntax will still work and the ereturned results will be the same as in Stata 9. (I think Stata sometimes even calls the old program.) This can be especially important for user-written post-estimation commands that might expect to see results formatted the same as they were in an earlier version of Stata.

However, you can't count on user-written commands being so nice. If you type

version 9
gologit2 y x1 x2...

you can't count on gologit2 still supporting exactly the same syntax and ereturned results as it did way back when. As the author of gologit2, I haven't gone out of my way to zap somebody's do files, but I have also not hesitated to make improvements across time (probably the biggest improvement being support for factor variables and the margins command). But, gologit29 is still out there if somebody desperately needs it.

Also, a user-written program may have originally been written under Stata 10 and may therefore have a version 10 statement. But the program gets revised and now has a version 15 statement. Hence, all those people with Stata 14 or lower suddenly can't use the program anymore.

Like Daniel says, it would be nice if authors were more considerate of those that might be left behind when they change their programs. I tried to be nice by releasing gologit29, but I suppose I could have been even nicer by supporting version control as well as Stata does. But that would take work and many/most authors don't do it.

Comment
daniel klein

Join Date: Mar 2014

Posts: 3818
#9

07 Nov 2019, 01:09

Originally posted by Richard Williams

What might be a bigger hassle, though, is getting your help file to behave properly. I've never tried it, but is there an easy way for the help file to change depending on the version of Stata that is used?

Not that I am aware of. For the particular transition from Stata 9 (or lower) to Stata 10 (or higher), this is very simple: include one .hlp and one .sthlp file in the package. Stata 9, being unaware of .sthlp files, will always call the .hlp file. Stata 10 (or higher), on the other hand, will first look for the .sthlp file and, thus, also always show the correct version of the help file. In any other situation, you would either need to indicate, in the help file, that certain features are only available in Stata version X (or higher) and/or include a link to different versions of the help files.

Originally posted by Richard Williams

It would be nice if more authors did as Daniel suggests. On the other hand, most authors are not professional programmers, nor do they get paid for this, so they are not going to put as much effort into their commands.

Having authored some commands myself, I completely understand that argument. However, I also feel that this is one of the (or perhaps: the main) reasons why many users

Originally posted by Richard Williams

[...] do not like being reliant on other's user-written programs.

As a consequence, they either reinvent wheels or copy a specific version of a community-contributed command, which makes it harder to benefit from

Originally posted by Richard Williams

[...] author['s who do] not hesitated to make improvements across time

The topic of dependence on and version control for community-contributed commands pops up from time to time. Some are happy with the situation as it is, some accept the situation because they think that there is no way to improve it, and some have made suggestions on how to improve the current situation. Usually, the suggestions involve establishing some kind of standard -- set by the respective person. Sometimes that standard centers around yet more community-contributed commands -- written by the respective person. While I believe that setting and committing to a standard is the only possible way to improve the situation, StataCorp is really the only authority to set such standards. Ideally, a solution would not require authors to change their habits.

Best
Daniel
Comment
skolenik

Join Date: Mar 2014

Posts: 100
#10

07 Nov 2019, 10:51

Back to diana gold's question -- it is marginally doable, although it depends on the goodwill of the developer to properly document the version in the ado file. People who have version 2.0 of their files on SSC can be expected to be in that niche though.

Code:

findfile regress.ado
local where `r(fn)'
file open regress_ado using `"`where'"', text read
file read regress_ado line1
if substr("`line1'", 1, 16) == "*! version 1.1.0" {
    * work with 1.1.0 syntax
}
else if substr("`line1'", 1, 16) == "*! version 1.3.2" {
    * work with 1.3.2 syntax
}
file close regress_ado
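The same idea can be stretched a little for ado-files whose "*!" line is not the very first line. The following is only a sketch: userwrittenado, manyoptions and the two version strings are the placeholders from #1, and the substrings searched for are assumptions about how the author wrote the header.

```stata
* Locate the ado-file along the ado-path and remember where it is
findfile userwrittenado.ado
local fn `"`r(fn)'"'

* Read line by line until the first line starting with "*!" is found
local header ""
tempname fh
file open `fh' using `"`fn'"', read text
file read `fh' line
while r(eof) == 0 & `"`header'"' == "" {
    if substr(`"`macval(line)'"', 1, 2) == "*!" {
        local header `"`macval(line)'"'
    }
    file read `fh' line
}
file close `fh'

* Branch on the version found in the header (placeholder strings from #1)
if strpos(`"`header'"', "userwrittenado 1.0") {
    * old command, old syntax works as is
    userwrittenado, manyoptions
}
else if strpos(`"`header'"', "userwrittenado 2.0") {
    * new command, request the old syntax explicitly
    userwrittenado, manyoptions old
}
else {
    display as error "unexpected version of userwrittenado: " `"`header'"'
    exit 198
}
```

The final branch stops with an error rather than guessing, so a future version 3.0 would be noticed instead of silently changing results.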

-- Stas Kolenikov || http://stas.kolenikov.name
-- Principal Survey Scientist, Abt SRBI
-- Opinions stated in this post are mine only
Comment

diana gold

Join Date: Jun 2019
Posts: 23

#11

13 Nov 2019, 19:07

Thank you all for the insightful comments!

Echoing the suggestion by Nick Cox and Daniel Klein of "copy the respective version and share it with your collaborators", I've put together a command to do precisely this.

As of now, the command is only available on GitHub. I intend to put it on SSC once I finish more thorough testing. Here is an example for anyone wishing to use it:

Code:

* Install the package from GitHub
net install dependencies, from("https://raw.githubusercontent.com/dianagold/dependencies/master/src") replace

* Install something from SSC, that will be frozen as a test
cap ssc install ietoolkit

* Only 1 person in the project freezes the dependencies and adds this zip to the project shared drive (or Github or Dropbox)

* Freeze the dependencies:
* - quotes and full paths do no harm, though not needed
dependencies, freeze adolist("ietoolkit") using("dependencies_myproject.zip") replace
* - specifying the package makes it redundant to specify its components
dependencies, freeze adolist(ietoolkit iegraph iematch) using(dependencies_myproject.zip) replace
* - option all will freeze all packages in the registry stata.trk in plus
dependencies, freeze all using("dependencies_myproject.zip") replace

* Adding to a project master do file

* Unfreeze: at the beginning
dependencies, unfreeze using("dependencies_myproject.zip")

* Remove: at the end (not needed but recommended)
dependencies, remove

If anyone has comments or suggestions on how to improve the command or help file, it would be very much appreciated.

Best,
Diana

Comment

daniel klein

Join Date: Mar 2014

Posts: 3818
#12

14 Nov 2019, 13:56

Diana

thanks for sharing your code.

As I have stated in the last paragraph in #9, I do not believe that any suggestion that is based on community-contributed commands will be adopted by a critical mass of users. That does not mean that such commands are useless. If your approach works for you (and your collaborators) that is fine. That should be your primary goal. Others might find it useful, too, so sharing the code on SSC is a nice thing to do.

I have had a quick look into your code and also tried a simple example. I ran into a bug that is triggered in line 129:

Code:

else if confirm new file `using'

will always exit with error but not in the way you want. I think you want to lose the if in that line. I have not done more systematic testing.
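For illustration, an alternative to dropping the if is to run confirm under capture and branch on the return code. This is a sketch with a hypothetical file name, not the code of dependencies itself:

```stata
* Hypothetical file name; in the command this would come from what the user specified
local target "dependencies_myproject.zip"

* -if- expects an expression, and -confirm- is a command, so it cannot
* follow -if- directly. Run it under -capture- and test the return code.
capture confirm new file `"`target'"'
if _rc == 0 {
    display as text "`target' does not exist yet and can be created"
}
else if _rc == 602 {
    display as error "`target' already exists; specify replace"
}
else {
    display as error "`target' cannot be created"
}
```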

Whenever I create or erase files and alter system settings (e.g., order of the ado-path), I tend to set up my main program as

Code:

program dependencies
    version 13
    * code to back up all system settings etc.
    nobreak {
        capture noisily break _dependencies `0'
        local Rc = _rc
        * depending on `Rc', restore the system settings, etc.
    }
    exit `Rc'
end

shifting the work to _dependencies and making sure that no matter what happens in _dependencies, the callers' settings, etc. are always safe.

Overall, I would have designed the syntax differently; I like to think in a more stata-ish way. For example, I would not have using() as an option but implement the using element of standard Stata syntax. I would not have the sub-commands as options either; I would have them as sub-commands, similar to those used elsewhere (e.g., net, ssc, mi)

Code:

dependencies subcmd ...

I would not necessarily have zip files; do those work well on Mac and Unix?

I would make more use of Mata's path*() functions instead of working around potential problems with backslashes, quotes, etc. I would probably use Stata's file command instead of reading in stata.trk as a dataset (although I could be convinced otherwise; I see the advantages and like the idea). I would give the caller the possibility to specify which of potentially many stata.trk files they want.

I would tend to keep the (inner) process as simple as possible. I do not really see why we need an additional file, dependencies.trk (but I might be missing something). I would not change the order of the ado-path, without the caller's explicit request to do so.

More generally, I would not set up a complete system of commands for what you are trying to do. I do see the benefits of automating the collection of the components of already installed packages, that is, the freeze part. However, using an uncompressed folder instead of a zip file, the unfreeze-ing process basically boils down to setting the ado-path, assuming that any Stata user knows how to copy a folder from one place to the other. I do not think that I would want to rely on a community-contributed system of commands to do this.
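A sketch of what that could look like in a master do-file. It assumes the copied folder is called ados, sits in the project folder, and is organized like a PLUS directory (letter subfolders plus stata.trk); it redirects PLUS via sysdir, which is one of several ways to change where Stata looks for ado-files:

```stata
* Remember the caller's PLUS directory so it can be restored afterwards
local oldplus "`c(sysdir_plus)'"

* Assumption: the working directory is the project folder and ados was copied there
sysdir set PLUS "`c(pwd)'/ados"

* ... the project's do-files run here and pick up the copied versions
*     instead of whatever is installed in the usual PLUS directory ...

* Put the caller's setting back at the end
sysdir set PLUS "`oldplus'"
```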

I could go on but I think this already illustrates my initial point: I am in no position to tell you how to do things (better). I would do things differently and I have my reasons. I am sure you, Nick and others would disagree with me on several points I made; even if not, that does not mean anything. StataCorp. is really the only authority to set standards. Still, your command might well be useful for others.

Best
Daniel

Last edited by daniel klein; 14 Nov 2019, 13:58.
Comment
diana gold

Join Date: Jun 2019

Posts: 23
#13

17 Nov 2019, 15:51

Dear daniel klein

Thank you for the time and patience in checking the code and providing suggestions. I really appreciate it.

I implemented most of the changes you suggested: using Mata's path*() functions, making the syntax more stata-ish, and considering multiple stata.trk. Those are things that either had not occurred to me or I didn't know how to do - and learned a lot in the process, consulting your examples and suggestions.

The unaddressed suggestions are the (superfluous) metadata dependencies.trk and other sub-commands besides freeze. Personally, I like having the metadata file with the date and original source of frozen files, and the option to unfreeze / remove the dependencies in one line, so I can easily add it to a project. Others can just ignore those. With regard to the operating system concern, I understand that zip files are readable on Mac and Unix - maybe the commands zipfile and unzipfile in Stata would need adaptation, but I can't test, so will just deflect it. In future projects, I will try to be more cognizant of it.

Best,
Diana
Comment
