RFC (request for comments) on new package "require.ado"

Sergio Correia

Join Date: Apr 2014

Posts: 420
#1

12 Jul 2021, 06:17

Hi all,

There is an undocumented ado-file in ftools that I have found quite useful for my own work, so I've spun it off into its own package, so far named "require". I haven't submitted it to SSC or released v1.0, so any feedback at this point would be really useful. This is how I use "require", and how I expect others might benefit from it:

1) At the beginning of do-files, to ensure that whoever runs them (me on other computers, coauthors, etc.) has the correct versions of programs:

Code:

clear all
cls
..
require rdrobust>=8 // at least version 8
require reghdfe>=6.0.1 // at least version 6.0.1
require tuples // no specific version, just that it's installed (so it's equivalent to "which tuples")
...

2) In my own packages, to ensure dependencies are met

Code:

program reghdfe
    syntax ...
    require ftools>=2.47.0
    ...
end

In terms of extra features, there are three so far:

1) You can also auto-install the package if it doesn't exist:

Code:

require tuples, install // will install from SSC
require gtools>=1.7.5, install from(https://raw.githubusercontent.com/mcaceresb/stata-gtools/master/build/) // install from a given URL, like "net install gtools, from()"

2) You can also put all the dependencies in an external text file and then type:

Code:

require using requirements.txt

3) "require" returns a few useful things such as the version, date, and version details (major, minor, patch) in "s()", which you can see with "sreturn list".

To install the package, you can run:

Code:

net install require, from("https://raw.githubusercontent.com/sergiocorreia/stata-require/master/src/")

Any thoughts are more than welcome,
Sergio

PS: a preemptive FAQ:
Does this work with the github package? --> I'm not that familiar with github.ado because some of the packages I use most (gtools, plus my own) don't work with it as their code is in the "src" or "build" subfolders.

How does this detect version numbers? --> It searches for the first string (the starbang) and tries to make sense of it. This is very heuristic because every author has their own conventions (which have also changed over time). So far, it seems I'm able to pick up about two-thirds of the most common Stata packages (with a bias toward newer packages), but that can definitely be improved.
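
To make the starbang idea concrete, here is a minimal sketch of how a version number might be pulled from the first line of an ado-file. It is not the code used by require, whose heuristics are not shown in this thread. The program name starbang_version and the s() names major, minor, and patch are assumptions made for illustration, and the sketch only recognizes numeric versions of the form x.y or x.y.z.

```stata
* Hypothetical helper (not part of require): parse the first line of an ado-file
program define starbang_version, sclass
    version 13
    args cmd
    sreturn clear
    quietly findfile `cmd'.ado           // stops with an error if the file is not found
    local fn `"`r(fn)'"'
    tempname fh
    file open `fh' using `"`fn'"', read text
    file read `fh' line                  // first line only
    file close `fh'
    * Expect something like: *! version 2.47.0 12jul2021
    if substr(`"`macval(line)'"', 1, 2) != "*!" {
        display as error "`cmd'.ado does not start with a starbang line"
        exit 198
    }
    if regexm(`"`macval(line)'"', "([0-9]+)\.([0-9]+)\.([0-9]+)") {
        sreturn local major = regexs(1)
        sreturn local minor = regexs(2)
        sreturn local patch = regexs(3)
    }
    else if regexm(`"`macval(line)'"', "([0-9]+)\.([0-9]+)") {
        sreturn local major = regexs(1)
        sreturn local minor = regexs(2)
        sreturn local patch 0            // assume patch 0 when only x.y is given
    }
    else {
        display as error `"no version number found in: `macval(line)'"'
        exit 198
    }
end
```

After defining it, typing "starbang_version reghdfe" followed by "sreturn list" would show whatever was parsed. A starbang line that contains a dotted date before the version number would fool this pattern, which is one reason any such approach stays heuristic.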

daniel klein

Join Date: Mar 2014

Posts: 3818
#2

12 Jul 2021, 07:23

Similar ideas have come up from time to time; see these programs from the SSC and the respective references in the help files. Probably the most elaborate variation on this that I have seen is Ben Jann's adolist (SSC).

Anyway, when I implemented rqrs and which_version, I stumbled over the same problems as the ones you mention in your preemptive questions (and some more).

The first problem is that, for this to be useful, you would need to explicitly install the program, require, first. That is, you obviously cannot type

Code:

require require

but would need to include

Code:

net install require, from("https://raw.githubusercontent.com/sergiocorreia/stata-require/master/src/")

in any do-file (and ado-file, program, etc.) that relies on require. I doubt that this is going to be adopted by a critical mass of users and/or programmers.

As you are well aware, the problem with different versions of community-contributed programs goes beyond different preferences of authors concerning notation. This touches upon more general discussions of whether one (or all) specific versions of a program should be accessible when they are outdated. Again, require would obviously work best if everyone adopted the same notation and agreed on which versions of a program should be accessible at any time; I do not believe this is going to happen.

Another problem occurs with packages that have a different name than a specific program we are using. For example, esttab is/used to be one of the most popular programs from the SSC. Yet, the name of the package is estout. Sometimes, there is not even a Stata program, but a Mata package, e.g., moremata. Although these issues are probably more puzzling for the "regular" Stata user than for the experienced Stata programmer, I, as a programmer, would expect these issues to be handled by a command like require. This, in turn, would again call for some standardization in how authors are supposed to document and maintain their packages.

In general, I believe the problem that require addresses is not primarily a technical problem. The challenge is to get Stata programmers to agree on certain standards and principles. But then, I might take require for much more than it is intended to be. If it is merely intended as a convenience tool for those who find it useful, I am sure there are people who will find it useful.
Sergio Correia

Join Date: Apr 2014

Posts: 420
#3

12 Jul 2021, 08:35

Hi Daniel,

Those are definitely great points.

Originally posted by daniel klein

The first problem is, that for this to be useful, you would need to explicitly install the program, require, first.

Completely true, and it's one of the reasons why I avoided depending on other packages (e.g. ftools).

That said, if the goal is to nudge users to have it installed, I could add it as part of "ftools" (where its older version, "ms_get_version", sits). Thus, anyone using ftools/reghdfe/etc. would have it installed (but I'm not sure if having a huge package is good practice). Or alternatively, if it proves useful enough, it can get added to base Stata, as e.g. dataex and other packages in the past.

About your other points, even though it's impossible for require to work in every case (because there are many variants of starbang strings, some of which don't even include versions), I find it solves three important problems that I am just unable to solve otherwise:

1) In a recent project, I was getting different results than my coauthors even after running the same do-file and with the same Stata installed. It turns out that I was using the dev version of rdrobust, which contained improvements that changed some results. Thus, to ensure this never happens again, we now have "require rdrobust>=8" (and similar for reghdfe, etc.)

2) reghdfe depends on ftools, but if you run a new version of reghdfe with an old version of ftools, unexpected errors can happen. The easiest way around it is to just type "require ftools>=x.y.z"

3) If you want to reproduce a published paper, then using the exact version of packages used by the paper is key. Of course, recovering older versions of SSC packages is an open problem...

Originally posted by daniel klein

In general, I believe the problem that require addresses is not primarily a technical problem. The challenge is to get Stata programmers to agree on certain standards and principles. But then, I might take require for much more than it is intended to be. If it is merely intended as a convenience tool for those who find it useful, I am sure there are people who will find it useful.

Definitely! But because getting others to agree is almost impossible, and I can't go back in time to change already existing packages, this was the best way (the only way?) I found to be able to work with do-files and ado-files and ensure that others have the required dependencies.
daniel klein

Join Date: Mar 2014

Posts: 3818
#4

12 Jul 2021, 10:30

Originally posted by Sergio Correia

I could add it as part of "ftools" (where its older version, "ms_get_version", sits). Thus, anyone using ftools/reghdfe/etc. would have it installed (but I'm not sure if having a huge package is good practice).

I see nothing wrong with that. More generally, it seems to me that relying on your own stuff is quite common practice even among Stata programmers.

Originally posted by Sergio Correia

Or alternatively, if it proves useful enough, it can get added to base Stata, as e.g. dataex and other packages in the past.

That rarely happens these days. I cannot speak for StataCorp, but I guess incorporating code comes with some responsibility (if only documentation and certification scripts). I guess dataex was a rare exception because StataCorp has an obvious interest in keeping Statalist user-friendly (possibly attract new users, give them a better experience associated with the software, and, sure enough, save the Stata tech-support team a lot of work). If version control for community-contributed packages did have some priority, StataCorp would probably come up with their own require. By the way, require is kind of a "nice" name that StataCorp might indeed choose to use. You might want to change the name to minimize the risk that your programming tool stops working in the future because it gets superseded by some official StataCorp code.

Originally posted by Sergio Correia

I find it solves three important problems that I am just unable to solve otherwise:
In a recent project, I was getting different results than my coauthors even after running the same do-file and with the same Stata installed. It turns out that I was using the dev version of rdrobust, which contained improvements that changed some results. Thus, to ensure this never happens again, we now have "require rdrobust>=8" (and similar for reghdfe, etc.)

reghdfe depends on ftools, but if you run a new version of reghdfe with an old version of ftools, unexpected errors can happen. The easiest way around it is to just type "require ftools>=x.y.z"

I see how that fixes your problems; I just doubt that it will be equally useful for others who rely on stuff that does not use the same notation to indicate the version numbers as rdrobust does. I still think it is absolutely worth putting require into a separate package that you ship as a stand-alone and/or with ftools.

Originally posted by Sergio Correia

If you want to reproduce a published paper, then using the exact version of packages used by the paper is key. Of course, recovering older versions of SSC packages is an open problem...

You cannot reproduce results obtained with Stata 16 in Stata 17, either, if the former proved incorrect. So the general "problem" is not really specific to the way the SSC archive is set up. Personally, I see no point, scientific or otherwise, to reproduce incorrect/outdated results. We have had this (interesting) discussion on Statalist several times, and I find myself more and more agreeing with those who doubt that there is a "problem" to solve here at all. And, if there is a problem, it seems sufficient (perhaps even better than to require some minimum version) to make sure that the latest version of the respective software is installed. If that software is hosted on the SSC, this boils down to including the one line

Code:

ssc install packagename , replace

near the top of the do-file. That is, of course, just my opinion. Others might well look just for the functionality that require provides.

Last edited by daniel klein; 12 Jul 2021, 10:33.
Clyde Schechter

Join Date: Apr 2014

Posts: 29947
#5

12 Jul 2021, 10:56

Some different thoughts.

Including -require- as part of the -ftools- package feels problematic to me. The two are not inherently related to each other. When I go to SSC or other sites to install a new user-written program or package, I read the help file to get a clear sense of what it does and how I might use it. I expect to get a full and honest disclosure of what I will be downloading. Were I later to learn that some additional program with an extraneous purpose has sneaked onto my computer, it would anger me. So at the least, there needs to be full disclosure that the package includes, whether you like it or not, an extraneous program that serves other purposes. Whether that disclosure would then deter me from downloading the package is hard to say--I have a lot of trust in some authors (including you, Sergio Correia), but coming from authors I'm not as comfortable with, it might prove a barrier.

Related to that, having the -install- option on -require- raises a similar problem. In this case, presumably the additional program(s) being installed are relevant, so less problematic. (At least that is true with the intended usage of -require-, though I can imagine it being misused.) But again, I think the user has a right to be told in advance about everything that is going to be placed on his/her computer when choosing to install a program. Full disclosure is the issue here.

Personally, I am happier with simply stating in the help file of a program (package) "Use of this program requires program xyz, version a.b.c or later" and having the program check whether that requirement is met when it runs. If not met, the program can abort with an error message reminding the user to install the required other program(s). If I recall, Robert Picard's -rangejoin- does that to assure that the needed -rangestat- is already present. Fair enough to tell me that I can't have A without B. But I dislike having B surreptitiously installed without my active consent.
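
A minimal sketch of the check-and-abort pattern described above, for a hypothetical program named myprog that depends on rangestat. This is not the actual code of -rangejoin-; it only tests that the dependency is present (not its version) and never installs anything on the user's behalf.

```stata
* Hypothetical program "myprog" that depends on rangestat (from SSC)
program define myprog
    version 13
    * -which- leaves a nonzero return code when the command cannot be found
    capture which rangestat
    if _rc {
        display as error "myprog requires rangestat, which is not installed"
        display as error "to install it, type: ssc install rangestat"
        exit 199
    }
    * ... the rest of the program would go here ...
end
```

The user sees the reason for the failure and the command needed to fix it, and decides whether to run that command.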
Nick Cox

Join Date: Mar 2014

Posts: 35405
#6

12 Jul 2021, 15:22

I see why require was written. My reaction is, however, close to that of daniel klein and Clyde Schechter.

The philosopher Sidney Morgenbesser when asked his opinion of pragmatism -- a word coined in the first instance by C.S. Peirce to describe an approach in philosophy -- replied "It's all very well in theory but it doesn't work in practice."

As already pointed out, a Catch-22 here is that require is itself required to be installed before you can use it successfully. (This is a good example of how being fussy about fonts can have advantages in explanation, although the Gould notation -require- to flag Stata syntax would work as well.)

So, that is a reason why I would be unlikely to use it in my own programs. Even if it became an official command, I would be reluctant, because I try to make my code usable by people with old versions of Stata. That has little bearing on whether other people might want to adopt it, except in so far as their approach is similar.

Being fussy about version control is important, and StataCorp takes this very seriously. By the way, the Stata Technical Bulletin and Stata Journal are very serious about documenting updates, including program fixes, to the extent that authors want to do that. As we've discussed here in the past, SSC does not support version control except in a roundabout way: you can give different versions of a command different names.

That said, a lot of energy can be spent on uploading and documenting different versions of community-contributed commands to little useful purpose. I just updated stripplot on SSC yesterday and I count 35 different versions flagged in the code since 1999. How many of those are publicly accessible? One. How many of those do I have on my own machines? Only a few. How many questions do I get about previous versions as compared with the latest version? None that I can recall. Naturally, an anecdote is not an analysis, but I'd say that's typical of much of my own work, even though I am more zealous about updating in the Stata Journal whatever is published there.

Just occasionally it's really important that a much used command included a really big mistake, so that readers need to know which version particular authors used, but on the whole that is unusual.