Wishlist for Stata 16

Nick Cox

#76

05 Mar 2018, 11:42

Not the question, but I have stopped using lowess, for two reasons. First, if I use lowess then I have to explain at some point what it is, which for most readerships is awkward as the Stata idea of lowess isn't equivalent to many others. The method has morphed and mutated in various ways over 40 or so years and through several hands. Second, lpoly is much more flexible (pun intended) in how it can be used and much easier to link to standard literature. (But it isn't faster.)

But the main point is the one made by Clyde: often you can't tell how long it will take to climb the mountain until you've done it.
Ronán Conroy

#77

05 Mar 2018, 13:02

If I were StataCorp, I'd be looking at RStudio and thinking "we have to look better than that, and fast". Package management, help search, the ability to browse multiple file types and (the big selling point) the ability to do literate programming, all within 'one window to rule them all': it's a splendid piece of design.

It's only a small thing, but I always have to explain to my students that, in the dialog for two-way tabulate, Stata calls row percentages relative frequencies. It would be good if it said percentages.
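On the command line the same thing is requested with the -row- option of a two-way -tabulate-, which shows each cell as a percentage of its row total. A minimal illustration, using Stata's shipped auto dataset purely as example data:

```stata
* Illustration only: Stata's example auto dataset
sysuse auto, clear
* -row- adds, in each cell, the percentage of that row's total
tabulate rep78 foreign, row
```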
Chris Larkin

#78

05 Mar 2018, 21:25

Very fair points, Clyde Schechter and Nick Cox. Even though there are instances where it is not possible to predict how long an estimation will take, there are still cases where it is possible to give users a sense of it. If I'm running simulations that estimate 10,000 OLS models (or more), I resort to timing the first 50 or so and then doing the maths by hand to figure out how long the whole run will take. It's crude, as I'm often not doing anything else while I time the first 50, whereas when simulations are running in the background I will likely be using my computer for other things at the same time. But it's better than having no idea at all!
Nick Cox

#79

06 Mar 2018, 08:31

Stata already has the same attitude, insofar as dots are issued with some commands as Stata loops repeatedly through the same kind of calculation.
(Equally, these dots are often just irritating and I cherish the option to turn them off.)

It doesn't seem difficult to add this to a simulation program such as you're describing. That's not quite what you're asking, but it is often equivalent.
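A minimal sketch of automating the approach described in #78: time the first 50 replications with Stata's -timer- and extrapolate to the whole run. The toy OLS simulation, the seed, and the local macros reps and pilot are hypothetical choices made for illustration. Run it as a do-file, since // comments and /// continuation lines are not accepted interactively.

```stata
* Hypothetical toy simulation: repeated OLS fits on fresh simulated data;
* the first few replications are timed and the rate is extrapolated
clear
set seed 12345
local reps 1000    // total replications (assumed)
local pilot 50     // replications timed before extrapolating (assumed)
timer clear 1
timer on 1
forvalues i = 1/`reps' {
    quietly {
        drop _all
        set obs 100
        generate x = rnormal()
        generate y = 1 + 2*x + rnormal()
        regress y x
    }
    if `i' == `pilot' {
        * stop the clock briefly so that timer list reports elapsed time
        timer off 1
        quietly timer list 1
        local per_rep = r(t1) / `pilot'
        display as text "about " %6.3f `per_rep' " s per replication; " ///
            "roughly " %8.1f `per_rep' * (`reps' - `pilot') " s to go"
        timer on 1
    }
}
timer off 1
timer list 1
```

The estimate is only as good as the assumption that later replications cost about the same as the first 50.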

I still remember one computer programming book from the 1980s that repeatedly advised "This may take some time, so go and get yourself a coffee". No allowance for other tastes!
Clyde Schechter

#80

06 Mar 2018, 09:03

Originally posted by Nick Cox

Stata already has the same attitude, insofar as dots are issued with some commands as Stata loops repeatedly through the same kind of calculation.
(Equally, these dots are often just irritating and I cherish the option to turn them off.)

Yes, particularly if you are, say, doing a simulation with 50,000 reps, the dots get very annoying!

In response to this, when Robert Picard and I developed the -runby- command (SSC), which was explicitly designed for, in effect, looping over a large number of iterations, we struggled with how to let the user know that progress is being made without flooding the output log with dots or messages. Robert came up with a solution that I think is brilliant. -runby- runs silently by default, but you can get progress reports by specifying the -status- option. With -status-, you initially get a report roughly every second; after 5 seconds the reporting rate slows to every 5 seconds, later to every 15 seconds, and eventually to once a minute. Each report shows the number of iterations processed so far, the elapsed time so far, and an estimate (extrapolated to the total number of iterations needed) of how much time remains. There is also some information about the amount of data generated, and a running tally of the number of -by()- groups that generated errors (analogous to the red x's in StataCorp's -dots-).

I really hope that StataCorp will take a look at the code for this and adopt Robert's approach in place of -dots- for its multiple-iteration commands.

But this will not be applicable to iterations of likelihood maximization for the reasons noted earlier.
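The reporting schedule described above can be imitated in an ordinary loop. This is not -runby-'s code, only a hypothetical sketch: the 1-second and 5-second stages follow the description, but the points at which the interval grows to 15 and then 60 seconds (after 60 and 300 seconds elapsed) are assumptions, as are the toy regression and the number of replications.

```stata
* Hypothetical runby-style progress schedule (NOT runby's implementation)
clear
set seed 2018
local reps 2000          // total iterations (assumed)
local next_report 1      // first report after about 1 second
timer clear 1
timer on 1
forvalues i = 1/`reps' {
    quietly {
        drop _all
        set obs 50
        generate x = rnormal()
        generate y = x + rnormal()
        regress y x
    }
    * read elapsed time: the timer accumulates across off/on cycles
    timer off 1
    quietly timer list 1
    local elapsed = r(t1)
    timer on 1
    if `elapsed' >= `next_report' | `i' == `reps' {
        local remaining = `elapsed' / `i' * (`reps' - `i')
        display as text "done " `i' " of " `reps' "; elapsed " ///
            %7.1f `elapsed' " s; about " %7.1f `remaining' " s remaining"
        * widen the reporting interval as the run gets longer
        if `elapsed' < 5 local step 1
        else if `elapsed' < 60 local step 5
        else if `elapsed' < 300 local step 15
        else local step 60
        local next_report = `elapsed' + `step'
    }
}
timer off 1
```

Reading the timer on every iteration adds a little overhead; for very cheap iterations one might check it only every so many passes instead.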
Sule Yaylaci

#81

06 Mar 2018, 11:56

Adding the 'mlmv' option to the gsem command would be great!
Chris Larkin

#82

06 Mar 2018, 12:48

Nick Cox: I have seen the dots on some commands, e.g. bootstrapping, and personally I find them quite helpful! I'm pretty ignorant of their underlying limitations, though, so perhaps knowing about them would frustrate me more. And the book you mention offers sage advice, except I'd replace 'coffee' with 'scotch' for those late-night coding sessions.

Clyde Schechter: I've used runby a couple of times (mostly when Robert Picard suggested it on this forum). I hadn't previously had a good look at the help file, though, and wasn't aware of the -status- option. It sounds like a sensible approach, balancing the information given to the user against the risk of flooding their output window, and it would be great if StataCorp could integrate your code into future releases.
Clyde Schechter

#83

14 Mar 2018, 13:47

I would like to see the 32-character limit on variable names relaxed to, say, 48 or even 64.

I know that when we first started out the limit was much more stringent. But I really like using names that are explanatory, and I dislike abbreviating names by omitting vowels or the like. And sometimes you have to distinguish between variations on a theme (admission_date, discharge_date, surgery_date, etc.). I agree that 32 characters usually suffices; I rejoiced when the limit was first raised to 32.

But sometimes in data management it becomes necessary to create new variables based on the names of existing variables. For example, you might loop over a bunch of variables looking for certain kinds of data problems, and you might want to create a new variable by doing something like:

Code:

foreach v of varlist whatever {
    // MAYBE CALCULATE SOME STUFF FIRST
    // TO IDENTIFY PROBLEMATIC OBSERVATIONS FOR
    // VARIABLE V
    gen byte problem_`v' = some_logical_expression
}

Well, that entails adding 8 characters to the variable name, so if you started out with a name of 25 or more characters, the code breaks. Of course you can use -strtoname()- to get a substitute name that will fit, but then your problem_* names are no longer completely parallel with the names of the variables they refer to, so writing another loop to fix the problems gets complicated by dancing around the name differences. Yes, of course, instead of problem_ one might use flag_, p_, or even just _, particularly if the variables are only needed in the interim and will not be saved with the data set. But even these break when applied to names of 28, 31 or 32 characters, respectively.
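One way to catch the breakage before -generate- fails is to check the length of the prefixed name first. A minimal sketch, assuming Stata 14 or later (for ustrlen()); the variable names and the placeholder missing() check are hypothetical stand-ins for real data checks. For over-long names it also displays what -strtoname()- would substitute, which shows how the parallel naming is lost.

```stata
* Hypothetical data: one 28-character name and one short name
clear
set obs 5
generate a_variable_name_of_length_28 = _n
generate short_name = _n
replace short_name = . in 3
local prefix problem_
foreach v of varlist a_variable_name_of_length_28 short_name {
    local newname `prefix'`v'
    if ustrlen("`newname'") > 32 {
        * too long: report, show what strtoname() would use, and skip
        local alt = strtoname("`newname'")
        display as error "`newname' is too long; strtoname() gives `alt'"
        continue
    }
    * placeholder check: flag missing values
    generate byte `newname' = missing(`v')
}
describe
```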

I know there is probably no upper limit that will satisfy every need. But I can't help thinking that raising it to 48 or 64 would do little harm and would be welcomed by a non-negligible number of us who run into these problems.
Jesse Wursten

#84

15 Mar 2018, 03:51

Originally posted by Clyde Schechter

...

I'd like to echo that for macro names and variable labels. Eighty characters to describe a variable is often far too few, severely limiting the use of this feature. The macro problem is essentially identical to the one posed by Clyde. For example, I might create macros holding means of variables or regression results across a wide set of specifications (small vs. large firms, estimation in logs or not, different dependent variables, etc.), and that way you very quickly reach the 32-character limit.
Nick Cox

#85

15 Mar 2018, 04:48

I think the problem with longer variable names and variable labels is not that people don't sometimes want them (they naturally do), nor that there is a problem of principle in changing the limit (there presumably isn't). The problem is where Stata is to find the space to put them in output. There is a fairly substantial area of difficulty in revising many commands accordingly. (Or, more likely, they just end up abbreviated anyway, so where's the gain?)
Andrea Discacciati

#86

15 Mar 2018, 07:18

Originally posted by Nick Cox

(Or, more likely, they just end up abbreviated anyway, so where's the gain?)

Clyde (#83) illustrated one example where longer variable names are welcome. The usefulness of longer variable names in that example is orthogonal to how variable names are displayed (abbreviated or not).
Nick Cox

#87

15 Mar 2018, 08:39

Andrea Discacciati: those properties will not be orthogonal, much as we all like making such quips.

Long variable names will often (not always, agreed) be fairly useless whenever you can't see them.

I don't know how you'd calculate the correlation, but my prior on it isn't centred at zero.
daniel klein

#88

15 Mar 2018, 09:13

Originally posted by Nick Cox

Long variable names will often (not always, agreed) be fairly useless whenever you can't see them.

True, but you can always see long variable names when you read code. This is how I understand the example given by Clyde in #83. From this perspective, any output of variable names is indeed irrelevant.

Best
Daniel
Nick Cox

#89

15 Mar 2018, 09:41

daniel klein
Clyde Schechter

That's part of my "not always" in #87.

It's a little tricky, but it has long been possible to concatenate macros to make longer, more readable versions of variable names.

Consider

Code:

local patient p_
local history h_

so that you could have

Code:

`patient``history'*

as a more informative variant on

Code:

p_h_*

assuming a bundle of variables on patient history. Clearly you can invent examples of your own close to home.
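A small worked illustration, with hypothetical variable names invented for the purpose: once the two locals are defined, `patient'`history'* expands to p_h_* wherever a varlist is expected.

```stata
* Hypothetical patient-history variables following the p_h_ convention
clear
set obs 3
generate p_h_smoking = 0
generate p_h_diabetes = 1
local patient p_
local history h_
* the two locals together expand to p_h_, so this varlist is p_h_*
describe `patient'`history'*
summarize `patient'`history'*
```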
daniel klein

#90

15 Mar 2018, 09:51

Nick: Nice!