Wishlist for Stata 16

Nicholas Winter

Join Date: Mar 2014

Posts: 122
#211

28 Feb 2019, 07:07

I would love to see two utility additions that would streamline programming a bit.

First: an addition to SMCL that acts like the {opt} directive, but which infers the minimum abbreviation from capitalization rather than the position of a colon. That is, I might do the following in a help file:

Code:

{opt key:words(string)}

Which looks like this when viewed:

keywords(string)

But it would be convenient to be able to code this as

Code:

{nickopt KEYwords(string)}

This would be convenient because the -syntax- statement in the program already has the minimum abbreviation indicated that way, so it would save some time in creating help files...
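As a rough illustration of the conversion such a directive would perform, here is a sketch written as a user-written helper. The program name `capopt` is hypothetical and not part of Stata or SMCL. It assumes the convention used by -syntax-: the leading capital letters mark the minimum abbreviation.

```stata
* Hypothetical helper (not part of Stata): turn a -syntax--style option
* spec such as KEYwords(string) into the SMCL form {opt key:words(string)}
capture program drop capopt
program define capopt, rclass
    version 14
    args spec
    if regexm(`"`spec'"', "^([A-Z]+)([a-z].*)$") {
        * leading capitals become the part before the colon
        local smcl = "{opt " + strlower(regexs(1)) + ":" + regexs(2) + "}"
    }
    else {
        * no capitals, or all capitals: no colon needed
        local smcl = "{opt " + strlower(`"`spec'"') + "}"
    }
    return local smcl `"`smcl'"'
end

capopt KEYwords(string)
display as text `"`r(smcl)'"'
```

A real SMCL directive would of course do this inside the help viewer; the sketch only shows that the mapping from capitalization to the colon form is mechanical.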

Second: a subcommand parser, modelled on -syntax-, that handles the abbreviation of subcommands behind the scenes. Right now, programs that allow abbreviated subcommands begin with code like this (taken from graph.ado):

Code:

gettoken do 0 : 0, parse(" ,")
local ldo = length("`do'")
if "`do'" == bsubstr("display",1,max(2,`ldo')) {     // draw/display
    gr_draw_replay `0'
    exit
}
if "`do'" == bsubstr("save",1,max(4,`ldo')) {        // save
    gr_save `0'
    exit
}
if "`do'" == bsubstr("use",1,max(3,`ldo')) {         // use
    gr_use `0'
    exit
}
if "`do'" == bsubstr("print",1,max(5,`ldo')) {       // print
    gr_print `0'
    exit
}
if "`do'" == bsubstr("dir",1,max(3,`ldo')) {         // dir
    gr_dir `0'
    exit
}
if "`do'" == bsubstr("describe",1,max(1,`ldo')) {    // describe
    gr_describe `0'
    exit
}

But wouldn't it be nice (and easier to debug) to be able to use my imagined -subcommandsyntax- command, which would return the unabbreviated subcommand in the local `subcommand'

Code:

gettoken do 0 : 0, parse(" ,")
subcommandsyntax 0 : DIsplay save use print dir Describe
...
gr_`subcommand' `0'
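Until something like this exists, a user-written approximation is possible. The sketch below is hypothetical: the name `subcmdsyntax`, the `word : candidates` calling convention, and returning the result in r(subcommand) rather than in a local are all assumptions for illustration. It applies the same rule as the graph.ado code: the typed word must be at least as long as the minimum abbreviation and must be a prefix of the full subcommand name.

```stata
* Hypothetical parser (not part of Stata). Usage:
*     subcmdsyntax word : Cand1 CANDidate2 ...
* Leading capitals mark each candidate's minimum abbreviation; an
* all-lowercase candidate must be typed in full. Result: r(subcommand)
capture program drop subcmdsyntax
program define subcmdsyntax, rclass
    version 14
    gettoken word 0 : 0, parse(" :")
    gettoken colon cands : 0, parse(" :")
    if `"`colon'"' != ":" {
        display as error "syntax is: subcmdsyntax word : candidates"
        exit 198
    }
    local lw = strlen(`"`word'"')
    foreach cand of local cands {
        * minimum length = number of leading capitals (or full length)
        if regexm("`cand'", "^([A-Z]+)") local minlen = strlen(regexs(1))
        else local minlen = strlen("`cand'")
        local full = strlower("`cand'")
        * accept if long enough and a prefix of the full name
        if `lw' >= `minlen' & `"`word'"' == bsubstr("`full'", 1, `lw') {
            return local subcommand "`full'"
            exit
        }
    }
    display as error `"`word' is not a valid subcommand"'
    exit 199
end

subcmdsyntax desc : DIsplay save use print dir Describe
display "`r(subcommand)'"
```

Candidates are checked in the order given, so, as in graph.ado, a short word such as "d" skips DIsplay (minimum 2) and dir (typed in full) and resolves to describe.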

Last edited by Nicholas Winter; 28 Feb 2019, 07:17.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#212

07 Mar 2019, 13:43

Please enable the Bayesian commands to export their output in a standard format. The commands don't write to r(table), nor do they replay their estimates with the estimates store or replay commands. bayesstats summary will replay the estimation results, but it doesn't produce any output that we can capture. This appears to mean that we have to copy-paste results after running Bayesian estimation commands, which is cumbersome, error-prone, and a possible incentive for people to defect to other software packages like R.

Some discussion here.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
David Radwin

Join Date: Mar 2014

Posts: 368
#213

19 Mar 2019, 18:37

My wish: quantile (percentile) estimation with design-adjusted standard errors (that is, with the svy: prefix). This can be done using epctile (Stas Kolenikov, from findit epctile) but it doesn't always yield results.

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#214

27 Mar 2019, 11:24

Originally posted by Weiwen Ng:

Please enable the Bayesian commands to export their output in a standard format. The commands don't write to r(table), nor do they replay their estimates with the estimates store or replay commands. bayesstats summary will replay the estimation results, but it doesn't produce any output that we can capture. This appears to mean that we have to copy-paste results after running Bayesian estimation commands, which is cumbersome, error-prone, and a possible incentive for people to defect to other software packages like R.

Some discussion here.

Withdrawn. The post-estimation command bayesstats summary does write to a table called r(summary), as pointed out by Ben A. Dwamena
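A minimal sketch of capturing those summaries, assuming Stata 15 or later (for the bayes prefix). The auto dataset and the model are chosen purely for illustration.

```stata
* Sketch: keep Bayesian posterior summaries as a Stata matrix
sysuse auto, clear
set seed 12345                      // MCMC results depend on the seed
bayes: regress mpg weight
bayesstats summary                  // posterior summaries of all parameters
matrix S = r(summary)               // copy before another r-class command overwrites it
matrix list S
```

Once the summaries are in a matrix, they can be exported with the usual matrix tools (for example putexcel) instead of being copied and pasted.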

JanDitzen

Join Date: Jan 2015

Posts: 348
#215

03 Apr 2019, 03:28

Maybe this has been mentioned before, but it would be great to be able to loop over the unique values of a variable (string or double) without any prior steps. At the moment my usual approach is to transfer the variable into Mata or use levelsof. Both approaches have disadvantages: they either require additional code or can be problematic if the variable type is not predetermined (i.e., looping over strings or non-strings). What I am thinking of would be something like:

Code:

foreach lname of unique varname {
    * commands using `lname'
}
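For comparison, one existing type-agnostic workaround is to loop over group numbers rather than over the values themselves. This is a sketch, not an official idiom: the auto dataset and the variable rep78 are stand-ins, and note that egen's group() skips missing values unless its missing option is given.

```stata
* Workaround sketch: loop over distinct values via group numbers,
* so the same code works for string and numeric variables
sysuse auto, clear
local v rep78                         // a string variable such as make works too
tempvar grp obs
egen long `grp' = group(`v')          // distinct nonmissing values -> 1, 2, ...
generate long `obs' = _n
summarize `grp', meanonly
local ngroups = r(max)
forvalues g = 1/`ngroups' {
    * recover the value from the first observation in group g
    summarize `obs' if `grp' == `g', meanonly
    local first = r(min)
    local val = `v'[`first']          // string or number; no extra quoting
    display `"`v' = `val': "' r(N) " observations"
    * ... further multi-line processing for this value goes here ...
}
```

Comparing integer group numbers also avoids the precision issues that can arise when a double variable's values are written out as text by levelsof and then compared back.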

Last edited by JanDitzen; 03 Apr 2019, 03:30. Reason: changed distinct to unique
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#216

03 Apr 2019, 03:52

JanDitzen Any time I have been tempted (the last time was quite a while ago) to loop over distinct values, I have found that the solution was not to (explicitly) loop, but to use the by prefix instead. So can you tell us more about typical tasks you want to perform with that loop? Maybe we can find a solution that does not require you to wait till the new version of Stata.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
JanDitzen

Join Date: Jan 2015

Posts: 348
#217

03 Apr 2019, 04:11

Maarten Buis, I am thinking about operations which require multiple lines of code. If I am not mistaken, by only applies to a single line. A recent example was that I wanted to download and process data from UN Comtrade. To do this automatically, I first obtained a list of all countries and then looped over their codes. Within each iteration I download the dataset and process it into the format I would like. A (simplified) example without the processing steps is the following (requires my user-written comtrade command - https://janditzen.github.io/comtrade/):

Code:

clear
cd "C:/downloads/comtrade"
** obtain list of all countries
comtrade list partner , listall
** Remove world and all as not needed
drop if value == "World" | value == "All"
** get number of countries for display use only
levelsof id, local(CtryList)
** year to download; also used in the output file name
local yr 2017
foreach ctry in `CtryList' {
    comtrade api, maxdata(500) type(C) freq(A) years(`yr') reporterc(`ctry') ///
        partnerc(all) traderegime(all) hs(HS) cl(271111) ///
        append("`c(pwd)'/hs271111_`yr'.dta") nocheck
    ** more calculations here
}
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#218

03 Apr 2019, 04:39

You can have multiple by commands one after another, that is often how I solve such problems. The bigger problem is that comtrade does not seem byable, which would make sense given what it does.
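A small sketch of that pattern, using the auto dataset purely for illustration: each line is a separate by command, and together they carry out a multi-step calculation within every level of foreign.

```stata
* Several by commands in a row instead of an explicit loop
sysuse auto, clear
bysort foreign: egen double mean_price = mean(price)
by foreign: generate double dev_price = price - mean_price
by foreign: generate long n_in_group = _N
tabstat dev_price n_in_group, by(foreign)
```

This works because each step is itself a byable command; it does not help when a step (such as a download) is not byable.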

Nick Cox

Join Date: Mar 2014

Posts: 35436
#219

03 Apr 2019, 05:53

JanDitzen For unique read distinct! https://www.stata-journal.com/articl...article=dm0042

Allowing that would, I guess, help much less than you hope. It is what is inside the loop that matters: in particular, if there are any if qualifiers, they can slow things way down.

Check out the Picardesque trio of rangestat, rangerun, and runby (all from SSC), which do help with many of these problems. They are here now, and not dependent on the caprice and timetable of StataCorp.

But I don't have suggestions for your specific problem.

Robert Picard
Bill Magee

Join Date: Apr 2014

Posts: 10
#220

15 Apr 2019, 08:39

Perhaps include a history of ado-file installs under "What's New" in Help.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#221

16 Apr 2019, 08:16

I would like to see a more logical and consistent approach to how -collapse- interacts with string variables. Currently, you can use string variables with (count), (first), (last), (firstnm) and (lastnm). That all makes sense. And it makes sense that you can't use them with numerical operators like (sum) or (mean) etc. Then there is the issue of ordering: strings have a natural alphabetic order. So it might make sense for Stata to also provide (min) and (max) operators for string variables under -collapse-. Or you could argue that such operators are probably not useful and not worth implementing: we rarely would need those, and could emulate them by -encode-ing first. But what Stata actually does is allow (min) but not (max)--which makes no sense to me.
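In the meantime, a string "max" (and "min") within groups can be emulated without collapse. This is a sketch under stated assumptions: it relies on Stata's own string sort order, the auto dataset is only for illustration, and the empty string (string missing) sorts first, so it is ignored by the max unless every value in a group is empty, but it would be picked up by the min.

```stata
* Emulate (max) and (min) for a string variable within groups
sysuse auto, clear
bysort foreign (make): generate maxmake = make[_N]   // last in sort order
by foreign: generate minmake = make[1]               // "" here if any make is empty
by foreign: keep if _n == 1                          // one row per group, like collapse
keep foreign minmake maxmake
list
```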
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#222

16 Apr 2019, 17:11

Originally posted by Clyde Schechter:

But what Stata actually does is allow (min) but not (max)--which makes no sense to me.

Do you think that this has to do with the way missing values are handled in string variables?

Perhaps with the added wrinkle of Unicode now. With other software (e.g., Microsoft SQL Server), the sort order of string data, especially in an international setting, gets thick fast.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#223

16 Apr 2019, 19:30

Those are good points for not treating string variables as having ordinal properties at all. But how would they justify supporting (min) but not (max)? True, the empty string (missing value) sorting first defines an asymmetry, but numeric variables have an asymmetry as well, with missing values sorting last. (min) and (max) both ignore missing values anyway, so the same convention could be applied to string variables.
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#224

16 Apr 2019, 21:09

Good points. Sounds like a bug, frankly: if Stata can sort a string variable, then there's no reason why collapse can't deliver the "maximum" string value.

(By the way, missing is not ignored by collapse for string variables as it is for numeric—run the attached do-file to see.)
Attached Files

string_collapse.do (350 Bytes, 1 view)
Richard Williams

Join Date: Apr 2014

Posts: 4946
#225

17 Apr 2019, 07:22

If it can legitimately be done, I would like to see fixed effects models supported by xtologit. This paper claims that "The fixed effects ordered logit model is widely used in empirical research in economics." Well, maybe so, but I don't know how they do it. https://www.cemmap.ac.uk/uploads/cem...is%20Muris.pdf

Using a hybrid model has been suggested. (I think maybe the user-written xthybrid can do it easily) Allison discusses the pros and cons of that approach at https://statisticalhorizons.com/prob...-hybrid-method If Stata can improve on xthybrid or make it easily implemented as an option in commands maybe that would be a good approach, e.g. add a hybrid option.

I wonder if fixed effects options could be built into me commands. Right now many models can be estimated by either xt or me commands. But, as far as I know, FE models can only be estimated with xt commands.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam