Wishlist for Stata 19

Odongo Kodongo

Join Date: Mar 2022

Posts: 10
#166

07 Nov 2023, 07:04

Include a command for best projection reiterative truncated projected least squares.
Comment
Bert Lloyd

Join Date: Apr 2014

Posts: 105
#167

14 Nov 2023, 07:59

I would welcome some minor improvements to the awesome bookmark feature of the do-file editor:

1. Fix the issue of displaying the bookmark navigation pane when Stata and the Do-File Editor are on different monitors, as discussed here:
https://www.statalist.org/forums/for...27#post1610427
Chinh Nguyen (StataCorp) mentioned that StataCorp was aware of this problem but AFAIK it has not been fixed.

2. Add the ability to dock the navigation pane on the side of the do-file editor window, and/or a keyboard shortcut to jump to the navigation pane, as discussed here:
https://www.statalist.org/forums/for...re#post1610169

(And of course it would be great if these were implemented in Stata 17 and 18 as well.)
2 likes
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#168

14 Nov 2023, 20:03

The command ivregress nicely allows factor notation for endogenous explanatory variables, making it convenient to include squares and interactions among endogenous variables and then use margins. For example,

Code:

ivregress 2sls y (c.w c.w#c.w c.w#c.x1 = c.z c.z#c.z c.z#c.z1) x1 x2 ... xk, robust
margins, dydx(w)

As far as I can tell, xtivreg doesn't allow this feature, making things cumbersome for fixed and random effects versions of IV.
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2430
#169

15 Nov 2023, 01:31

Dear Jeff,
While that is not possible with official commands, you could use the SSC command f_able, which lets you create the variables yourself.
In your setup:

ssc install f_able
fgen w2 = w^2
fgen wx1 = w*x1
ivregress 2sls y (w w2 wx1 = c.z c.z#c.z c.z#c.z1) x1 x2 ... xk, robust
f_able w2 wx1, auto
margins, dydx(w)

This allows for almost any kind of variable transformation
F
2 likes
Comment
Niels Henrik Bruun

Join Date: Aug 2014

Posts: 552
#170

16 Nov 2023, 00:28

I propose adding output-format options to -collect preview-, such as md, latex, and html. Currently -collect preview- always shows output in SMCL, but it would be nice to see the output in the log as, e.g., Markdown (md).
A workaround is:

Code:

collect export tmp.md
type "tmp.md"
rm "tmp.md"

Kind regards

nhb
Comment
Zurab Sajaia

Join Date: May 2016

Posts: 4
#171

16 Nov 2023, 14:22

Maybe this doesn't need to be part of the new version, but if at all possible, it would be fantastic to have access to the Stata parser. Python's ast (abstract syntax tree) module is a perfect example. This would allow development of code linters, formatters, validators, full-fledged plugins for VS Code and other editors based on language servers, etc.
3 likes
Comment
Ariel Linden

Join Date: Apr 2014

Posts: 153
#172

16 Nov 2023, 17:09

I would very much like to see:

(1) a suite of official machine learning tools (there are several user-written commands but only lasso and npregress are official commands, and they certainly don't represent the current standard).

(2) a way to speed up mixed models (all of them). I have projects in which mixed (which is certainly the "fastest" of the bunch) has taken 12 WEEKS (yes, weeks!) to complete. That's ridiculous. And I am using an extremely expensive version of Stata for 12 cores! The slowness of multilevel models leads to bad practices. If it takes me 12 weeks to run a single model using -mixed-, I may use -mixed- for a binary outcome because I know that using -melogit- may take twice as long. I am not sure why this cannot be performed as a parallel process to speed it up, or through some other mechanism...

Fingers crossed!

Ariel
5 likes
Comment
Zhenhuan Chen

Join Date: Jun 2023

Posts: 2
#173

18 Nov 2023, 18:34

Compared with Matlab, R, and other software, Stata lags far behind in drawing 3D plots, such as three-dimensional scatter plots. I sincerely hope that this will be addressed in Stata 19, in particular the ability to manually rotate the viewing angle, as in Matlab.
1 like
Comment
Niels Henrik Bruun

Join Date: Aug 2014

Posts: 552
#174

19 Nov 2023, 03:28

If possible, trace should also be available as a prefix command with a depth option, like

Code:

trace, depth = 2: command_to_be_traced


Kind regards

nhb
5 likes
Comment
Erik Ruzek

Join Date: Oct 2017

Posts: 413
#175

19 Nov 2023, 15:59

Ariel Linden, I hear you about mixed models in Stata. There have been other threads on this. My personal opinion is that it's worth the one-time expense to purchase a mixed model-specific program. I think MLwiN is the best. It's continually updated, and its developers created a Stata program (runmlwin) that calls MLwiN from Stata and returns results for further manipulation. You can't do everything with it that you can with built-in mixed, such as margins, but if you know what you are doing, you can get those yourself with a little extra code. I view it much as I do Mplus. If I'm doing serious structural equation modeling, I'm going to spend to get the best. Of course, some will disagree.
1 like
Comment
wbuchanan

Join Date: Mar 2014

Posts: 1361
#176

21 Nov 2023, 06:08

I'd like to see the power and sample size functionality expanded to provide coverage for additional models. There are a few proprietary standalone pieces of software that seem to cover an extremely large number of tests for estimating required sample sizes, power, minimal detectable effect, etc. (e.g., nQuery Advisor, PASS, Power and Precision). Additionally, it would be great if the existing power and sample size commands could be extended to cover complex sampling designs as well (Valliant, Dever, & Kreuter (2018) have examples of how to do this for a handful of tests). Lastly, it would also be great if some tooling related to survey sample allocation could be added to the survey commands.
3 likes
Comment
Fahad Mirza

Join Date: Sep 2018

Posts: 240
#177

26 Nov 2023, 23:24

It would be nice if boxplot could be included in twoway plots, not just as a twoway plot type but as an immediate command too. This would make it possible to combine box plots with other plots such as bar. Right now, if I have to make a box plot in twoway, I have to perform manual calculations and then place the result on the plot where I would like to see it.
1 like
Comment
Fabian Fortner

Join Date: Oct 2016

Posts: 38
#178

28 Nov 2023, 08:44

Please stop restricting the length of variable names. I have followed previous discussions about this. If I design a project myself, I would never name variables longer than 32 characters. However, for some projects long names are a given (as such datasets are handed over to me for processing) and I just HAVE to handle names longer than 32 characters. It is a huge PITA and a big source of possible errors to rename everything, just because Stata has such restrictions in place.
1 like
Comment
Fernando Furquim

Join Date: May 2014

Posts: 20
#179

28 Nov 2023, 10:28

Enable use of wildcards in -reshape long- for naming variable stubs
1 like
Comment
david

Join Date: Apr 2014

Posts: 1
#180

28 Nov 2023, 13:35

I second all the recommendations in #85.

Some additional wishes:
Increase limits in general. I regularly work with very large datasets (>100GB) and often encode strings which can significantly reduce the file sizes (think of replacing 1B observations of a str2000 with an int or long), but the limit is 65,536 unique values - I think this could easily be increased. Other limits that I've hit up against and would be nice to increase include the number of variables (rare, to exceed 120k, but has happened), the # of characters in a command, and number of arguments in inlist (particularly strings).

Allow the use of a RAM drive to save files to memory. For example "save test.dta, ram" would save the file into ram, rather than writing it to a disc, which is faster to write and read later on. You can, of course, create a RAM drive separately and save the files to it, but Stata could do this dynamically.

Generally speed up and optimize existing commands. Despite sort being significantly improved, it's not consistently leveraged. For example:

Code:

clear all
sysuse auto, clear
expand 1000000
timer on 1
duplicates drop make, force
timer off 1
sysuse auto, clear
expand 1000000
timer on 2
sort make
duplicates drop make, force
timer off 2
timer list

On my system:

1: 89.48 / 1 = 89.4830
2: 53.41 / 1 = 53.4090

Most string functions are also pretty slow - maybe things like sed or tr from Linux could be adopted to speed up functions like subinstr and strpos. Reshaping is also far from optimized with large datasets (e.g., reshaping 1000 variables from wide to long can take very long).

With "append, force", rather than force the new data type to conform with the existing type, allow the option to convert all nonmatching variables to strings. Currently if you use "append, force" the data in the appended variable will be lost. On a one-off basis, using tostring for the original or appended data is trivial, but when there are hundreds of variables and many files, this is not trivial. One solution is to bring all data in as strings and then destring where possible after the files are appended, but that is not memory efficient.

Create a much better many-to-many merge. Joinby (which I still believe should be what m:m does) is incredibly slow and memory inefficient for large datasets to the point of being unusable. There are work-arounds that I've implemented (using expand and m:1 / 1:m) but it would be nicer to have this functionality built into Stata.

Better support for JSON files. I consider JSON to be a hideous file structure, but it's becoming very common and Stata has limited ability to work with it, particularly nested files.
4 likes
Comment
