
  • I don't know whether SAS implemented its own version of SQL with PROC SQL (I suspect so) or masks some other database backend. Whatever it is, it is its own "dialect", adding features that only make sense in a SAS context and not quite supporting the full ANSI SQL specification. I would be happy to see something like an SQL interface, since there are clear advantages for some needs. But I think this is where Stata seems philosophically to follow the idea of embracing existing frameworks -- by providing ODBC/JDBC interfaces to these databases -- which means users who want those tools and that functionality must handle the setup themselves.
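
    As a sketch of that existing route (the DSN, table, and column names below are placeholders for a configured ODBC source):

    Code:
    * pull a query result from an external database into Stata via ODBC
    odbc load, exec("SELECT id, income FROM households WHERE year = 2020") ///
        dsn("mydsn") clear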



  • Stata Installation Qualification Tool (IQT): add an API or command-line arguments so that the IQT can be integrated into automated deployment processes.



  • William Lisowski
    The SAS "SQL" implementation is, in my view, an example of how not to approach the problem. There are packages in several languages that implement various kinds of SQL functionality, particularly joins, but they do so using their own native approaches that fit within their existing programming paradigms. Leonardo Guizzetti is correct that SAS implemented something like its own version of a subset of SQL, pretty far removed from the full ANSI specification.



  • Nothing in #348 contradicts the point I made in #345 that

    ... SAS approached the problem back in the day by creating PROC SQL to understand SQL commands (and interface directly with SQL databases) rather than shoehorn SQL capabilities into existing DATA step commands.

    Statalist is not the forum in which to argue the strengths of SAS's particular implementation of SQL within PROC SQL, which was not the case I was making. I will point out that my copy of the SQL in a Nutshell (2nd Edition, 2004) reference sets the stage for the book in its Preface:

    SQL in a Nutshell, Second Edition, describes the latest ANSI standard, SQL2003, version of each command and then documents each platform's implementation of that command.

    The platforms comprise six databases popular at the time. So SAS was not alone in adapting the standard (originally published in 1986) to the needs of its implementation. I would prefer that Stata do the same should it implement SQL functionality. In that, it would differ from what was done with Python. My preference is to apply SQL syntax to native Stata datasets, as in the sketch below, rather than having to move datasets into and out of an external database for what are largely data management tasks.
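
    A minimal sketch of the contrast, assuming two native datasets persons.dta and households.dta sharing a key hhid (all names hypothetical); the wished-for SQL appears as a comment:

    Code:
    * today's native route: an inner join of two Stata datasets
    use persons, clear
    merge m:1 hhid using households, keep(match) nogenerate
    * hypothetical SQL equivalent, applied directly to .dta files:
    * SELECT p.*, h.* FROM persons p JOIN households h ON p.hhid = h.hhid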



  • Sophisticated tools to support investment/portfolio development and analysis. Examples: estimation of weights for optimal diversification (with constraints; robust; Bayesian; ...); bootstrap standard errors for estimates of "optimal" portfolio weights; portfolio performance analysis that relaxes the assumption of i.i.d. observation periods.
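
    As a sketch of the bootstrap item only, using official bootstrap around a minimum-variance weight calculator (the program name and return variables ret1-ret3 are placeholders; a real tool would add constraints and robust or Bayesian variants):

    Code:
    capture program drop minvarw
    program define minvarw, rclass
        // minimum-variance weights: w = S^-1 * 1 / (1' S^-1 1)
        syntax varlist(numeric)
        quietly correlate `varlist', covariance
        matrix S = r(C)
        matrix ones = J(rowsof(S), 1, 1)
        matrix w = invsym(S) * ones
        matrix t = ones' * w
        scalar denom = t[1,1]
        matrix w = w / denom
        local i = 0
        foreach v of local varlist {
            local ++i
            return scalar w_`v' = w[`i', 1]
        }
    end

    bootstrap w1=r(w_ret1) w2=r(w_ret2) w3=r(w_ret3), reps(200): ///
        minvarw ret1 ret2 ret3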



  • More comprehensive "table" creation. For example, neither *tabdisp* nor *list* allows a *collect* command to readily gather output appropriate for a table to be output via *putdocx*. (A current workaround is sketched below.)
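
    For tables that collect can gather today, the collect-aware table command is one workaround (a sketch assuming Stata 17+, with the auto data standing in):

    Code:
    sysuse auto, clear
    * table is collect-aware, so its results can be exported directly
    table rep78 foreign, statistic(mean price)
    collect export mytable.docx, replace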



  • Originally posted by Jay Patel

    More comprehensive "table" creation. For example, neither *tabdisp* nor *list* allows a *collect* command to readily gather output appropriate for a table to be output via *putdocx*.

    I agree with the spirit of the request to expand the collect system, and I expect those features are being developed. As a slight aside, putdocx already supports listing data directly, as sketched below.
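
    A minimal sketch of that capability (the output file name is a placeholder):

    Code:
    sysuse auto, clear
    putdocx begin
    * list the first 10 observations directly into a Word table
    putdocx table t1 = data(make price mpg) in 1/10, varnames
    putdocx save mylisting.docx, replace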



  • Show the name of the frame in the Data Browser!



  • A great addition to Stata's machine learning capabilities (e.g., lasso) would be automated feature engineering, just like Python's Featuretools (https://featuretools.alteryx.com/en/stable/).

    Once one establishes relationships among the Level 0 and higher-level (Level 1, 2, 3) datasets and defines a cutoff time (any data after that point is filtered out before calculating features, to avoid "label leakage"), hundreds of features (variables) are "automagically" created using max, min, mean, count, sum, etc., and can be fed into ML algorithms.

    IMHO, translating that into Stata would involve frames, frlink, rangestat, and lots of egen; a sketch follows.
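
    A minimal sketch of the cutoff-plus-aggregation step, assuming two hypothetical datasets: customers.dta (Level 0, keyed on custid) and transactions.dta (Level 1, with custid, a daily date txdate, and amount):

    Code:
    * build Level 1 aggregate features with a cutoff, then link to Level 0
    frame create tx
    frame tx: use transactions.dta
    frame tx: keep if txdate < td(01jan2020)    // cutoff avoids label leakage
    frame tx: collapse (mean) amt_mean=amount (max) amt_max=amount ///
        (min) amt_min=amount (sum) amt_sum=amount ///
        (count) tx_n=amount, by(custid)
    use customers.dta, clear
    frlink 1:1 custid, frame(tx)
    frget amt_mean amt_max amt_min amt_sum tx_n, from(tx)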



  • Originally posted by Anthony Killeen

    Show the name of the frame in the Data Browser!

    Anthony, this is shown in the Properties pane of the Data Browser/Editor (View > Properties).



  • Thank you!



  • Repeated requests (unlikely to happen, I know):

    1. Do not allow m:m merges. These are never useful outside of StataCorp. Even if they were useful in very rare situations, they surely produce more harm than good. If they must be kept, put m:m under version control. (The joinby sketch below covers what people usually intend by m:m.)

    2. Remove mi's suggestion to use the force option. You never want that; yet we regularly see it used (blindly copied) in posts to Statalist.
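
    For reference, when all pairwise matches within groups really are wanted, joinby is the supported route (dataset names are placeholders):

    Code:
    * joinby forms all pairwise combinations within famid groups,
    * which is what merge m:m is often (wrongly) expected to do
    use parents, clear
    joinby famid using children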



  • When merge is used with a very large master dataset and a small using dataset, Stata preserves the master dataset in order to sort the using dataset first. This can be extremely slow when the master does not fit in the memory available to preserve, because Stata then has to write it out to temporary I/O files. That is completely unnecessary when the using dataset fits easily in memory and can be sorted there before the merge.

    A nice trick I use is:

    Code:
    * sort the small using dataset in its own frame, leaving the
    * large master dataset untouched in the default frame
    frame create sorter
    frame sorter: use usingdataset.dta, clear
    frame sorter: sort merging_variables
    frame sorter: save usingdataset.dta, replace
    
    merge 1:1 merging_variables using "usingdataset.dta"

    Building this trick into merge itself could save a lot of time without any sacrifice whatsoever.



  • For latent class analysis (LCA) as conducted in gsem: add latent transition analysis and Stata equivalents of Mplus's R3STEP for three-step latent class regression accounting for classification error, and of Knownclass for multiple-group LCA.

    gsem currently estimates multiple-group LCA using the group() and ginvariant() options, but ginvariant() offers no way to constrain coefficients without also constraining variances. More details in this earlier post. Hence this has to be done manually, as advised by Stata Tech Support: "You could place those needed constraints into the constraint definition and then supply to the -constraints()- option after -gsem-", which can be very tedious with hundreds of constraints for a 4-class model with 7 groups; a looped sketch follows.
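
    A sketch of the looped manual route (indicator names y1-y7 and group variable g are placeholders; the equation labels are schematic, so confirm the exact names for your model with gsem's coeflegend option first):

    Code:
    * build cross-group equality constraints in a loop, then pass them to gsem
    local c = 0
    foreach y in y1 y2 y3 y4 y5 y6 y7 {
        forvalues g = 2/7 {
            local ++c
            constraint `c' [`y']1.g = [`y']`g'.g   // schematic labels
        }
    }
    gsem (y1-y7 <- ), logit lclass(C 4) group(g) ///
        ginvariant(none) constraints(1/`c')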



  • Can StataCorp please fix the documentation for -matrix accum- and add some examples that illustrate how these commands

    matrix glsaccum
    matrix opaccum
    matrix vecaccum

    are used?

    In particular, -matrix glsaccum- has had essentially the same documentation since about Stata 7. I understand almost nothing from its current abstract explanation of what -glsaccum- does, and there are no examples anywhere showing how the command is used.

    And -glsaccum- is useful: once upon a time I had a lucky day and managed to implement all the estimators in Wooldridge (2010), "Chapter 7: Estimating Systems of Equations by OLS and GLS", from scratch, just using -glsaccum-, without even reaching the limits of the command (I used the same weighting matrix across groups, and it apparently allows the weighting matrix to vary by group).

    In short, the plain -matrix accum- is clear; a sketch follows. But the more complicated versions listed above, and -glsaccum- in particular, are not clear at all in the manual.
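
    For contrast, a sketch of what makes the plain versions clear: hand-computing OLS on the auto data (both commands append _cons by default):

    Code:
    sysuse auto, clear
    * X'X, with the constant appended as the last row/column
    matrix accum XX = weight length
    * y'X, with the first variable taken as y
    matrix vecaccum yX = price weight length
    * OLS: b = (X'X)^-1 X'y
    matrix b = invsym(XX) * yX'
    matrix list b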

