Wishlist for Stata 19

Rich Goldstein

Join Date: Mar 2014

Posts: 4438
#316

23 Jul 2024, 06:48

I believe that most calibration is performed graphically these days. Summary statistics are easily available (and have been discussed on Statalist), and there has been a recent proposal for another such statistic, with an accompanying test, that I would like to see added to Stata. The procedure, with R code, is presented in Sadatsafavi, M. and Petkau, J. (2024), "Non-parametric inference on calibration of predicted risks," Statistics in Medicine, 43: 3524-3538.
2 likes
Comment
daniel klein

Join Date: Mar 2014

Posts: 3821
#317

24 Jul 2024, 13:19

This is copied from another thread:

Given just how often people get confused over encode and destring, I wonder whether encode would benefit from issuing a note when used on variables that contain only numeric characters; something like:

Code:

encode stringvar , generate(numvar)
note: stringvar contains only numeric characters; consider using destring

where the word destring should link to the help file (or to the section in the help of encode that explains the difference between encode and destring).
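For anyone unsure why the distinction matters, here is a rough sketch in Python (not Stata code) of what the two commands do to a string variable holding digits. It assumes that encode numbers the distinct strings 1, 2, 3, ... in alphabetical order and keeps the text as value labels, while destring converts each string to the number it spells. The function names encode_like and destring_like are made up for illustration.

```python
def encode_like(values):
    """Mimic encode (assumed behavior): give each distinct string an integer
    code 1..k in sorted order and keep the string as the label."""
    labels = {text: code for code, text in enumerate(sorted(set(values)), start=1)}
    codes = [labels[v] for v in values]
    return codes, labels


def destring_like(values):
    """Mimic destring (assumed behavior): convert each string to its number."""
    return [float(v) for v in values]


stringvar = ["10", "2", "30", "2"]

codes, labels = encode_like(stringvar)
numbers = destring_like(stringvar)

print(codes)    # [1, 2, 3, 2]  arbitrary codes, not the original values
print(numbers)  # [10.0, 2.0, 30.0, 2.0]
```

The encoded variable looks numeric but holds category codes, so any arithmetic or regression on it silently uses 1, 2, 3 rather than 10, 2, 30.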
4 likes
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29951
#318

24 Jul 2024, 16:45

Not only do I agree with daniel klein in #317, I think the warning should be in all caps and bold face. Better still, make it an error to use -encode- with a string variable that contains only numeric characters unless an option -onlynumericok- has been specified. (And maybe require verification with two-factor authentication to confirm--just kidding!)
5 likes
Comment
Erik Reinbergs

Join Date: Oct 2022

Posts: 33
#319

05 Aug 2024, 17:55

In dtable, it's really useful that requesting fvpercent with the , svy option produces the correct survey weighted percentages. An option to present the confidence intervals around those percent estimates would be really useful.

Last edited by Erik Reinbergs; 05 Aug 2024, 17:58. Reason: Edit: Fixed formatting.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29951
#320

05 Aug 2024, 18:54

Re #319. The documentation makes it clear that -dtable- is intended to display descriptive statistics. Confidence intervals are not descriptive; they are inferential statistics and so are outside the scope of -dtable-'s intended use.

Stata has the -svy: proportion- command, which produces proportions and confidence intervals. Moreover, it is structured as an estimation command, so its results can be -estimates store-d and then tabulated using -etable-, whose raison d'etre is the presentation of estimation results with their inferential statistics.
1 like
Comment
Erik Reinbergs

Join Date: Oct 2022

Posts: 33
#321

05 Aug 2024, 19:35

I hear you, but that does not change that displaying the confidence intervals as an option would be a feature that would be nice for me - hence the post. My understanding might not be completely accurate, but I think of the fvpercent that accounts for the survey design as itself a point inference (the same inference produced by svy: prop) that dtable already displays - hence the desire to also present the confidence intervals. Those confidence intervals are an important part of the description to me ¯\_(ツ)_/¯.
Comment
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#322

06 Aug 2024, 07:37

I was excited when the project manager came out, but in practice I am not using it anymore. I have been thinking about what changes could be made to let it realize its potential:

Make it possible to give a project the name of a .do file that will be run the moment that project is opened.
This would be somewhat similar to a profile.do file. In my case, when I open a project I would always want to cd to that directory, so that is what I would add to such a "project_profile.do" file. I have been experimenting with using sysdir to point to an ado folder local to a research project. That way all the community-contributed packages used in that project are in the local project folder. If someone wants to replicate my stuff, I can just give them a .zip file of the folder, and they have everything they need. (It is a bit more complicated because I typically work with individual-level data and there are legal and ethical concerns having to do with privacy, but there is nothing Stata can (and should) do about that.) So those sysdir commands would also end up in the "project_profile.do" file.

Have commands, in addition to the GUI, that let you interact with the project, like adding and removing files and groups
Call me old fashioned, but I like working with commands rather than the GUI. On top of that it would allow me to let my mkproject command set up a project.

Make it possible to add labels to the project, the files, and the groups
I like numbering filenames to avoid such names as final.do, final_final.do, final2.do, and really_final.do, ... However, the downside is that it is hard to know what myproject_dta04.do is doing other than being a part of the data preparation of the myproject project. If I could add a label to that file in the project, then that would make my life easier.

Integrate with Git
This is probably a bigger ask, but I suspect that the intersection between the group that is potentially interested in using a project manager and the group that is interested in using Git is large enough to at least consider investing in that capability. I have noticed that when working with VS Code in other languages I was much more consistent in my commits than when I am working with Stata.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
3 likes
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#323

08 Aug 2024, 16:20

I can't believe I haven't thought of this sooner, but dictionaries would help A LOT for programming (I mean in the sense of Python).

In Python, dictionaries are basically data structures with key value pairs. Let's say for some reason we're interested in the color shirt a respondent to a survey wears. In Python, we could do something like

Code:

shirt_dict = {'Alice': ['White', 'Grey'], 'Pablo': ['Grey', 'Blue']}

where the part before the colon is the key and the part after the colon is the value (a list, in this case). We already have the Stata equivalent of lists: macros. But dictionaries allow us to index macros to other things that we may care about.

Another example. Say we wish to make a dictionary of the times certain units ever undergo an intervention. In Python, we could do something like

Code:

treatdates = {1989: ["California"], 2023: ["Wyoming"]}

Now, we have a handy-dandy way of storing these in a singular area. Of course, we can extend this. We could, in Python, store datasets in dictionaries, or indeed lists of datasets in dictionaries.

Of course, I know software limitations exist and I wouldn't expect Stata to become Python, but part of the reason why I love Python (as a 7-year Stata user) is that its data structures are pretty much undefeated, in my eyes, in terms of how they allow people to store results and organize useful information.
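To make the appeal a bit more concrete, here is a minimal sketch of the kind of thing the two dictionaries above make easy. The derived names (alice_shirts, grey_wearers, treat_year) are just illustrative choices, not anything standard.

```python
shirt_dict = {'Alice': ['White', 'Grey'], 'Pablo': ['Grey', 'Blue']}
treatdates = {1989: ["California"], 2023: ["Wyoming"]}

# Look up a value directly by its key
alice_shirts = shirt_dict['Alice']

# Filter keys by what is stored in their values
grey_wearers = sorted(name for name, shirts in shirt_dict.items() if 'Grey' in shirts)

# Invert the mapping: from year -> units to unit -> year
treat_year = {unit: year for year, units in treatdates.items() for unit in units}

print(alice_shirts)   # ['White', 'Grey']
print(grey_wearers)   # ['Alice', 'Pablo']
print(treat_year)     # {'California': 1989, 'Wyoming': 2023}
```

With only macros, each of these steps would need its own naming convention and loop; the dictionary keeps the key and its values together in one object.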
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2389
#324

08 Aug 2024, 16:24

Re: #323

I recall William Gould discussing dictionaries briefly in the Mata programming book. I think Mata has this type of object defined as an associative array. However, I don’t think they are nearly as versatile as how Python implements them, though I’m sure you could write your own wrapper to handle some additional functionality.
Comment
Daniel Schaefer

Join Date: Mar 2020

Posts: 808
#325

08 Aug 2024, 19:30

Regarding #323, lists, dictionaries, and arrays are fairly standard across most programming languages, and I also occasionally miss them in Stata. Without some kind of basic array or list type, you can't really build more complex objects. One can build one's own dictionary, but it's very difficult without some kind of array (let alone a heap, binary tree, hashset, or any other basic data structure). You could probably do it with a frame wrapped by a command. That said, I think the reason arrays et al. are missing follows fairly naturally from the language design philosophy. Data goes in the singular dataframe. Named objects are typically avoided with the exception of macros and maybe scalars, which are relegated to advanced topics along with return values. This is so that the syntax for commands doesn't require the user to pass named objects around all the time like you might in R or python. Instead you reference data in the global dataframe.
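To illustrate the point about needing an array underneath, here is a toy sketch (in Python, purely for illustration) of a dictionary built on nothing but a fixed-size list of buckets. The class name ToyDict and its put/get methods are made up for this example; it is not how Python's dict or Mata's associative arrays are actually implemented.

```python
class ToyDict:
    """Toy hash map: a fixed-size list of buckets, each a list of [key, value] pairs."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # Hash the key to pick one bucket in the underlying array
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:      # key already present: overwrite its value
                pair[1] = value
                return
        bucket.append([key, value])

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default


d = ToyDict()
d.put("California", 1989)
d.put("Wyoming", 2023)

print(d.get("California"))  # 1989
print(d.get("Texas"))       # None (key was never stored)
```

Everything here rests on being able to index into self.buckets, which is exactly the piece that is missing if the language gives you no array or list type to start from.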

This opinion may be unpopular, but I don't think Stata (especially ado, but maybe also mata) is well suited to really complex programming in general and it is sometimes awkward for data manipulation tasks (though better than some alternatives). But wow is it an excellent collection of easy to use statistical models with a nice and intuitive syntax for basic problems. Essentially, it is really really good at what it is designed to do. Even python doesn't scale well for really large projects because of the way it handles scope, no private class members, no interfaces, and because of the interpreter. I actually really like languages with explicit type systems and powerful compilers, but wouldn't necessarily want to see that kind of thing implemented in Stata.
Comment
Daniel Schaefer

Join Date: Mar 2020

Posts: 808
#326

08 Aug 2024, 19:40

By the way, I do know that mata supports many of the features I list in #325. I've played around with mata a few times and it's a decent language. I would probably use it if I were creating my own community contributed command and needed the features.
Comment
Erik Reinbergs

Join Date: Oct 2022

Posts: 33
#327

09 Aug 2024, 21:19

sem wish list (implement Mplus features currently not in Stata):
FIML in gsem

Diagram output following sem/gsem commands

WLSMV estimation
1 like
Comment
Mahdi Jafari

Join Date: Jul 2024

Posts: 15
#328

11 Aug 2024, 00:21

Do something to make it easier to work with data, especially with matrices, in Stata. I wish I could work with data in Stata as in Python.
One obvious drawback is that in Stata you can't easily define a scalar variable and do some basic math to calculate a desired outcome (yes, it is possible, but it is very confusing and not handy at all).
Comment
Stephen Jenkins

Join Date: Apr 2014

Posts: 1424
#329

11 Aug 2024, 04:29

Mahdi Jafari On Python versus Stata, it'd be helpful if you could elaborate more specifically what you mean, providing examples. And have you checked out Mata (bundled for free with Stata), which is matrix-orientated? On your second point, I don't understand your point sufficiently well -- please clarify. (I find -display- is a convenient way to do some 'basic math' calculations. Also look into the possibilities of using local and global macros to hold the results of 'basic math' calculations.)
Comment
Mahdi Jafari

Join Date: Jul 2024

Posts: 15
#330

11 Aug 2024, 06:26

Originally posted by Stephen Jenkins

Mahdi Jafari On Python versus Stata, it'd be helpful if you could elaborate more specifically what you mean, providing examples. And have you checked out Mata (bundled for free with Stata), which is matrix-orientated. On your second point, I don't understand your point sufficiently well -- please clarify. (I find -display- is a convenient way to do some 'basic math' calculations. Also look into the possibilities of using local and global macros to hold the results of 'basic math' calculations.)

I didn't know about Mata. Thank you. I'll look into it, although at first glance it seems pretty complicated, with a lot of new syntax.
Specifically, I would like to work with my datasets in Stata the way I work with them in Python, DataFrame style.
More precisely, I wish I could reach any cell, row, column, or arbitrary subset of the data in Stata, do whatever I want with it, and use or combine the results with the original dataset. I know there are ways to get what I described, but as I said, they are not handy enough.

Compare how much easier it is to write a GMM/MLE program in Python vs. Stata.

Last edited by Mahdi Jafari; 11 Aug 2024, 06:30.
Comment
