Wishlist for Stata 19

George Ford

Join Date: Aug 2014

Posts: 3118
#421

12 Feb 2025, 13:29

add weight options to egen mean
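
Until something like this exists, a weighted group mean has to be built by hand. The lines below are only a minimal sketch of one workaround, assuming the bundled auto dataset, with weight standing in for the weighting variable and foreign for the grouping variable; the names num, den, and wmean_mpg are made up for illustration.

```stata
sysuse auto, clear

* Weighted mean of mpg within each level of foreign, weighting by weight.
* egen's total() treats missing values as zero, so observations with a
* missing mpg or a missing weight contribute nothing to either sum.
bysort foreign: egen double num = total(mpg * weight)
bysort foreign: egen double den = total(weight * !missing(mpg))
generate double wmean_mpg = num / den
drop num den
```

Within each group the result should agree with the mean reported by summarize mpg [aweight = weight].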
2 likes
Comment
Arnold Levinson

Join Date: Jun 2014

Posts: 13
#422

18 Feb 2025, 11:44

Could you expand the column for value labels after --svy: tabulate--? Right now it only shows 8 characters, and when labels start with similar words one has to run --tabulate varname-- to see the full labels. Minor inconvenience, but since there's lots of room in the results table, maybe an easy change?
Thx...arnold
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4940
#423

21 Feb 2025, 07:37

I would like to see a het option added to gsem, similar to the option that is already available in hetprobit, hetoprobit, and the user-written oglm.

AI claims there already is a het option in gsem, and I think we should do everything we can to keep AI from looking bad. ;-)

https://www.statalist.org/forums/for...option-in-gsem

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Comment
Brandon Istenes

Join Date: Feb 2025

Posts: 3
#424

21 Feb 2025, 12:31

I would like for the expression "x > y" to evaluate to missing when x is missing.

I have been programming for many years and cannot imagine why someone would *want* for "missing" to be treated as "infinity." If it simplifies things for Stata's calculations internally, let that be an implementation detail which is hidden from the user. As it stands, all it does is make code harder to reason about, and make it very easy to get mysterious bugs. It is unintuitive and surprising behavior.

To support backward compatibility with Stata scripts that intentionally use "missing" to mean "infinity," or use numerical comparison to check for missingness, this should probably be controlled by a setting.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29948
#425

21 Feb 2025, 12:48

There are problems with the suggestion in #424. To see the problem, suppose we want to evaluate the expression -(x > y) | (a > b)-, and suppose that x is missing. If the x > y part evaluates to missing, then how do we evaluate missing | (a > b)? In the current situation, missing, like all non-zero numeric expressions, is treated as true when logical operators are applied. So the net result would necessarily be true. But that seems inconsistent with the intention of #424, which, I presume, is that if x is missing we really don't know whether x > y or not, so we shouldn't call it true: it really should equal the truth value of (a > b). I could come up with other similar situations where the evaluation of logical expressions would become paradoxical if we adopted the convention proposed in #424. I think the only way out of this would be to convert all logical operators to 3-valued logic.

Now, I would have no objection to moving to 3-valued logic: I not infrequently find myself having to emulate 3-valued logic in my code, and would be happy to have it built-in. But that's a major change, and people who are not used to working with 3-valued logic might find this as great a difficulty as adjusting to the current convention that missing values are greater than any real number.

But I think adopting the convention that x > y evaluates to missing when x is missing while still keeping two-valued logic would be the worst of both worlds.
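
For readers who have not run into the current convention, here is a small sketch of the behavior under discussion and of the usual guard with missing(). The data and the variable names naive and guarded are invented for illustration.

```stata
clear
input x y a b
. 5 1 2
7 5 1 2
3 5 1 2
end

* Current behavior: a missing x counts as larger than any number,
* so (x > y) is true in the first observation.
generate byte naive = (x > y) | (a > b)

* Guarded version: a missing x is not allowed to satisfy x > y.
generate byte guarded = (x > y & !missing(x)) | (a > b)

list
* naive   is 1, 1, 0
* guarded is 0, 1, 0
```

Note that the guarded version still returns 0 or 1, never missing, so it stays within two-valued logic.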
Comment
Daniel Schaefer

Join Date: Mar 2020

Posts: 808
#426

21 Feb 2025, 13:19

I would also prefer a three-valued logic system in Stata, and I think that is what #424 really wants. This has been debated on this forum before (and reportedly many times at StataCorp), but Brandon Istenes, I'm skeptical this change will be made at this point because it would likely break a mountain of legacy code.

"adopting the convention that x > y evaluates to missing when x is missing while still keeping two-valued logic would be the worst of both worlds."

Last edited by Daniel Schaefer; 21 Feb 2025, 13:38.
Comment
Daniel Schaefer

Join Date: Mar 2020

Posts: 808
#427

21 Feb 2025, 15:01

Although (just thinking about this a bit more) there is a versioning system that supports code written on older Stata standards already. Maybe it's possible after all.
1 like
Comment
Brandon Istenes

Join Date: Feb 2025

Posts: 3
#428

23 Feb 2025, 10:57

Yes, a well-designed three value logic would allow consistent and intuitive behavior.

I would point out that there are preferable two-value alternatives to the current behavior. For example, having `x > y` resolve to `missing` and `missing` be treated as false/0 by logic operators would allow reasonably intuitive resolutions for -(x > y) | (a > b)- and -(x > y) & (a > b)-. The former would resolve to -(a > b)- as desired and the latter would resolve to false, which makes some intuitive sense because "missing" is not "true" and "a & b" is expected to resolve to true iff a and b are both true. A surprising formulation would be -!(x > y)- which would resolve to true. But then the user should just use -x <= y-, which would resolve to missing as expected. The surprising formulations with this behavior would be much rarer and a bit less surprising. At present, one has to guard against completely nonsensical results ("missing is bigger than a million") every time one uses an inequality operator.

All that said, yes a three value logic would allow for the best results.
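
As a rough illustration of what three-valued evaluation could look like, the sketch below emulates it with existing functions. It assumes Kleene-style rules (an OR is true if either side is known to be true, an AND is false if either side is known to be false, otherwise the result is missing); that choice of rules, the data, and the variable names are assumptions for illustration, not a proposal from this thread.

```stata
clear
input x y a b
. 5 3 2
. 5 1 2
7 5 . 2
3 5 1 2
end

* Comparisons that are missing whenever either operand is missing
generate byte p = (x > y) if !missing(x, y)
generate byte q = (a > b) if !missing(a, b)

* Three-valued OR: 1 if either side is known true, 0 if both are known false
generate byte p_or_q = cond(p == 1 | q == 1, 1, cond(p == 0 & q == 0, 0, .))

* Three-valued AND: 0 if either side is known false, 1 if both are known true
generate byte p_and_q = cond(p == 0 | q == 0, 0, cond(p == 1 & q == 1, 1, .))

list
* p_or_q  is 1, ., 1, 0
* p_and_q is ., 0, ., 0
```

The tests p == 1 and p == 0 are both false when p is missing, which is what lets cond() fall through to the missing result.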
Comment
John Mullahy

Join Date: Dec 2016

Posts: 742
#429

23 Feb 2025, 12:46

Reupping a request from a couple versions ago to expand the format settings for tables beyond what is accomplished using the set cformat, set pformat, and set sformat commands.

Code:

help set_cformat

For instance it would be valuable to have something like a set tformat command to control with a single command the formatting of elements of results tables generated by commands like corr, sum, etc. (E.g. suppose one desires display of a correlation matrix where each element in the reported matrix is formatted as something like %3.2f)

An alternative would be an option within each command to accomplish the same objective.
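
In the meantime, one workaround for the correlation example is to pull the stored matrix out of r() and list it with an explicit format. This is only a sketch using the bundled auto dataset; the matrix name C and the choice of variables are arbitrary.

```stata
sysuse auto, clear

* correlate leaves the correlation matrix in r(C)
quietly correlate mpg weight length
matrix C = r(C)

* matrix list accepts a display format for the elements
matrix list C, format(%4.2f)
```

This has to be repeated for each command and each stored result, which is the inconvenience a single setting would remove.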
2 likes
Comment
Brandon Istenes

Join Date: Feb 2025

Posts: 3
#430

25 Feb 2025, 10:08

Another request: the Command pane should support `///` line continuations, `//` comments, and `/* */` comments.

This is becoming increasingly important as IDEs become more powerful, and now with AI support. On Linux I would just use the Stata REPL in my IDE (Cursor, a VSCode fork), but there's no Stata REPL for Windows. So I have to copy-paste code from my do-files into the Command window. That means I have to limit myself (and AI assistants) to code that works in the Command pane.

An alternative workflow would be to open the file in the Do-file Editor at the same time and Ctrl-D run code from there. However, the Do-file Editor does not auto-reload changed files, so every time I wanted to run changes, I would have to close the file and re-open it, which is not a viable workflow.

So I guess there are three feature requests here, and I've just headlined what I assume is the easy one (the first):
- Full comment syntax support in Command pane
- A Windows REPL
- Do-file editor should auto-reload changed files
2 likes
Comment
Todd Jones

Join Date: Oct 2020

Posts: 43
#431

25 Feb 2025, 11:24

Allow one to scale scatterplot markers exactly by the size of a third variable (for example, by area). See https://www.statalist.org/forums/for...third-variable.

Last edited by Todd Jones; 25 Feb 2025, 11:28.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35405
#432

27 Feb 2025, 06:37

On #424 and sequels, I will add my own quasi-political reading that this isn't going to change. People can have endless fun and frustration discussing whether the original decision for two-valued logic was wrong, but it isn't going to change.

It's anecdotal, but relevant, that I've heard two presentations on three-way logic, in which someone argued for three-way logic being built into Stata, either optionally or just overwriting the present logic, after which the audience seemed to divide into three groups:

1. The present logic does bite occasionally, but it's what it is. Leave well alone.

2. Three-way logic is needed, but not at all the present proposal, which is arbitrary and illogical and won't extend at all easily or intuitively to more complicated evaluations.

3. Great idea (the speaker, alone).

To avoid more repetition, let's address the idea that this could be implemented under version control and through a setting. Sounds good, until

* as a reader of someone else's code I have to work out what their preferred setting is (seriously, I can't know unless they say)

* I am obliged to check code for whether it changes the setting (and switches it back when done?) Note that this means anywhere in a chain of do-files and ado-files following such a setting OR in a profile.do that almost no-one ever shows

* naive users get bitten too because this is yet another thing to check for -- and the answer is, well, different results are possible because of someone's setting

* this undermines all previous explanations in text books, course material, etc., etc.

As for "programming for many years", I will throw in my number, which is 52, not that that validates (or invalidates) anything above.
4 likes
Comment
alejoforero

Join Date: Sep 2014

Posts: 50
#433

28 Feb 2025, 10:17

On-the-fly compression for .dta files. The current implementation saves a normal .dta file to a temporary path and then calls zip to compress it. This is heavily I/O-limited in both space and speed. Stata binary files are already large (somebody already suggested columnar storage/Parquet), so a better implementation of compression might help with big datasets.
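
For context, the two-step pattern can be reproduced by hand with built-in commands, as in the sketch below. It assumes the bundled auto dataset and hypothetical file names in the working directory; it is not meant to describe the internals of any particular command, only the write-then-zip round trip.

```stata
sysuse auto, clear

* Shrink storage types where this loses no information
compress

* Step 1: write an ordinary, uncompressed .dta file
save "auto_small.dta", replace

* Step 2: compress it in a separate pass, then remove the uncompressed copy
zipfile "auto_small.dta", saving("auto_small.zip", replace)
erase "auto_small.dta"

* Reading it back also takes two steps
unzipfile "auto_small.zip", replace
use "auto_small.dta", clear
```

Each direction writes the full uncompressed file to disk once, which is where the extra space and I/O come from.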
1 like
Comment

George Ford

Join Date: Aug 2014
Posts: 3118

#434

01 Mar 2025, 08:02

Code:

egen mpg_b = resample(mpg) [ , noreplacement block(#) n(#) cluster(varname) weight(varname) capturedrop ]

Comment

Tobias Bergsmann

Join Date: Feb 2023

Posts: 5
#435

05 Mar 2025, 06:45

One feature that I really enjoy in other IDEs (e.g. VS Code) is the highlighting of every matching occurrence of a selection. When I select a piece of text in my code by click-drag or double-click, e.g. "variable_temperature", every other instance of "variable_temperature" gets highlighted too. This is useful for quickly navigating my code or checking whether I have already used a name somewhere, without losing my workflow. The same can be achieved with the existing Ctrl+F (find), which shows the other instances of the string but jumps around in the code, which I find rather distracting. I would very much welcome this feature when editing do-files in the Do-file Editor.

Code:

gen variable_temperature = . // highlighted when the other, matching string is selected
* (do other stuff)
rename variable_temperature temp // selected via double-click or mouse drag

INFO: please note that by "string" I don't mean the data type here, just a piece of text from my code.
1 like
Comment
