  • add weight options to egen mean
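For context, a weighted group mean can currently be built by hand from two -egen total()- calls. The following is only a sketch of that workaround, assuming the auto dataset and using its variable weight purely as an illustrative analytic weight; the new variable names are hypothetical.

```stata
sysuse auto, clear

* Sum of weight*value within group; egen total() treats missing products as 0
egen double wx_sum = total(weight * mpg), by(foreign)

* Sum of weights within group, counting only observations with nonmissing mpg
egen double w_sum = total(weight * !missing(mpg)), by(foreign)

* Weighted mean of mpg within each level of foreign
generate double wmean_mpg = wx_sum / w_sum
```

A weight option on -egen mean()- would collapse these three lines into one.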


    • Could you expand the column for value labels after --svy: tabulate--? Right now it only shows 8 characters, and when labels start with similar words one has to run --tabulate varname-- to see the full labels. Minor inconvenience, but since there's lots of room in the results table, maybe an easy change?


      • I would like to see a het option added to gsem, similar to the option that is already available in hetprobit, hetoprobit, and the user-written oglm.

        AI claims there already is a het option in gsem, and I think we should do everything we can to keep AI from looking bad. ;-)
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 18.5 MP (2 processor)



        • I would like for the expression "x > y" to evaluate to missing when x is missing.

          I have been programming for many years and cannot imagine why someone would *want* for "missing" to be treated as "infinity." If it simplifies things for Stata's calculations internally, let that be an implementation detail which is hidden from the user. As it stands, all it does is make code harder to reason about, and make it very easy to get mysterious bugs. It is unintuitive and surprising behavior.

          To support backward compatibility with Stata scripts that intentionally use "missing" to mean "infinity," or use numerical comparison to check for missingness, this should probably be controlled by a setting.
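To make the complaint concrete, here is a small sketch of the current behavior and of the guard one has to write today. The variables x and y and the generated names are hypothetical.

```stata
* Current behavior: numeric missing (.) compares as larger than any number
display . > 1000000        // displays 1

* Minimal example data (hypothetical variables x and y)
clear
input x y
1 2
3 2
. 2
end

* Unguarded comparison: the row with missing x is flagged as x > y
generate byte gt_naive = x > y

* Guarded comparison: the result is missing whenever x or y is missing
generate byte gt_guarded = x > y if !missing(x, y)

list
```

In the third row gt_naive is 1 while gt_guarded is missing, which is the result the request asks for by default.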


          • There are problems with the suggestion in #424. To see the problem, suppose we want to evaluate the expression -(x > y) | (a > b)-, and suppose that x is missing. If the x > y part evaluates to missing, then how do we evaluate missing | (a > b)? In the current situation, missing, like all non-zero numeric expressions, is treated as true when logical operators are applied. So the net result would necessarily be true. But that seems inconsistent with the intention of #424, which, I presume, is that if x is missing we really don't know whether x > y or not, so we shouldn't call it true: it really should equal the truth value of (a > b). I could come up with other similar situations where the evaluation of logical expressions would become paradoxical if we adopted the convention proposed in #424. I think the only way out of this would be to convert all logical operators to 3-valued logic.

            Now, I would have no objection to moving to 3-valued logic: I not infrequently find myself having to emulate 3-valued logic in my code, and would be happy to have it built-in. But that's a major change, and people who are not used to working with 3-valued logic might find this as great a difficulty as adjusting to the current convention that missing values are greater than any real number.

            But I think adopting the convention that x > y evaluates to missing when x is missing while still keeping two-valued logic would be the worst of both worlds.
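The problem described above can be sketched as follows. The emulation uses Kleene's three-valued "or" as one common convention; the thread does not say which convention is meant, so that choice, and all variable names, are assumptions.

```stata
* Current two-valued rule: any nonzero value, including missing, counts as true
display . | (2 > 3)        // displays 1

clear
input x y a b
. 2 5 1
. 2 1 5
3 2 1 5
1 2 1 5
end

* Comparisons that propagate missing (the convention proposed in #424)
generate byte p = x > y if !missing(x, y)
generate byte q = a > b if !missing(a, b)

* Two-valued "or" applied to a missing p: always true
generate byte or_2v = p | q

* Three-valued (Kleene) "or": true if either side is true,
* false if both sides are false, otherwise missing (unknown)
generate byte or_3v = cond(p == 1 | q == 1, 1, cond(p == 0 & q == 0, 0, .))

list
```

In the second row, where x is missing and a > b is false, or_2v is 1 while or_3v is missing.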


            • I would also prefer a three-valued logic system in Stata, and I think that is what #424 really wants. This has been debated on this forum before (and reportedly many times at StataCorp), but, Brandon Istenes, I'm skeptical that this change will be made at this point, because it would likely break a mountain of legacy code.

              "adopting the convention that x > y evaluates to missing when x is missing while still keeping two-valued logic would be the worst of both worlds."
              Last edited by Daniel Schaefer; 21 Feb 2025, 13:38.


              • Although (just thinking about this a bit more) there is a versioning system that supports code written on older Stata standards already. Maybe it's possible after all.


                • Yes, a well-designed three value logic would allow consistent and intuitive behavior.

                  I would point out that there are preferable two-value alternatives to the current behavior. For example, having `x > y` resolve to `missing` and `missing` be treated as false/0 by logic operators would allow reasonably intuitive resolutions for -(x > y) | (a > b)- and -(x > y) & (a > b)-. The former would resolve to -(a > b)- as desired and the latter would resolve to false, which makes some intuitive sense because "missing" is not "true" and "a & b" is expected to resolve to true iff a and b are both true. A surprising formulation would be -!(x > y)- which would resolve to true. But then the user should just use -x <= y-, which would resolve to missing as expected. The surprising formulations with this behavior would be much rarer and a bit less surprising. At present, one has to guard against completely nonsensical results ("missing is bigger than a million") every time one uses an inequality operator.

                  All that said, yes a three value logic would allow for the best results.
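The two-valued alternative described in this post can be emulated today by testing explicitly for 1. This is only a sketch under that reading of the proposal; the data and variable names are hypothetical.

```stata
clear
input x y a b
. 2 5 1
. 2 1 5
3 2 5 1
end

* Comparisons that resolve to missing when an operand is missing
generate byte p = x > y if !missing(x, y)
generate byte q = a > b if !missing(a, b)

* Logical operators that treat missing as false: only a value of 1 is true
generate byte or_alt  = (p == 1) | (q == 1)
generate byte and_alt = (p == 1) & (q == 1)
generate byte not_alt = !(p == 1)

list
```

With x missing, or_alt follows (a > b), and_alt is 0, and not_alt is 1, matching the three cases discussed above.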


                  • Reupping a request from a couple versions ago to expand the format settings for tables beyond what is accomplished using the set cformat, set pformat, and set sformat commands.
                    help set_cformat
                    For instance, it would be valuable to have something like a set tformat command that controls, in a single command, the formatting of the elements of results tables generated by commands like corr, sum, etc. (E.g., suppose one wants a correlation matrix displayed with each element formatted as something like %3.2f.)

                    An alternative would be an option within each command to accomplish the same objective.
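Until something like this exists, one workaround for the correlation example is to pull the stored matrix and list it with a display format. A minimal sketch, assuming the auto dataset and a hypothetical matrix name C:

```stata
sysuse auto, clear

* Run the correlation quietly and copy the stored correlation matrix
quietly correlate price mpg weight
matrix C = r(C)

* Display every element with the desired format
matrix list C, format(%3.2f)
```

This loses the usual table layout of -correlate-, which is part of why a global setting or a per-command option would be preferable.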


                    • Another request: the Command pane should support
                      line continuations,
                      comments, and
                      /* */

                      This is becoming increasingly important as IDEs become more powerful, and now with AI support. On Linux I would just use the Stata REPL in my IDE (Cursor, a VSCode fork), but there's no Stata REPL for Windows. So I have to copy-paste code from my do-files into the Command window. That means I have to limit myself (and AI assistants) to code that works in the Command pane.

                      An alternative workflow would be to open the file in the Do-file Editor at the same time and Ctrl-D run code from there. However, the Do-file Editor does not auto-reload changed files, so every time I wanted to run changes, I would have to close the file and re-open it, which is not a viable workflow.

                      So I guess there are three feature requests here, and I've just headlined what I assume is the easy one (the first):
                      - Full comment syntax support in Command pane
                      - A Windows REPL
                      - Do-file editor should auto-reload changed files


                      • Allow scatterplot markers to be scaled exactly by the size of a third variable (for example, by area). See
                        Last edited by Todd Jones; 25 Feb 2025, 11:28.


                        • On #424 and sequels, I will add my own quasi-political reading that this isn't going to change. People can have endless fun and frustration discussing whether the original decision for two-valued logic was wrong, but it isn't going to change.

                          It's anecdotal, but relevant, that I've heard two presentations on three-way logic, in which someone argued for three-way logic being built into Stata, either optionally or just overwriting the present logic, after which the audience seemed to divide into three groups:

                          1. The present logic does bite occasionally, but it's what it is. Leave well alone.

                          2. Three-way logic is needed, but not at all the present proposal, which is arbitrary and illogical and won't extend at all easily or intuitively to more complicated evaluations.

                          3. Great idea (the speaker, alone).

                          To avoid more repetition, let's address the idea that this could be implemented under version control and through a setting. Sounds good, until

                          * as a reader of someone else's code I have to work out what their preferred setting is (seriously, I can't know unless they say)

                          * I am obliged to check code for whether it changes the setting (and switches it back when done?) Note that this means anywhere in a chain of do-files and ado-files following such a setting OR in a that almost no-one ever shows

                          * naive users get bitten too because this is yet another thing to check for -- and the answer is, well, different results are possible because of someone's setting

                          * this undermines all previous explanations in text books, course material, etc., etc.

                          As for "programming for many years", I will throw in my number, which is 52, not that that validates (or invalidates) anything above.


                          • On-the-fly compression for .dta files. The current implementation saves a normal .dta file to a temporary path and then calls zip to compress it. This is heavily I/O-limited, in terms of both space and speed. Stata binary files are already large (somebody already suggested columnar storage/Parquet), so a better implementation of compression might help us with big datasets.
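For reference, the two-step pattern described here looks roughly like the following with the built-in -zipfile- command. This is a sketch only; the file names are hypothetical, and the files are written to the current working directory.

```stata
sysuse auto, clear

* Demote storage types where this loses no information
compress

* Step 1: write an ordinary, uncompressed .dta file
save "auto_tmp.dta", replace

* Step 2: compress that file into a zip archive, then remove the intermediate
zipfile "auto_tmp.dta", saving("auto_tmp.zip", replace)
erase "auto_tmp.dta"
```

The intermediate .dta file is what costs the extra disk space and I/O; compressing while writing would avoid it.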


                            • Code:
                              egen mpg_b = resample(mpg) [ , noreplacement block(#) n(#) cluster(varname) weight(varname) capturedrop ]


                              • One feature that I greatly enjoy in other IDEs (e.g. VS Code) is the highlighting of every occurrence of a selected string:
                                when I select a string in my code by click-dragging or double-clicking,
                                e.g. "variable_temperature", every other instance of that string gets highlighted too. This is useful for navigating quickly through my code, or for checking whether I have already used it somewhere, without losing my workflow. The existing Ctrl+F (find) also shows me the other instances of the string, but it jumps around in the code, which I find rather distracting. I would very much welcome this feature when editing do-files in the Do-file Editor.

                                gen variable_temperature =. //highlighted when other, similar string is selected
                                (do other stuff)
                                rename variable_temperature temp //selected via double-click or mouse drag
                                INFO: please note that by "string" I do not mean the data type here, just a piece of text from my code.

