Could you expand the column for value labels after --svy: tabulate--? Right now it only shows 8 characters, and when labels start with similar words one has to run --tabulate varname-- to see the full labels. Minor inconvenience, but since there's lots of room in the results table, maybe an easy change?
Thx...arnold
I would like for the expression "x > y" to evaluate to missing when x is missing.
I have been programming for many years and cannot imagine why someone would *want* for "missing" to be treated as "infinity." If it simplifies things for Stata's calculations internally, let that be an implementation detail which is hidden from the user. As it stands, all it does is make code harder to reason about, and make it very easy to get mysterious bugs. It is unintuitive and surprising behavior.
To support backward compatibility with Stata scripts that intentionally use "missing" to mean "infinity," or use numerical comparison to check for missingness, this should probably be controlled by a setting.
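For readers who have not run into this, here is a minimal sketch of the current behavior and of the usual guard against it. The variable names (x, y, gt_naive, gt_guard) are made up for illustration.

```stata
* Numeric missing (.) is treated as larger than any nonmissing number
* This displays 1
display (. > 1000000)

clear
input x y
1 2
3 2
. 2
end

* Unguarded comparison: evaluates to 1 in the row where x is missing
generate byte gt_naive = x > y

* Guarded comparison: left missing whenever x or y is missing
generate byte gt_guard = x > y if !missing(x, y)

list
```

In the third row gt_naive is 1 while gt_guard is missing, which is the difference this request is about.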
There are problems with the suggestion in #424. To see the problem, suppose we want to evaluate the expression -(x > y) | (a > b)-, and suppose that x is missing. If the x > y part evaluates to missing, then how do we evaluate missing | (a > b)? In the current situation, missing, like all non-zero numeric expressions, is treated as true when logical operators are applied. So the net result would necessarily be true. But that seems inconsistent with the intention of #424, which, I presume, is that if x is missing we really don't know whether x > y or not, so we shouldn't call it true: it really should equal the truth value of (a > b). I could come up with other similar situations where the evaluation of logical expressions would become paradoxical if we adopted the convention proposed in #424. I think the only way out of this would be to convert all logical operators to 3-valued logic.
Now, I would have no objection to moving to 3-valued logic: I not infrequently find myself having to emulate 3-valued logic in my code, and would be happy to have it built-in. But that's a major change, and people who are not used to working with 3-valued logic might find this as great a difficulty as adjusting to the current convention that missing values are greater than any real number.
But I think adopting the convention that x > y evaluates to missing when x is missing while still keeping two-valued logic would be the worst of both worlds.
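To make the point concrete, here is a rough sketch of how the problem looks and how three-valued logic can be emulated by hand today. It assumes one particular three-valued system (Kleene's strong logic, where "unknown OR false" stays unknown), which is only one possible reading of what is being asked for. The names p, q, or_naive, or_3v, and and_3v are hypothetical.

```stata
clear
input x y a b
. 2 3 1
. 2 1 3
5 2 1 3
1 2 1 3
end

* p and q take the values 1 (true), 0 (false), or . (unknown)
generate byte p = x > y if !missing(x, y)
generate byte q = a > b if !missing(a, b)

* Built-in | treats any nonzero value, including missing, as true,
* so a missing p makes the result 1 regardless of q
generate byte or_naive = p | q

* Emulated three-valued OR: true if either side is true,
* unknown if undecided and either side is missing, otherwise false
generate byte or_3v = cond(p == 1 | q == 1, 1, cond(missing(p) | missing(q), ., 0))

* Emulated three-valued AND: false if either side is false,
* unknown if undecided and either side is missing, otherwise true
generate byte and_3v = cond(p == 0 | q == 0, 0, cond(missing(p) | missing(q), ., 1))

list
```

In the second row (x missing, a > b false) or_naive is 1 while or_3v is missing. Note that under this particular logic the result is "unknown" rather than the truth value of (a > b).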
I would also prefer a three-valued logic system in Stata, and I think that is what #424 really wants. This has been debated on this forum before (and reportedly many times at StataCorp). But, Brandon Istenes, I'm skeptical this change will be made at this point, because it would likely break a mountain of legacy code.
Although (just thinking about this a bit more) there is a versioning system that supports code written on older Stata standards already. Maybe it's possible after all.
Yes, a well-designed three value logic would allow consistent and intuitive behavior.
I would point out that there are preferable two-value alternatives to the current behavior. For example, having `x > y` resolve to `missing` and `missing` be treated as false/0 by logic operators would allow reasonably intuitive resolutions for -(x > y) | (a > b)- and -(x > y) & (a > b)-. The former would resolve to -(a > b)- as desired and the latter would resolve to false, which makes some intuitive sense because "missing" is not "true" and "a & b" is expected to resolve to true iff a and b are both true. A surprising formulation would be -!(x > y)- which would resolve to true. But then the user should just use -x <= y-, which would resolve to missing as expected. The surprising formulations with this behavior would be much rarer and a bit less surprising. At present, one has to guard against completely nonsensical results ("missing is bigger than a million") every time one uses an inequality operator.
All that said, yes a three value logic would allow for the best results.
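The two-valued alternative described above can be approximated today with explicit guards. This is only a sketch of the idea, not a proposal for actual syntax; p, q, and the *_alt variable names are made up.

```stata
clear
input x y a b
. 2 3 1
. 2 1 3
end

* Comparisons that resolve to missing when an operand is missing
generate byte p = x > y if !missing(x, y)
generate byte q = a > b if !missing(a, b)

* "Missing counts as false" for the logical operators:
* testing p == 1 is false both when p is 0 and when p is missing
generate byte or_alt  = (p == 1) | (q == 1)
generate byte and_alt = (p == 1) & (q == 1)

* The surprising case: negating an unknown comparison gives true
generate byte not_alt = !(p == 1)

* The suggested rewrite: x <= y stays missing when x is missing
generate byte le_alt = x <= y if !missing(x, y)

list
```

With x missing in both rows, or_alt equals (a > b), and_alt is 0, not_alt is 1, and le_alt is missing, matching the cases walked through above.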
Reupping a request from a couple of versions ago to expand the format settings for tables beyond what can be accomplished with the set cformat, set pformat, and set sformat commands.
Code:
help set_cformat
For instance it would be valuable to have something like a set tformat command to control with a single command the formatting of elements of results tables generated by commands like corr, sum, etc. (E.g. suppose one desires display of a correlation matrix where each element in the reported matrix is formatted as something like %3.2f)
An alternative would be an option within each command to accomplish the same objective.
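In the meantime, a partial workaround (not the requested feature) is to pull the stored results and format them at display time. The sketch below uses the auto dataset shipped with Stata purely as an example; the matrix name C is arbitrary.

```stata
sysuse auto, clear

* correlate leaves the correlation matrix in r(C)
quietly correlate price mpg weight
matrix C = r(C)

* matrix list accepts a display format for the elements
matrix list C, format(%3.2f)

* summarize can honor a variable's own display format
format mpg %4.1f
summarize mpg, format
```

This has to be repeated command by command, which is the inconvenience a single set tformat style setting would remove.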