Sorting variables (meaning variable names) by their means

Nick Cox

#1

29 Mar 2023, 08:32

In a thread yesterday https://www.statalist.org/forums/for...nput-variables I had need, or at least desire, to re-order a list of variable names by their means. The context was improving the legend of a graph.

Then I alluded to a quick-and-dirty command written quickly for the purpose but did not show its code. (It involved creating a new dataset temporarily.)

This morning (local time, as usual) I wrote a more presentable version, but have yet to write a help file. That is partly because in practice writing help is just about as much work as writing the code, and partly because in principle I wonder whether I am missing something already written. I am also mindful of what I call the YABA (*) problem, which is perhaps best encapsulated by this dialog[ue], which you may have experienced, inwardly or otherwise:

Enthusiast for X: A great plus about X is that there are thousands of packages to choose from!

Sceptic about X: A great minus about X is that there are thousands of packages to choose from!

I leave open whether X fits Stata, or indeed any other software you may know (about).

Here is my code, with a request for comments on whether this is another wheel re-invented:

Code:

*! 1.0.0 NJC 29 March 2023
program sortmean
    // invtokens() introduced in Stata 10
    version 10
    syntax [varlist] [if] [in] [, ALLobs DESCending]

    quietly {
        ds `varlist', has(type numeric)
        local varlist "`r(varlist)'"

        if "`allobs'" != "" {
            marksample touse, novarlist
        }
        else marksample touse

        count if `touse'
        if r(N) == 0 error 2000

        local direction = cond("`descending'" != "", -1, 1)

        mata: _sortmean("`varlist'", "`touse'", `direction')
    }

    di "`sortlist'"
    c_local sortlist `sortlist'
end

mata

void _sortmean(string vector varnames, string scalar tousename, real scalar direction) {
    real matrix data
    real vector means
    string vector names

    st_view(data = ., ., varnames, tousename)

    if (sum(missing(data)) > 0) {
        means = colsum(data) :/ colsum(data :< .)
        means = means'
    }
    else means = mean(data)'

    names = tokens(varnames)'
    names = names[order(means, direction)]
    st_local("sortlist", invtokens(names'))
}

end
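
A side note on why the Mata function branches on missing values, as a small sketch with made-up data. The matrix X is hypothetical. The sketch assumes, as I read the Mata documentation, that colsum() by default counts missing values as zero and that mean() leaves out any row containing a missing value.

```stata
mata:
// Hypothetical data: the second variable is missing in the second observation
X = (1, 10 \ 2, . \ 6, 30)

// Variable-by-variable means over nonmissing values, as in _sortmean():
// column sums (missings counted as zero) divided by counts of nonmissing values
colsum(X) :/ colsum(X :< .)

// For comparison: mean() is expected to drop the incomplete second row
mean(X)
end
```

Under those assumptions the first expression gives 3 and 20, each mean using every nonmissing value of its own variable, whereas mean() would give 3.5 and 20 because the whole second row is ignored. That difference only matters when observations with missing values are allowed in, which is what the allobs option does.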

Some would want to solve this using frames but I am not especially fluent with frames and also mindful that a solution with frames disenfranchises many people still using old versions of Stata and lacking funds to upgrade. It would be surprising now to find many people using versions before 10.
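
For anyone curious what a frames version might look like, here is a rough sketch, not the code of sortmean and not tested across versions. It assumes Stata 16 or later; the frame name means and the variable names varname and avg are arbitrary choices made for illustration.

```stata
* A sketch assuming Stata 16 or later; the frame name "means" and the
* variable names "varname" and "avg" are hypothetical choices
sysuse auto, clear
local vars mpg length price weight

capture frame drop means
frame create means str32 varname double avg

* post one row per variable: its name and its mean
foreach v of local vars {
    quietly summarize `v', meanonly
    frame post means ("`v'") (r(mean))
}

* sort that small frame by the means and read the names back into a local
local sortlist
frame means {
    sort avg
    local n = _N
    forvalues i = 1/`n' {
        local name = varname[`i']
        local sortlist `sortlist' `name'
    }
}

display "`sortlist'"
```

If this works as intended, the last line should show the same ascending order that sortmean reports for these four variables. The data in memory are left untouched, but at the cost of requiring a recent Stata.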

The analog[ue] problem of ordering predictors by their "importance" in a model fit evokes some interest, but no enthusiasm.

Here are some silly examples:

Code:

. sysuse auto, clear
(1978 automobile data)

. sortmean mpg length price weight
mpg length weight price

. sortmean mpg length price weight, desc
price weight length mpg

. di "`sortlist'"
price weight length mpg

. tabstat `sortlist', c(s)

    Variable |      Mean
-------------+----------
       price |  6165.257
      weight |  3019.459
      length |  187.9324
         mpg |   21.2973
------------------------

* YABA : Yet Another Blasted Add-on (**)

** Any reminiscences of https://www.youtube.com/watch?v=GPjp84cjEXM ?

Last edited by Nick Cox; 29 Mar 2023, 08:51.

daniel klein

#2

29 Mar 2023, 14:07

Seems similar to vorter (SSC); see this post to the old listserver for an early reference and this post for a recent mention. I added sorting (i.e., ordering) by statistics in 2015.
Nick Cox

#3

29 Mar 2023, 14:10

daniel klein Thanks very much for the reference; I will check it out.
Nick Cox

#4

30 Mar 2023, 04:33

daniel klein is right, unsurprisingly. vorter can do something very similar (and several other things too). Here is the similarity and the difference:

Code:

. sysuse auto, clear
(1978 automobile data)

. vorter (mean) price mpg weight, not

. ret li

macros:
            r(varlist) : "mpg weight price"
             r(oorder) : "price mpg weight"
             r(corder) : "make price mpg rep78 headroom trunk weight length turn displacement gea.."

matrices:
               r(mean) :  1 x 3

. sortmean price mpg weight
mpg weight price

. di "`sortlist'"
mpg weight price

For what I want, sortmean is very specific but makes the lightest possible touch to the data by producing only a local macro.
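
The local macro gets back to the caller through c_local, which is undocumented: it defines a local macro in the space of whatever called the program, whether that is a do-file, another program, or an interactive session. A minimal sketch, in which the program name demo_clocal and the macro name greeting are made up for illustration:

```stata
capture program drop demo_clocal
program demo_clocal
    // hypothetical toy program: define a local macro in the caller's space
    c_local greeting hello from demo_clocal
end

demo_clocal
display "`greeting'"
```

The point of the design is that nothing else is changed: no variables, no stored results, just one macro that the caller can use or ignore.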

If you've come to this thread because of its title, vorter is more likely to be what you want.
Nick Cox

#5

09 Apr 2023, 05:36

An update: a revised command sortmean is now bundled with the upsetplot and vennbar packages on SSC. It is just a small deal and so won't be posted or advertised by us as a separate package, but only in conjunction with its use in support of those other commands.

Anyone curious can get more information by typing

Code:

ssc type sortmean.sthlp

Renewed thanks to daniel klein not only for flagging vorter on SSC but also for detailed suggestions in correspondence.