  • Sorting variables (meaning variable names) by their means

    In a thread yesterday https://www.statalist.org/forums/for...nput-variables I had need, or at least desire, to re-order a list of variable names by their means. The context was improving the legend of a graph.

    Then I alluded to a quick-and-dirty command written for the purpose but did not show its code. (It involved creating a new dataset temporarily.)
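
    For readers wondering what a temporary-dataset route might look like, here is a minimal sketch. It is not the quick-and-dirty command mentioned above, whose code was not shown; it only assumes one plausible approach using preserve, collapse and xpose, with the auto data chosen for illustration.

```stata
sysuse auto, clear

preserve
    // reduce to one observation holding the mean of each variable
    collapse (mean) mpg length price weight
    // transpose to one observation per variable; _varname keeps the names
    xpose, clear varname
    sort v1
    // collect the names in order of ascending mean
    local sortlist
    forvalues i = 1/`=_N' {
        local sortlist `sortlist' `=_varname[`i']'
    }
restore

di "`sortlist'"
```

    The original data come back on restore, and only the local macro survives. The cost is that the whole dataset is copied and replaced temporarily just to rank a few means.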

    This morning (local time, as usual) I wrote a more presentable version but have yet to write any help. That is partly because in practice writing help is just about as much work, and partly because in principle I wonder whether I am missing something already written. I am also mindful of what I call the YABA (*) problem, perhaps best encapsulated by this dialog[ue], which you may have experienced, inwardly or otherwise:

    Enthusiast for X: A great plus about X is that there are thousands of packages to choose from!

    Sceptic about X: A great minus about X is that there are thousands of packages to choose from!

    I leave open whether X fits Stata, or indeed any other software you may know (about).

    Here is my code, with a request for comments on whether this is another wheel re-invented:

    Code:
    *! 1.0.0 NJC 29 March 2023
    program sortmean
        // invtokens() introduced in Stata 10
        version 10
        
        syntax [varlist] [if] [in] [, ALLobs DESCending]
        
        quietly {
            ds `varlist', has(type numeric)  
            local varlist "`r(varlist)'"
            
            if "`allobs'" != "" {
                marksample touse, novarlist
            }
            else marksample touse
    
            count if `touse'
            if r(N) == 0 error 2000
    
            local direction = cond("`descending'" != "", -1, 1)
    
            mata: _sortmean("`varlist'", "`touse'", `direction')
        }
    
        di "`sortlist'"
        c_local sortlist `sortlist'
    end
    
    mata
    
    void _sortmean(string vector varnames, string scalar tousename, real scalar direction) {
        real matrix data
        real vector means
        string vector names
        st_view(data = ., ., varnames, tousename)
    
        if (sum(missing(data)) > 0) {
            means = colsum(data) :/ colsum(data :< .)
            means = means'
        }
        else means = mean(data)'
    
        names = tokens(varnames)'
        names = names[order(means, direction)]
        st_local("sortlist", invtokens(names'))  
    }
     
    end
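
    A small aside on the branch in _sortmean() that handles missing values, which matters only when allobs is specified. The toy matrix below is invented for illustration. As I read the Mata documentation, mean() omits any row containing a missing value, so it would use only rows 1 and 3 here, whereas the colsum() ratio uses every nonmissing value in each column separately.

```stata
mata:
// toy data: the second column has one missing value
data = (1, 10 \ 2, . \ 6, 30)

// column means over nonmissing values, as in _sortmean():
// colsum() treats missing as contributing zero by default and
// (data :< .) flags nonmissing cells, so this gives (3, 20)
colsum(data) :/ colsum(data :< .)

// for comparison: mean() applied to the same matrix
mean(data)
end
```

    The two results should differ in the first column, which is why the code does not simply call mean() whenever missing values are present.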

    Some would want to solve this using frames (introduced in Stata 16), but I am not especially fluent with frames and am also mindful that a solution with frames disenfranchises many people still using old versions of Stata and lacking funds to upgrade. It would be surprising now to find many people using versions before 10.

    The analog[ue] problem of ordering predictors by their "importance" in a model fit evokes some interest, but no enthusiasm.

    Here are some silly examples:

    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    . sortmean mpg length price weight
    mpg length weight price
    
    . sortmean mpg length price weight, desc
    price weight length mpg
    
    . di "`sortlist'"
    price weight length mpg
    
    . tabstat `sortlist', c(s)
    
        Variable |      Mean
    -------------+----------
           price |  6165.257
          weight |  3019.459
          length |  187.9324
             mpg |   21.2973
    ------------------------
    * YABA : Yet Another Blasted Add-on (**)

    ** Any reminiscences of https://www.youtube.com/watch?v=GPjp84cjEXM ?
    Last edited by Nick Cox; 29 Mar 2023, 08:51.

  • #2
    Seems similar to vorter (SSC); see this post to the old listserver for an early reference and this post for a recent mention. I added sorting (i.e., ordering) by statistics in 2015.

    • #3
      daniel klein Thanks very much for the reference; I will check it out.

      • #4
        daniel klein is right, unsurprisingly. vorter can do something very similar (and several other things too). Here is the similarity and the difference:


        Code:
        . sysuse auto, clear
        (1978 automobile data)
        
        . vorter (mean) price mpg weight, not
        
        . ret li
        
        macros:
                    r(varlist) : "mpg weight price"
                     r(oorder) : "price mpg weight"
                     r(corder) : "make price mpg rep78 headroom trunk weight length turn displacement gea.."
        
        matrices:
                       r(mean) :  1 x 3
        
        . sortmean price mpg weight
        mpg weight price
        
        . di "`sortlist'"
        mpg weight price

        For what I want, sortmean is very specific but touches the data as lightly as possible, producing only a local macro.
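
        Since the thread started from graph legends, here is a sketch of how the local macro might be used there. It assumes sortmean is installed or defined in the same session, and that everything runs in one do-file so that the macro is still visible; the uslifeexp data and the legend options are just illustrative choices.

```stata
sysuse uslifeexp, clear

// highest mean first, so legend entries roughly follow the lines top to bottom
sortmean le_wmale le_wfemale le_bmale le_bfemale, desc

// the local macro sortlist is now available to the caller
line `sortlist' year, legend(position(3) cols(1))
```

        The dataset is left untouched; only the order in which the variables are passed to line changes.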

        If you've come to this thread because of its title, vorter is more likely to be what you want.

        • #5
          An update: a revised command sortmean is now bundled with the upsetplot and vennbar packages on SSC. It is just a small deal and so won't be posted or advertised by us as a separate package, but only in conjunction with its use in support of those other commands.

          Anyone curious can get more information by typing

          Code:
          ssc type sortmean.sthlp
          Renewed thanks to daniel klein not only for flagging vorter on SSC but also for detailed suggestions in correspondence.
