Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Request syntax assistance with v17 [table] command to get table of frequency in descending frequency order

    I apologize for even asking, assuming there is something very simple I simply cannot find.

    As [table] is the new go-to tabular presentation and analysis command in version 17, and I very much appreciate the ability to manipulate styles and easily export the results, I am trying to change my workflow to use this command instead of say [tabulate] for summary representations.

    What I cannot figure out how to do, using the new [table] command, is sort summary results by order of (descending) frequency, which is extremely easy to do in tabulate. I can't seem to find a "sort", "rowsort", or "colsort" equivalent.

    I can definitely get the tables I want with other commands, so that is not by itself the problem. I just can't do it with [table]. And I can get creative with value labels and renaming, and [bysort v:replace vnum = _N] etc., but that seems inelegant. Likewise, I am familiar with several SSC packages that address this challenge, but I'd prefer to do this within Stata proper if possible.

    The problem I'd like to solve is described below:

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str1 v
    "a"
    "a"
    "b"
    "b"
    "b"
    "c"
    "c"
    "c"
    "c"
    "d"
    "d"
    "d"
    "d"
    "e"
    end


    This is the table I'd let to get out of [table], that is trivial to get out of [tabulate]:

    tabulate v, sort

    v | Freq. Percent Cum.
    ------------+-----------------------------------
    c | 4 28.57 28.57
    d | 4 28.57 57.14
    b | 3 21.43 78.57
    a | 2 14.29 92.86
    e | 1 7.14 100.00
    ------------+-----------------------------------
    Total | 14 100.00


    This is the closest I can come in a single command - but I can't figure out how to sort by descending frequency.

    table v, stat(freq) stat(perc)

    ------------------------------
    | Frequency Percent
    --------+---------------------
    v |
    a | 2 14.29
    b | 3 21.43
    c | 4 28.57
    d | 4 28.57
    e | 1 7.14
    Total | 14 100.00
    ------------------------------


    It seems no matter what I do, the [v] column is sorted by the value of [v] and not by the order of the calculated summary statistics.

    Again, apologies for being inept. A search through StataList and through the help files and manuals has not delivered the syntax or an example I am seeking.

  • #2
    The table you seek is not trivially produced by -table-, and seems impossible without considerable pre-programming. You may be interested in this FAQ by Nick Cox for a more direct approach.

    Comment


    • #3
      So after 45 minutes of tinkering, I got this to work. Right now it handles only non-missing values. Perhaps this can be a starting point if you want to use -tables-. I'm by no means an expert with tables, so there may be some inefficiencies here.

      As a toy example, I'll use the auto dataset to try to replicate the following table. I use -fre- by Ben Jann and -elabel- by Daniel Klein, both from SSC.

      Code:
      . fre repair , desc nomiss
      
      repair -- Repair record 1978
      -----------------------------------------
             |      Freq.    Percent       Cum.
      -------+---------------------------------
       3 c   |         30      43.48      43.48
       4 d   |         18      26.09      69.57
       5 e   |         11      15.94      85.51
       2 b   |          8      11.59      97.10
       1 a   |          2       2.90     100.00
       Total |         69     100.00           
      -----------------------------------------
      The example code:

      Code:
      clear *
      cls
      
      sysuse auto
      keep rep78
      rename rep78 repair
      
      /* I'm adding a label to -rep78- to make the example more realistic */
      lab def repair 1 "a", modify
      lab def repair 2 "b", modify
      lab def repair 3 "c", modify
      lab def repair 4 "d", modify
      lab def repair 5 "e", modify
      label values repair repair
      
      preserve
      contract repair
      drop if mi(repair)
      gsort -_freq repair   // sort by descending frequency of repair
      
      /*
      I will need to compute cumulative percent to match the final layout in the table.
      But, -table- will list factor variables in ascending sequence by numeric value of repair.
      Thus, I clone -repair- to -xrepair- and relevel it, in order to match the numeric order with the
      desired display order.
      */
      qui levelsof repair, local(oldlvls)
      local newlvls
      forval i = 1/`=_N' {
        local newlvls `newlvls' `=repair[`i']'
      }
      local recodelvls
      forval i = 1/`=_N' {
        local recodelvls `recodelvls' (`: word `i' of `newlvls'' = `: word `i' of `oldlvls'')
      }
      
      clonevar xrepair=repair
      recode xrepair `recodelvls'
      
      elabel copy repair xrepair
      elabel recode xrepair `recodelvls'
      label values xrepair xrepair
      
      * generate cumulative percentage
      gen _cumpct = sum(_freq)
      summ _freq, meanonly
      qui replace _cumpct = _cumpct / `r(sum)' * 100
      
      * Now start making the -table-
      collect create MyTable, replace
      table (xrepair) () [fw=_freq] ,  ///
            stat(sumw) stat(percent)  ///
            name(MyTable) append
      table (xrepair) () [fw=_freq], ///
            stat(firstnm _cumpct) nototals ///
            name(MyTable) append
            
      collect layout (xrepair) (result[sumw percent firstnm])
      collect recode result sumw     = freq ///
                            percent  = pct  ///
                            firstnm  = cumpct
      collect label levels result freq   "Frequency" ///
                                  pct    "Percent"   ///
                                  cumpct "Cum. Percent" , modify
      collect layout (xrepair) (result[freq pct cumpct])
      collect style cell result[freq] , nformat(%9.0fc)
      collect style cell result[pct cumpct] , nformat(%9.1f) sformat("%s%%")
      collect style cell border_block, border(right, pattern(nil))
      collect style cell xrepair[.m] , border(top, pattern(solid))
      collect preview
      
      restore
      Final result is

      Code:
      . collect preview
      
      ------------------------------------------------------
                          Frequency   Percent   Cum. Percent
      ------------------------------------------------------
      Repair record 1978                                    
        c                        30     43.5%          43.5%
        d                        18     26.1%          69.6%
        e                        11     15.9%          85.5%
        b                         8     11.6%          97.1%
        a                         2      2.9%         100.0%
      ------------------------------------------------------
        Total                    69    100.0%               
      ------------------------------------------------------
      Last edited by Leonardo Guizzetti; 29 Aug 2021, 21:48.

      Comment


      • #4
        Looks like this is an old post, but I have figured out how to do this in my own work, if that is helpful. The trick involves using preserve/restore, within the preserve collapsing to get a variable that is the frequency of each category, having a text variable that has the value labels, using egen rank to create a variable that is the rank order of the frequencies, then using labmask to label the values of the rank variable. I use this often, so I created a file with all of the variations of the variables that I can merge into analysis files when I wish.

        Comment


        • #5
          Consider also variations on


          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str1 v
          "a"
          "a"
          "b"
          "b"
          "b"
          "c"
          "c"
          "c"
          "c"
          "d"
          "d"
          "d"
          "d"
          "e"
          end
          
          myaxis V = v, sort(count) descending
          
          table V
          
          --------------------
                  |  Frequency
          --------+-----------
          v       |          
            c     |          4
            d     |          4
            b     |          3
            a     |          2
            e     |          1
            Total |         14
          --------------------
          myaxis is from the Stata Journal and was written up in https://www.stata-journal.com/articl...article=st0654

          There was a quicker summary at https://www.statalist.org/forums/for...e-or-graph-use

          The usual 3 years' delay before the Stata Journal paper emerges from behind a paywall does not apply, given agreements between British universities and journal publishers.

          What myaxis does in this case is, naturally, broadly equivalent to @Pamela Oliver's solution, but no preserve restore step is needed or used.

          This solution fails @Ken Goldberg's criterion of using official Stata only.

          Detail: labmask mentioned in #4 is also from the Stata Journal.
          Last edited by Nick Cox; 07 Feb 2022, 18:35.

          Comment

          Working...
          X