Request syntax assistance with v17 [table] command to get table of frequency in descending frequency order

Ken Goldberg

Join Date: Aug 2021

Posts: 3
#1

Request syntax assistance with v17 [table] command to get table of frequency in descending frequency order

29 Aug 2021, 10:45

I apologize for even asking, assuming there is something very simple I simply cannot find.

As [table] is the new go-to tabular presentation and analysis command in version 17, and I very much appreciate the ability to manipulate styles and easily export the results, I am trying to change my workflow to use this command instead of say [tabulate] for summary representations.

What I cannot figure out how to do, using the new [table] command, is sort summary results by order of (descending) frequency, which is extremely easy to do in tabulate. I can't seem to find a "sort", "rowsort", or "colsort" equivalent.

I can definitely get the tables I want with other commands, so that is not by itself the problem. I just can't do it with [table]. And I can get creative with value labels and renaming, and [bysort v:replace vnum = _N] etc., but that seems inelegant. Likewise, I am familiar with several SSC packages that address this challenge, but I'd prefer to do this within Stata proper if possible.

The problem I'd like to solve is described below:

* Example generated by -dataex-. For more info, type help dataex
clear
input str1 v
"a"
"a"
"b"
"b"
"b"
"c"
"c"
"c"
"c"
"d"
"d"
"d"
"d"
"e"
end

This is the table I'd let to get out of [table], that is trivial to get out of [tabulate]:

tabulate v, sort

v | Freq. Percent Cum.
------------+-----------------------------------
c | 4 28.57 28.57
d | 4 28.57 57.14
b | 3 21.43 78.57
a | 2 14.29 92.86
e | 1 7.14 100.00
------------+-----------------------------------
Total | 14 100.00

This is the closest I can come in a single command - but I can't figure out how to sort by descending frequency.

table v, stat(freq) stat(perc)

------------------------------
| Frequency Percent
--------+---------------------
v |
a | 2 14.29
b | 3 21.43
c | 4 28.57
d | 4 28.57
e | 1 7.14
Total | 14 100.00
------------------------------

It seems no matter what I do, the [v] column is sorted by the value of [v] and not by the order of the calculated summary statistics.

Again, apologies for being inept. A search through StataList and through the help files and manuals has not delivered the syntax or an example I am seeking.
Tags: None
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2371
#2

29 Aug 2021, 19:26

The table you seek is not trivially produced by -table-, and seems impossible without considerable pre-programming. You may be interested in this FAQ by Nick Cox for a more direct approach.
1 like
Comment

Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2371

29 Aug 2021, 21:31

So after 45 minutes of tinkering, I got this to work. Right now it handles only non-missing values. Perhaps this can be a starting point if you want to use -tables-. I'm by no means an expert with tables, so there may be some inefficiencies here.

As a toy example, I'll use the auto dataset to try to replicate the following table. I use -fre- by Ben Jann and -elabel- by Daniel Klein, both from SSC.

Code:

. fre repair , desc nomiss

repair -- Repair record 1978
-----------------------------------------
       |      Freq.    Percent       Cum.
-------+---------------------------------
 3 c   |         30      43.48      43.48
 4 d   |         18      26.09      69.57
 5 e   |         11      15.94      85.51
 2 b   |          8      11.59      97.10
 1 a   |          2       2.90     100.00
 Total |         69     100.00           
-----------------------------------------

The example code:

Code:

clear *
cls

sysuse auto
keep rep78
rename rep78 repair

/* I'm adding a label to -rep78- to make the example more realistic */
lab def repair 1 "a", modify
lab def repair 2 "b", modify
lab def repair 3 "c", modify
lab def repair 4 "d", modify
lab def repair 5 "e", modify
label values repair repair

preserve
contract repair
drop if mi(repair)
gsort -_freq repair   // sort by descending frequency of repair

/*
I will need to compute cumulative percent to match the final layout in the table.
But, -table- will list factor variables in ascending sequence by numeric value of repair.
Thus, I clone -repair- to -xrepair- and relevel it, in order to match the numeric order with the
desired display order.
*/
qui levelsof repair, local(oldlvls)
local newlvls
forval i = 1/`=_N' {
  local newlvls `newlvls' `=repair[`i']'
}
local recodelvls
forval i = 1/`=_N' {
  local recodelvls `recodelvls' (`: word `i' of `newlvls'' = `: word `i' of `oldlvls'')
}

clonevar xrepair=repair
recode xrepair `recodelvls'

elabel copy repair xrepair
elabel recode xrepair `recodelvls'
label values xrepair xrepair

* generate cumulative percentage
gen _cumpct = sum(_freq)
summ _freq, meanonly
qui replace _cumpct = _cumpct / `r(sum)' * 100

* Now start making the -table-
collect create MyTable, replace
table (xrepair) () [fw=_freq] ,  ///
      stat(sumw) stat(percent)  ///
      name(MyTable) append
table (xrepair) () [fw=_freq], ///
      stat(firstnm _cumpct) nototals ///
      name(MyTable) append
      
collect layout (xrepair) (result[sumw percent firstnm])
collect recode result sumw     = freq ///
                      percent  = pct  ///
                      firstnm  = cumpct
collect label levels result freq   "Frequency" ///
                            pct    "Percent"   ///
                            cumpct "Cum. Percent" , modify
collect layout (xrepair) (result[freq pct cumpct])
collect style cell result[freq] , nformat(%9.0fc)
collect style cell result[pct cumpct] , nformat(%9.1f) sformat("%s%%")
collect style cell border_block, border(right, pattern(nil))
collect style cell xrepair[.m] , border(top, pattern(solid))
collect preview

restore

Final result is

Code:

. collect preview

------------------------------------------------------
                    Frequency   Percent   Cum. Percent
------------------------------------------------------
Repair record 1978                                    
  c                        30     43.5%          43.5%
  d                        18     26.1%          69.6%
  e                        11     15.9%          85.5%
  b                         8     11.6%          97.1%
  a                         2      2.9%         100.0%
------------------------------------------------------
  Total                    69    100.0%               
------------------------------------------------------

Last edited by Leonardo Guizzetti; 29 Aug 2021, 21:48.

Comment

Pamela Oliver

Join Date: Dec 2016

Posts: 21
#4

07 Feb 2022, 10:18

Looks like this is an old post, but I have figured out how to do this in my own work, if that is helpful. The trick involves using preserve/restore, within the preserve collapsing to get a variable that is the frequency of each category, having a text variable that has the value labels, using egen rank to create a variable that is the rank order of the frequencies, then using labmask to label the values of the rank variable. I use this often, so I created a file with all of the variations of the variables that I can merge into analysis files when I wish.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35206
#5

07 Feb 2022, 18:24

Consider also variations on

Code:

* Example generated by -dataex-. For more info, type help dataex clear input str1 v "a" "a" "b" "b" "b" "c" "c" "c" "c" "d" "d" "d" "d" "e" end myaxis V = v, sort(count) descending table V -------------------- | Frequency --------+----------- v | c | 4 d | 4 b | 3 a | 2 e | 1 Total | 14 --------------------

myaxis is from the Stata Journal and was written up in https://www.stata-journal.com/articl...article=st0654

There was a quicker summary at https://www.statalist.org/forums/for...e-or-graph-use

The usual 3 years' delay before the Stata Journal paper emerges from behind a paywall does not apply, given agreements between British universities and journal publishers.

What myaxis does in this case is, naturally, broadly equivalent to @Pamela Oliver's solution, but no preserve restore step is needed or used.

This solution fails @Ken Goldberg's criterion of using official Stata only.

Detail: labmask mentioned in #4 is also from the Stata Journal.

Last edited by Nick Cox; 07 Feb 2022, 18:35.
Comment

Announcement