Thanks to Kit Baum as always, a new command called myaxis is now downloadable from SSC. Stata 8.2 is required.
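For anyone new to SSC, installing and then reading the help file are the usual two steps. This is the standard SSC route, nothing specific to this package:

```stata
* install myaxis from SSC (the Statistical Software Components archive)
ssc install myaxis
* read the help file for the full syntax
help myaxis
```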
I will subvert the usual order and give examples first and then the overall story. Naturally you can bail out whenever you wish, and I have probably lost some people at the title already.
rep78 in the auto data is an ordered (ordinal, grade) variable, but for my purposes I will pretend that it isn't. myaxis maps such a variable to a new variable according to some sort criterion. Here we will just sort on counts (frequencies), largest first. tabulate already has a handle for that (its sort option), but this is just to get us started.
Code:
```
. sysuse auto, clear
(1978 Automobile Data)

. myaxis wanted=rep78, sort(count) descending

. tab wanted

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |         30       43.48       43.48
          4 |         18       26.09       69.57
          5 |         11       15.94       85.51
          2 |          8       11.59       97.10
          1 |          2        2.90      100.00
------------+-----------------------------------
      Total |         69      100.00

. tab wanted, nola

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         30       43.48       43.48
          2 |         18       26.09       69.57
          3 |         11       15.94       85.51
          4 |          8       11.59       97.10
          5 |          2        2.90      100.00
------------+-----------------------------------
      Total |         69      100.00
```
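For comparison, here is the handle that tabulate already offers. Its sort option lists a one-way table in descending order of frequency. However, no new variable is created, so that order lives only in this one table:

```stata
sysuse auto, clear
* one-way table, categories listed by descending frequency (3, 4, 5, 2, 1)
tabulate rep78, sort
```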
Let's suppose instead that we wanted to sort on mean mpg for each category.
Code:
```
. myaxis wanted2=rep78, sort(mean mpg) descending

. tab wanted2, su(mpg)

     Repair |  Summary of Mileage (mpg)
Record 1978 |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          5 |   27.363636   8.7323849          11
          4 |   21.666667   4.9348699          18
          1 |          21   4.2426407           2
          3 |   19.433333   4.1413252          30
          2 |      19.125   3.7583241           8
------------+------------------------------------
      Total |   21.289855   5.8664085          69
```
As a further twist, we might want to sort on values for a subset only (here, foreign cars), because we are looking ahead to a two-way table or graph. And yes, we should have insisted on a less barbarous display format:
Code:
```
. myaxis wanted3=rep78, sort(mean mpg) subset(foreign==1) descending

. format mpg %2.1f

. tab wanted3 foreign, su(mpg) nost nofreq

              Means of Mileage (mpg)

    Repair |
    Record |       Car type
      1978 |  Domestic    Foreign |     Total
-----------+----------------------+----------
         5 |      32.0       26.3 |      27.4
         4 |      18.4       24.9 |      21.7
         3 |      19.0       23.3 |      19.4
         1 |      21.0          . |      21.0
         2 |      19.1          . |      19.1
-----------+----------------------+----------
     Total |      19.5       25.3 |      21.3
```
Note that myaxis does not fall over when there is nothing to summarize, as with foreign cars for repair records 1 and 2. Those categories simply go last, in their original order.
One more. Here is another categorical variable:
Code:
```
. webuse nlsw88, clear
(NLSW, 1988 extract)

. myaxis wanted=industry, sort(median wage) descending

. tabstat wage, s(median mean) by(wanted) format(%3.2f)

Summary for variables: wage
     by categories of: wanted (industry)

          wanted |       p50      mean
-----------------+--------------------
Transport/Comm/U |     10.12     11.44
Public Administr |      8.40      9.15
          Mining |      8.09     15.35
Finance/Ins/Real |      7.05      9.84
Professional Ser |      6.70      7.87
    Construction |      6.69      7.56
   Manufacturing |      6.19      7.50
 Business/Repair |      5.33      7.52
Ag/Forestry/Fish |      4.53      5.62
Wholesale/Retail |      4.53      6.13
Entertainment/Re |      4.23      6.72
Personal Service |      3.89      4.40
-----------------+--------------------
           Total |      6.28      7.78
--------------------------------------
```
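All the examples are table output, but FWIW my personal motivation was mostly graphical, hence the name of the command.

There is already a graphical example at https://www.statalist.org/forums/for...using-by/page2, where a myaxis call replaces three lines of code with one. See posts #16 and #17.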
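Here is a minimal graphical counterpart to the nlsw88 table, assuming myaxis is installed. graph dot plots the categories of the over() variable in the numeric order of its values and labels them with its value labels. The myaxis variable therefore gives the sorted axis directly:

```stata
webuse nlsw88, clear
myaxis wanted=industry, sort(median wage) descending
* categories follow the order of wanted: highest median wage first, at the top
graph dot (median) wage, over(wanted) ytitle("Median hourly wage")
```

graph dot can also sort on the fly, with over(industry, sort(1) descending). A myaxis variable earns its keep when the same order is wanted across several graphs and tables.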
(I did toy with the idea of calling the command just axis, but a predictable disadvantage is that (say) search axis would reasonably turn up much other stuff. A few people may recall that an egen function axis() has been in egenmore since 2004, but I didn't feel obliged to pick up all its functionality, and I did feel obliged to extend support in other directions.)
So, the deal here is this:
myaxis maps an existing "categorical" variable, meaning usually a numeric variable with integer codes and value labels, or equivalently a string variable, to a new variable with integer values 1 up and with value labels, sorted according to a specified criterion.
The command name myaxis is to be parsed "my axis". The second element "axis" arises from a leading application of the command. You have a categorical variable that would define an axis of a graph, or one dimension of a table (the rows, or the columns, say), but the existing order of categories is not ideal. Some graph and table commands offer sorting on the fly, but this command may help wherever other commands do not offer that.
myaxis splits the problem into these parts:
1. Calculation of a numeric variable on which to sort categories. myaxis treats this as an application of egen. Note: If a variable already exists that defines the sort order and is constant within categories, then asking for (say) its minimum, mean, or maximum within each category will suffice.
2. Deciding whether you want ascending order (the default) or descending order (highest value goes first). Descending order requires negation of the variable from #1.
3. Mapping your categorical variable to integers 1 up. The group() function of egen does the work here, but myaxis is careful to split ties according to the original variable. (For example: suppose nominal categories A, B, C, D, E have frequencies 7, 7, 42, 3, 1 and you want them sorted by frequency. You don't want A and B lumped together because they have the same frequency.)
4. Fixing a variable label. myaxis uses a new variable label if supplied; otherwise, the original variable label; and, if that does not exist, the original variable name.
5. Fixing value labels. This is even more important than #4 for helpful display in a graph or table. myaxis uses the original value labels if defined and otherwise the original string or numeric values.
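The tie problem in step 3 is easy to see with made-up data matching the example there: hypothetical categories A to E with frequencies 7, 7, 42, 3, 1. This sketch calls egen's group() directly, not myaxis, to show what goes wrong without a tie-breaker:

```stata
* hypothetical data: categories A-E with frequencies 7, 7, 42, 3, 1
clear
input str1 cat freq
A  7
B  7
C 42
D  3
E  1
end
expand freq
drop freq

* steps 1 and 2: frequency of each category, negated for descending order
bysort cat : generate negfreq = -_N

* step 3 without a tie-breaker: A and B are lumped into one group
egen bad = group(negfreq)

* step 3 with the original variable as second key: A and B stay separate
egen good = group(negfreq cat)

tabulate cat bad
tabulate cat good
```

With the tie-breaker the order is C, A, B, D, E, coded 1 to 5. Without it, A and B share code 2.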
All of those steps are easy in principle, but some are fiddly in practice, so myaxis bundles them together on your behalf.
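To show where the fiddliness lies, here is a hand-rolled sketch of the five steps that tries to reproduce the subset example on the auto data. It is an illustration only, not myaxis's own code. It assumes a numeric categorical variable (a string variable would need a different step 5) and uses only official Stata commands:

```stata
* hand-rolled sketch of the five steps, NOT the code inside myaxis; it mimics
*   myaxis wanted3=rep78, sort(mean mpg) subset(foreign==1) descending
sysuse auto, clear

* step 1: mean mpg over foreign cars only, spread to every observation in the
* same rep78 category (missing where a category has no foreign cars)
egen work = mean(cond(foreign == 1, mpg, .)), by(rep78)

* step 2: negate for descending order (missing stays missing)
replace work = -work

* step 3: integers 1 up; rep78 as second key splits ties, including the two
* missing means; -missing- keeps those groups, -if- drops missing rep78
egen wanted3 = group(work rep78) if !missing(rep78), missing

* step 4: variable label, falling back on the variable name
local vlabel : variable label rep78
if `"`vlabel'"' == "" local vlabel rep78
label variable wanted3 `"`vlabel'"'

* step 5: value labels from the original value labels, else the values
summarize wanted3, meanonly
forvalues g = 1/`r(max)' {
    summarize rep78 if wanted3 == `g', meanonly
    local text : label (rep78) `r(min)'
    label define wanted3 `g' `"`text'"', modify
}
label values wanted3 wanted3
drop work

tabulate wanted3 foreign, summarize(mpg) nostandard nofreq
```

The categories come out in the order 5, 4, 3, 1, 2. That is the same order as the myaxis call with subset(foreign==1), including the placement of the two categories with no foreign cars.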