  • countvalues now available from SSC

    Thanks as always to Kit Baum, a new command countvalues is now available from SSC. Stata 12 is required. If interested, you may install it using

    Code:
    ssc inst countvalues
    The immediate stimulus to writing this was a detail within a recent thread

    https://www.statalist.org/forums/for...missing-values

    in which counts were needed of the number of zeros in several variables. There I suggested a loop across variables, but thinking about the problem underlined that it is easily generalised, in that you might be interested in counts of several integer values. (I stop short of supporting numbers with fractional parts or string values.)
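
    A loop of that kind can be sketched with official commands only. This is a minimal illustration, not the code from the linked thread; the auto data and the display layout are arbitrary choices.

    Code:
    sysuse auto, clear
    * collect the numeric variables (make is string, so it is skipped)
    ds, has(type numeric)
    * count the zeros in each variable, one variable at a time
    foreach v of varlist `r(varlist)' {
        quietly count if `v' == 0
        display "`v'" _col(16) r(N)
    }
    Counting some other integer is just a matter of changing the value in the if condition, which is where the generalisation to several values starts.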

    I know of various commands that address related problems, but there seemed to be a need for a different tool -- unless, naturally, someone can show that an existing tool is fine.

    My first line of attack was to populate a matrix with counts for several variables and one or more distinct values. But then various extra challenges arose, including (1) wanting to sort output in various ways; (2) wanting to suppress rows and/or columns with all zeros, as sparse matrices arise all too frequently in practice; and (3) the possibility that the output table is itself of interest as a further reduced dataset, for further calculations or graphics, say.
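
    For concreteness, the matrix approach can be sketched as follows. This is an illustration of the idea only and not the code inside countvalues; the variables, the values and the matrix name counts are arbitrary choices, and the column names n0, n1, n5 are invented here just to label the columns.

    Code:
    sysuse auto, clear
    local vars rep78 foreign
    local values 0 1 5
    local nr : word count `vars'
    local nc : word count `values'
    * one row per variable, one column per value
    matrix counts = J(`nr', `nc', 0)
    local i = 0
    foreach v of local vars {
        local ++i
        local j = 0
        foreach x of local values {
            local ++j
            quietly count if `v' == `x'
            matrix counts[`i', `j'] = r(N)
        }
    }
    * label rows by variable name and columns by value
    local cnames
    foreach x of local values {
        local cnames `cnames' n`x'
    }
    matrix rownames counts = `vars'
    matrix colnames counts = `cnames'
    matrix list counts
    A matrix like this is fine for display, but sorting it, dropping all-zero rows or columns, or passing it on as a dataset each need extra code, which is the point of the three challenges just listed.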


    As usual, some concrete examples are likely to appeal more than an abstract description. A first simple problem is just to count missing values (and, indeed, other commands exist to do this):

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . countvalues, values(.)
    
      +------------------+
      |         name   . |
      |------------------|
      |        price   0 |
      |          mpg   0 |
      |        rep78   5 |
      |     headroom   0 |
      |        trunk   0 |
      |       weight   0 |
      |       length   0 |
      |         turn   0 |
      | displacement   0 |
      |   gear_ratio   0 |
      |      foreign   0 |
      +------------------+
    Then the thought arises that I don't need or want to be told about zeros. In that case the rowspositive option (abbreviated to rowspos here) suppresses rows with row total zero:

    Code:
    . countvalues, values(.) rowspos
    
    
      +-----------+
      |  name   . |
      |-----------|
      | rep78   5 |
      +-----------+
    That first example underlines that system missing . and indeed other numeric missing values .a to .z all count as integers, the criterion being each value is equal to its own floor or ceiling.


    As a problem a smidgen more exciting, let's look systematically at indicator variables. As an auxiliary I use findname (Stata Journal) to find all variables such that all values are 0, 1 or system missing.
    Code:
    . webuse nlswork, clear
    (National Longitudinal Survey. Young Women 14-26 years of age in 1968)
    
    . findname, all(inlist(@, 0, 1, .)) local(myvars)
    msp nev_mar collgrad not_smsa c_city south union
    
    . countvalues `myvars', values(0 1 .)
    
      +---------------------------------+
      |     name       0       1      . |
      |---------------------------------|
      |      msp   11324   17194     16 |
      |  nev_mar   21968    6550     16 |
      | collgrad   23739    4795      0 |
      | not_smsa   20469    8057      8 |
      |   c_city   18336   10190      8 |
      |    south   16843   11683      8 |
      |    union   14728    4510   9296 |
      +---------------------------------+
    Then, just to show some of the handles available, you can choose to show variable labels and to sort output differently, here on the counts of 1s in descending order:

    Code:
    . countvalues `myvars', values(0 1 .) variablelabels sort(1 desc)
    
      +-----------------------------------------------------+
      |                        label       0       1      . |
      |-----------------------------------------------------|
      | 1 if married, spouse present   11324   17194     16 |
      |                   1 if south   16843   11683      8 |
      |            1 if central city   18336   10190      8 |
      |                1 if not SMSA   20469    8057      8 |
      |           1 if never married   21968    6550     16 |
      |        1 if college graduate   23739    4795      0 |
      |                   1 if union   14728    4510   9296 |
      +-----------------------------------------------------+
    Another application might be that you have a bundle of variables, all grades from 1 to 5, say strongly disapprove to strongly approve. Like it or not, many people refer to such variables as Likert variables.
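
    A sketch of that application follows, using invented data. The variables q1 to q3 are hypothetical and filled with random grades, and the call assumes that values() accepts the list 1 2 3 4 5 in the same way as it accepts 0 1 . in the examples above.

    Code:
    clear
    set obs 100
    set seed 2021
    * three made-up items, each graded 1 to 5
    forvalues j = 1/3 {
        generate q`j' = 1 + floor(5 * runiform())
    }
    countvalues q1-q3, values(1 2 3 4 5)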

    Want to take such output and do much more, draw a graph or even calculate percents from frequencies? That is where the saving() option of countvalues comes in. Any number of things are just one step away from the table of frequencies, and there is little or no point in trying to squeeze them all in as extra options.

    Last edited by Nick Cox; 07 Mar 2021, 10:53.