Count of distinct units in a panel with "by" option

Neelakanda Krishna

Join Date: May 2021
Posts: 107

Count of distinct units in a panel with "by" option

05 Dec 2021, 04:21

Hi Beloved Stata Members
I would like to get the count of distinct units (co_code in my case) based on say each year (how many distinct units are there in each year). I tried user written command distinct, but it doesn't work with option-by-

Here is my sample data

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long co_code int year float(ncfo eqissue)
    11 2000       .      .
    11 2001       .      .
    11 2002       .      .
    11 2003       .      .
    11 2004       .      .
    11 2005       .      .
    11 2006       .      .
    11 2007       .      .
    11 2008       .      .
    11 2009      .6   60.3
    11 2010    21.1   -6.9
    11 2011     2.3     .3
    11 2012   127.4   51.5
    11 2013   445.8   83.5
    11 2014   367.5     70
    11 2015     303      .
    11 2016   254.2     49
    11 2017   444.7      .
    11 2018   605.7   -2.1
    11 2019   533.6   -1.1
    11 2020   447.4      .
    11 2021   520.4    -.2
 96387 2000       .      .
 96387 2001       .      .
 96387 2002       .      .
 96387 2003       .      .
 96387 2004    39.7     .1
 96387 2005   285.4    1.8
 96387 2006   263.8 1554.7
 96387 2007  1167.6   10.3
 96387 2008  2910.6 -123.1
 96387 2009  5906.6     12
 96387 2010  3422.3 3097.8
 96387 2011  2991.6 1664.9
 96387 2012 -1544.6      .
 96387 2013   494.8 -164.2
 96387 2014   497.6      .
 96387 2015  -513.2      .
 96387 2016  1990.6      .
 96387 2017    1294      .
 96387 2018       .      .
 96387 2019  1370.3      .
 96387 2020  1685.8      .
 96387 2021       .      .
 36277 2000       .      .
 36277 2001       .      .
 36277 2002       .      .
 36277 2003       .      .
 36277 2004       .      .
 36277 2005       .      .
 36277 2006       .      .
 36277 2007       .      .
 36277 2008       .      .
 36277 2009       .      .
 36277 2010       .      .
 36277 2011       .      .
 36277 2012       .      .
 36277 2013       .      .
 36277 2014       .      .
 36277 2015       .      .
 36277 2016       .      .
 36277 2017       .      .
 36277 2018   361.9      .
 36277 2019  2529.9      .
 36277 2020  2450.5      .
 36277 2021  3210.7      .
491859 2000       .      .
491859 2001       .      .
491859 2002       .      .
491859 2003       .      .
491859 2004       .      .
491859 2005       .      .
491859 2006       .      .
491859 2007       .      .
491859 2008       .      .
491859 2009       .      .
491859 2010       .      .
491859 2011       .      .
491859 2012       .      .
491859 2013       .      .
491859 2014       .      .
491859 2015       .      .
491859 2016       .      .
491859 2017       .      .
491859 2018       .      .
491859 2019       .      .
491859 2020       .      .
491859 2021       .      .
 73119 2000       .      .
 73119 2001       .      .
 73119 2002       .      .
 73119 2003       .      .
 73119 2004   158.7  231.1
 73119 2005  1098.4  763.1
 73119 2006  3122.2 2221.2
 73119 2007  1258.8  -34.8
 73119 2008 -2962.1 4558.2
 73119 2009 -1883.8     50
 73119 2010  -413.4  191.3
 73119 2011 -4878.9      .
end

[/CODE]

. distinct co_code, by (year)
option by() not allowed

How to get distinct count with one or more criteria?

Tags: None

Fei Wang

Join Date: Oct 2021

Posts: 726
#2

05 Dec 2021, 04:27

If each line represents a unique pair of id and year, then the code would be simple as below.

Code:

bys year: gen dist_id = _N
1 like
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35432
#3

05 Dec 2021, 05:24

distinct is from the Stata Journal, as you are asked to explain (FAQ Advice #12).

The 2008 paper remains pertinent, but if you want to use the command, go for the latest update:

SJ-20-4 dm0042_3 . . . . . . . . . . . . . . . . Software update for distinct
(help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
Q4/20 SJ 20(4):1028--1030
sort() option has been added

SJ-15-3 dm0042_2 . . . . . . . . . . . . . . . . Software update for distinct
(help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
Q3/15 SJ 15(3):899
improved table format and display of large numbers of
observations

SJ-12-2 dm0042_1 . . . . . . . . . . . . . . . . Software update for distinct
(help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
Q2/12 SJ 12(2):352
options added to restrict output to variables with a minimum
or maximum of distinct values

SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
(help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
Q4/08 SJ 8(4):557--568
shows how to answer questions about distinct observations
from first principles; provides a convenience command

As documented in its help, the by prefix is allowed, as here

Code:

. sysuse auto, clear (1978 automobile data) . by foreign : distinct rep78 ------------------------------------------------------------------------------------------------------------------ -> foreign = Domestic ------------------------------ | total distinct -------+---------------------- rep78 | 48 5 ------------------------------ ------------------------------------------------------------------------------------------------------------------ -> foreign = Foreign ------------------------------ | total distinct -------+---------------------- rep78 | 21 3 ------------------------------

For a large number of panels, this output may not be most convenient. I tend to use a device explained in the 2008 paper (and which hinges on official commands only):

Code:

. egen tag = tag(foreign rep78) . egen ndistinct = total(tag), by(foreign) . tabdisp foreign, c(ndistinct) ---------------------- Car | origin | ndistinct ----------+----------- Domestic | 5 Foreign | 3 ----------------------

Here's a link to the paper. https://www.stata-journal.com/articl...article=dm0042 See especially p.563.
1 like
Comment
Neelakanda Krishna

Join Date: May 2021

Posts: 107
#4

05 Dec 2021, 10:01

Thanks a lot Nick Cox and Fei Wang from saving me from a huge manual work using Excel.
Comment

Announcement

Count of distinct units in a panel with "by" option

Comment

Comment

Comment