Stata: Count distinct values of a variable by another one?

Rui Duarte

Join Date: Jul 2014

Posts: 2
#1

01 Jul 2014, 08:51

My little Stata Problem:
I have a table like this:

I want to create a variable that counts the number of different cat values for each citing. That is, for citing A there are 2 distinct cat values (3 and 6), so I want another variable (dif_cat) that holds 2 on both A rows.
For this sample it would look something like this:

Can you help me?
PS: I know this has nothing to do with Stata (but it may inspire someone). With an actual programming language I would try something like this: loop over the citing column, checking whether each value equals the one before; keep an auxiliary, initially empty vector; and, inside the first loop, run a second loop that checks whether the current cat is already in the vector and, if not, puts it there. When citing changes, I would count the length of the auxiliary vector, reset it, and do it again. The problem is that I need this in Stata code :S
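For readers who think in terms of the loop described above, here is a rough Stata sketch of that idea. It is only an illustration: the sample values for B are assumed, and levelsof stands in for the "auxiliary vector" because it collects each non-missing cat value once. A loop like this is rarely the idiomatic way to do it in Stata.

```stata
clear
input str1 citing int cat
"A" 3
"A" 6
"B" 5
"B" 2
"B" 5
"B" 2
end

generate dif_cat = .

// outer cycle: one pass per distinct value of citing
quietly levelsof citing, local(groups)
foreach g of local groups {
    // "auxiliary vector": the distinct non-missing cat values for this citing
    quietly levelsof cat if citing == "`g'", local(seen)
    // its length is the number of distinct cat values
    local n : word count `seen'
    quietly replace dif_cat = `n' if citing == "`g'"
}

list, sepby(citing)
```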
Rich Goldstein

Join Date: Mar 2014

Posts: 4439
#2

01 Jul 2014, 08:54

there are several ways; but here is one: "egen newvar=count(cat), by(citing)"
Nick Cox

Join Date: Mar 2014

Posts: 35433
#3

01 Jul 2014, 09:06

Rich's code counts non-missing values, regardless of whether they are distinct. A solution with similar flavour is

Code:

egen tag = tag(cat citing)
egen distinct = total(tag), by(citing)

For a review of this territory, see http://www.stata-journal.com/sjpdf.h...iclenum=dm0042
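To make the difference between the two suggestions concrete, here is a small self-contained sketch. It uses the sample values posted in #6 of this thread; the variable name n_nonmissing is made up for the illustration.

```stata
clear
input str1 citing int cat
"A" 3
"A" 6
"B" 5
"B" 2
"B" 5
"B" 2
"C" 2
"C" 4
"C" 3
"D" 5
"E" 1
"E" 1
end

// count() counts non-missing values of cat, repeats included
egen n_nonmissing = count(cat), by(citing)

// tag() marks exactly one observation in each distinct (cat, citing) pair
egen tag = tag(cat citing)

// adding up the tags within citing gives the number of distinct cat values
egen distinct = total(tag), by(citing)

list, sepby(citing)
```

For citing B, n_nonmissing is 4 while distinct is 2, because the values 5 and 2 each occur twice.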
Rui Duarte

Join Date: Jul 2014

Posts: 2
#4

01 Jul 2014, 09:10

Thanks!... but both codes are generating just a simple count column of the number of different citings....

Last edited by Rui Duarte; 01 Jul 2014, 09:39. Reason: unsolved yet.
Nick Cox

Join Date: Mar 2014

Posts: 35433
#5

01 Jul 2014, 09:53

You've cross-posted this at http://stackoverflow.com/questions/2...by-another-one. Please do read the FAQ Advice, as was requested of you before posting, to see our policy on that and other points.

But on your question: isn't that exactly what you asked for? If not, you need to explain the difference.

Last edited by Nick Cox; 01 Jul 2014, 09:55.

Sergiy Radyakin

Join Date: Apr 2014
Posts: 1867
#6

01 Jul 2014, 12:19

Written without much thinking, just to reproduce the results in the initial "want" table:

Code:

clear all
// it is the job of topic starter to write the data generation part, is it so difficult??
input str1 citing int cat
"A" 3
"A" 6
"B" 5
"B" 2
"B" 5
"B" 2
"C" 2
"C" 4
"C" 3
"D" 5
"E" 1
"E" 1
end

// start working here

preserve

generate total=1
collapse (sum) total, by(citing cat)
drop total
gen total=1

collapse (sum)total, by(citing)
list
tempfile tmp

sort citing
save `"`tmp'"'
restore

sort citing
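// merge m:1 is the current syntax (Stata 11 and later); the bare "merge varlist using" form is the old syntax
merge m:1 citing using `"`tmp'"'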
drop _merge
list
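The same reduce-then-merge idea can probably be written more compactly. The following is a sketch under assumptions, not a replacement for the code above: it needs Stata 11 or later for merge m:1, and the names dif_cat and counts are chosen just for this example.

```stata
clear
input str1 citing int cat
"A" 3
"A" 6
"B" 5
"B" 2
"B" 5
"B" 2
"C" 2
"C" 4
"C" 3
"D" 5
"E" 1
"E" 1
end

preserve

// keep one row per distinct (citing, cat) pair
bysort citing cat: keep if _n == 1

// count the remaining non-missing cat values within each citing
collapse (count) dif_cat = cat, by(citing)

tempfile counts
save `counts'
restore

// attach the per-citing count to every original row
merge m:1 citing using `counts', nogenerate

sort citing cat
list, sepby(citing)
```

Because (count) ignores missing values, a citing whose cat values are all missing would get dif_cat equal to 0.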

Best, Sergiy Radyakin


Adam Reiremo

Join Date: Jun 2020

Posts: 1
#7

12 Jun 2020, 06:24

If someone drops by this old question:

bys citing (cat): egen distinct_cat=total(cat!=cat[_n-1])
Nick Cox

Join Date: Mar 2014

Posts: 35433
#8

12 Jun 2020, 07:35

#7 I flag the advice in the help for egen:

Explicit subscripting (using _N and _n), which is commonly used with generate, should not be used with egen; see subscripting.

In practice that will work, so long as you don't mind counting distinct kinds of missing value when they occur, but that is fortuitous as well as fortunate.
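One way to stay within documented behaviour is to do the subscripting with generate rather than egen. This is a sketch, again using the sample values from #6; the !missing(cat) condition is an added assumption that missing values should not be counted as distinct values at all.

```stata
clear
input str1 citing int cat
"A" 3
"A" 6
"B" 5
"B" 2
"B" 5
"B" 2
"C" 2
"C" 4
"C" 3
"D" 5
"E" 1
"E" 1
end

// within citing, sorted by cat: running count of non-missing values
// that differ from the previous value (cat[_n-1] is missing when _n == 1)
bysort citing (cat): generate distinct_cat = sum(cat != cat[_n-1] & !missing(cat))

// the last running value in each group is the group's total; copy it to every row
by citing: replace distinct_cat = distinct_cat[_N]

list, sepby(citing)
```

Dropping the !missing(cat) condition gives the behaviour described above, in which each distinct kind of missing value is counted too.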

Last edited by Nick Cox; 12 Jun 2020, 07:48.
