Question: how to count occurrence with Regex grouped

Simon Schmidt

Join Date: Mar 2020

Posts: 1
#1

07 Mar 2020, 08:13

Dear all!

I want to count how often the values of a variable match a regular expression, counted within groups defined by other variables.

E.g.:
ID1  ID2  text   result
12   23   Hello  1
12   23   Bye    1
99   23   Hello  1

The two ID variables "ID1" and "ID2" together define a group. I want to compare the variable text with "Hello" and create a new column (result) holding the number of "Hello"s in each group.

My first idea was:
egen result=count(regexm(text, "Hello")), by(ID1 ID2)

or

egen result=count(text == "*Hello*"), by(ID1 ID2)

but neither works ...

Can you please help me?

Kind Regards
Simon
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

07 Mar 2020, 11:30

Welcome to Statalist.

The problem in your first example is that count() simply counts the number of non-missing values returned by regexm(), regardless of whether they are 1 (a match) or 0 (no match), so your result for ID1 12 ID2 23 was 2 rather than 1. What you want is

Code:

egen result=sum(regexm(text, "Hello")), by(ID1 ID2)
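To see the difference on the example data, here is a minimal sketch that runs both functions side by side. The variable names n_nonmissing and n_hello are made up for illustration, and it uses total(), which is the documented name for the egen function written as sum() above.

```stata
* Toy data copied from the question
clear
input ID1 ID2 str10 text
12 23 "Hello"
12 23 "Bye"
99 23 "Hello"
end

* count() tallies non-missing values of the expression, so the 0s count too
egen n_nonmissing = count(regexm(text, "Hello")), by(ID1 ID2)

* total() adds up the 0/1 indicator, giving the number of matches per group
egen n_hello = total(regexm(text, "Hello")), by(ID1 ID2)

list, sepby(ID1 ID2)
```

For the group ID1 12, ID2 23 this should give 2 for count() but 1 for total().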

Your second example imagines a wild-card string matching that simply is not part of Stata syntax when Stata is comparing two strings. An asterisk is no different than any other character in that context. But even if the comparison did what you hoped it would, you would still just be counting the number of times the expression is non-missing, not the number of times it is true (has value 1).
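If wild-card matching is what you had in mind, strmatch() is one way to get it, and strpos() is a simple alternative for a plain substring search. The following is only a sketch on the example data; the variable names literal, wild and sub are made up for illustration.

```stata
* Toy data copied from the question
clear
input ID1 ID2 str10 text
12 23 "Hello"
12 23 "Bye"
99 23 "Hello"
end

* == compares literally: true only if text is exactly the 7 characters *Hello*
generate byte literal = (text == "*Hello*")

* strmatch() treats * and ? as wild cards
generate byte wild = strmatch(text, "*Hello*")

* strpos() returns the position of the substring, or 0 if it is absent
generate byte sub = strpos(text, "Hello") > 0

* Add up the 0/1 indicator within each group
egen result = total(strmatch(text, "*Hello*")), by(ID1 ID2)

list, sepby(ID1 ID2)
```

Whichever indicator you choose, it is the adding up of the 0/1 values, rather than counting non-missing values, that gives the number of matches.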