
  • Question: how to count occurrence with Regex grouped

    Dear all!

    I want to count the occurrences of a value in a variable, matched with a regular expression, within groups defined by other variables.

    E.g.:
    ID1 ID2 text result
    12 23 Hello 1
    12 23 Bye 1
    99 23 Hello 1
    The two IDs "ID1" and "ID2" together define a group. I want to compare the variable text with "Hello" and create a new variable (result) holding the number of "Hello"s in each group.

    My first idea was:
    egen result=count(regexm(text, "Hello")), by(ID1 ID2)

    or

    egen result=count(text == "*Hello*"), by(ID1 ID2)

    but neither of them works ...

    Can you please help me?

    Kind Regards
    Simon

  • #2
    Welcome to Statalist.

    The problem in your first example is that count() simply counts the number of non-missing values returned by regexm(), regardless of whether they are 1 (a match) or 0 (no match), so your result for ID1 12, ID2 23 was 2 rather than 1. What you want is
    Code:
    egen result=sum(regexm(text, "Hello")), by(ID1 ID2)
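    A small reproducible sketch of the difference between count() and a sum, using the three example rows from the question. It uses egen's total(), which current Stata documentation lists in place of the older sum() name; the variable n_nonmissing is a hypothetical name added only for the comparison.

    ```stata
    clear
    * the example data from the question
    input ID1 ID2 str5 text
    12 23 "Hello"
    12 23 "Bye"
    99 23 "Hello"
    end

    * count() counts non-missing values of regexm(), so matches (1) and non-matches (0) both count
    egen n_nonmissing = count(regexm(text, "Hello")), by(ID1 ID2)

    * total() adds up the 0/1 values, so only matches contribute
    egen result = total(regexm(text, "Hello")), by(ID1 ID2)

    list, sepby(ID1 ID2)
    ```

    For the group ID1 12, ID2 23, n_nonmissing is 2 while result is 1, which is the number of "Hello" rows the question asks for.
    
    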
    Your second example imagines a wild-card string matching that simply is not part of Stata syntax when Stata is comparing two strings with ==. An asterisk is no different from any other character in that context. But even if the comparison did what you hoped it would, you would still just be counting the number of times the expression is non-missing, not the number of times it is true (has value 1).
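    If wild-card matching is what you had in mind, Stata's strmatch() function does treat * and ? as wildcards, and strpos() gives a plain substring test. The sketch below contrasts these with the literal == comparison; the text values are invented for illustration, and the new variable names are hypothetical. Each approach counts observations containing "Hello", not repeated occurrences within one string.

    ```stata
    clear
    * illustrative data (made up), with "Hello" embedded in longer strings
    input ID1 ID2 str12 text
    12 23 "Hello there"
    12 23 "Bye"
    99 23 "Say Hello"
    end

    * == compares strings literally, so "*" is just another character: always 0 here
    generate literal = (text == "*Hello*")

    * strmatch() understands * and ? as wildcards and returns 1 or 0
    egen n_wild = total(strmatch(text, "*Hello*")), by(ID1 ID2)

    * strpos() > 0 is an equivalent "contains" test without wildcards
    egen n_pos = total(strpos(text, "Hello") > 0), by(ID1 ID2)

    list, sepby(ID1 ID2)
    ```

    Both n_wild and n_pos are 1 in each group, whereas literal is 0 for every row.
    
    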
