Count frequency of names

River Huang

Join Date: Mar 2016

Posts: 1908
#1


09 Apr 2019, 02:31

Dear All, Suppose that I have this dataset,

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str22 names
"Albert-Bob-Charles"
"Mary-John"
"Max"
end

I'd like to create a variable (say, n) that holds the number of names in each observation, so that the value of `n' would be 3, 2, 1 in the above case. Thanks for your suggestions.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Andrew Musau

Join Date: Oct 2014
Posts: 10296

09 Apr 2019, 02:41

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str22 names
"Albert-Bob-Charles"
"Mary-John"        
"Max"              
end

gen names2= subinstr(names,"-"," ",.)
gen count= wordcount(names2)
l

Result:

Code:

. l

     +-------------------------------------------------+
     |              names               names2   count |
     |-------------------------------------------------|
  1. | Albert-Bob-Charles   Albert Bob Charles       3 |
  2. |          Mary-John            Mary John       2 |
  3. |                Max                  Max       1 |
     +-------------------------------------------------+


Nick Cox

Join Date: Mar 2014

Posts: 35807
#3

09 Apr 2019, 02:45

The number of names is the number of hyphens plus one.

Code:

gen n = length(names) - length(subinstr(names, "-", "", .)) + 1

To count the number of hyphens, we see how much the string length would be reduced if we remove them.
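To make the arithmetic concrete, here is a minimal sketch that splits the one-line expression into its parts on the example data from #1. The helper variable names `len_all` and `len_nohyphen` are hypothetical and exist only for illustration.

```stata
clear
input str22 names
"Albert-Bob-Charles"
"Mary-John"
"Max"
end

* length of the string as given
gen len_all = length(names)
* length after every hyphen has been removed
gen len_nohyphen = length(subinstr(names, "-", "", .))
* the difference is the number of hyphens; names = hyphens + 1
gen n = len_all - len_nohyphen + 1
list
```

For "Albert-Bob-Charles" the lengths are 18 and 16, so there are 2 hyphens and n is 3.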

Documented at https://www.stata-journal.com/sjpdf....iclenum=dm0056

https://www.statalist.org/forums/for...tring-variable

etc.

EDIT: Andrew Musau's solution in #2 is fine so long as the names do not themselves contain spaces, e.g. "Billy Bob" or "LL Cool J".
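A small sketch of that caveat, using made-up rows that are not part of the original data: when a name contains a space, replacing hyphens with spaces and counting words overcounts, while the hyphen count does not. The variable names `n_words` and `n_hyphen` are hypothetical.

```stata
clear
input str22 names
"Billy Bob-Mary"
"Mary-John"
end

* words after turning hyphens into spaces: "Billy Bob Mary" has 3 words
gen n_words = wordcount(subinstr(names, "-", " ", .))
* hyphens plus one: "Billy Bob-Mary" has 1 hyphen, hence 2 names
gen n_hyphen = length(names) - length(subinstr(names, "-", "", .)) + 1
list
```

The two counts agree for "Mary-John" (both 2) but differ for "Billy Bob-Mary" (3 versus 2).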

Last edited by Nick Cox; 09 Apr 2019, 02:48.
River Huang

Join Date: Mar 2016

Posts: 1908
#4

09 Apr 2019, 02:49

Many thanks, Andrew.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
River Huang

Join Date: Mar 2016

Posts: 1908
#5

09 Apr 2019, 02:51

Dear Nick, I see your point, and thanks.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Andrew Musau

Join Date: Oct 2014
Posts: 10296

09 Apr 2019, 02:58

Nick, as always, is correct. The expression becomes messy to account for his concern in #3.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str22 names
"Albert-Bob-Charles"
"Mary-John"        
"Max"              
"LL Cool J-Bill"    
end

gen count= wordcount(subinstr(subinstr(names," ","",.), "-", " ", .))

Result:

Code:

. l

     +----------------------------+
     |              names   count |
     |----------------------------|
  1. | Albert-Bob-Charles       3 |
  2. |          Mary-John       2 |
  3. |                Max       1 |
  4. |     LL Cool J-Bill       2 |
     +----------------------------+
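
As a cross-check, the expression above and the hyphen-counting expression from #3 can be compared on the same data. This is only a sketch: the empty-string row and the variable names `n_hyphen` and `n_words` are additions for illustration. One difference worth knowing is that an empty string gives 0 with wordcount() but 1 with the hyphen count, so the sketch sets that case to 0 explicitly.

```stata
clear
input str22 names
"Albert-Bob-Charles"
"Mary-John"
"Max"
"LL Cool J-Bill"
""
end

* hyphens plus one (from #3)
gen n_hyphen = length(names) - length(subinstr(names, "-", "", .)) + 1
* strip spaces, turn hyphens into spaces, count words (from #6)
gen n_words = wordcount(subinstr(subinstr(names, " ", "", .), "-", " ", .))
* assumption: an empty string should count as zero names
replace n_hyphen = 0 if missing(names)
* stops with an error if the two counts ever disagree
assert n_hyphen == n_words
```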


Nick Cox

Join Date: Mar 2014

Posts: 35807
#7

09 Apr 2019, 03:28

Andrew Musau Thanks for #6. I suggest just "often" as "always" is far too much to claim!