  • Count frequency of names

    Dear All, suppose that I have this dataset:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str22 names
    "Albert-Bob-Charles"
    "Mary-John"        
    "Max"              
    end
    I'd like to have a variable (say, n) that denotes the number of names, so that the value of `n' would be 3, 2, 1 in the above case. Thanks for your suggestions.
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str22 names
    "Albert-Bob-Charles"
    "Mary-John"        
    "Max"              
    end
    
    gen names2= subinstr(names,"-"," ",.)
    gen count= wordcount(names2)
    l
    Result:

    Code:
    . l
    
         +-------------------------------------------------+
         |              names               names2   count |
         |-------------------------------------------------|
      1. | Albert-Bob-Charles   Albert Bob Charles       3 |
      2. |          Mary-John            Mary John       2 |
      3. |                Max                  Max       1 |
         +-------------------------------------------------+



    • #3
      The number of names is the number of hyphens plus one.

      Code:
      gen n = length(names) - length(subinstr(names, "-", "", .)) + 1
      To count the number of hyphens, we see how much the string length would be reduced if we remove them.
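
      A minimal sketch of that reasoning, spelled out one step at a time on the example data from #1. The variable names len_all and len_nohyphen are illustrative only and do not appear in the original one-liner. Because the hyphen is a single byte, the difference in length() should equal the number of hyphens even when names contain non-ASCII characters.

      ```stata
      clear
      input str22 names
      "Albert-Bob-Charles"
      "Mary-John"
      "Max"
      end

      * Step 1: length of the original string
      gen len_all = length(names)

      * Step 2: length after deleting every hyphen
      gen len_nohyphen = length(subinstr(names, "-", "", .))

      * Step 3: the difference is the number of hyphens; names = hyphens + 1
      gen n = len_all - len_nohyphen + 1

      * Check against the values requested in #1
      assert n == 3 in 1
      assert n == 2 in 2
      assert n == 1 in 3

      list names len_all len_nohyphen n
      ```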

      Documented at https://www.stata-journal.com/sjpdf....iclenum=dm0056

      https://www.statalist.org/forums/for...tring-variable

      etc.

      EDIT: Andrew Musau's solution in #2 is fine so long as the names are not e.g. "Billy Bob" or "LL Cool J".
      Last edited by Nick Cox; 09 Apr 2019, 02:48.



      • #4
        Many thanks, Andrew.
        Ho-Chuan (River) Huang
        Stata 19.0, MP(4)



        • #5
          Dear Nick, I see your point, and thanks.
          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)



          • #6
            Nick, as always, is correct. The expression becomes messier once it accounts for his concern in #3.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str22 names
            "Albert-Bob-Charles"
            "Mary-John"        
            "Max"              
            "LL Cool J-Bill"    
            end
            
            * Remove spaces inside names first, then turn hyphens into spaces and count words
            gen count= wordcount(subinstr(subinstr(names," ","",.), "-", " ", .))
            l
            Result:

            Code:
            . l
            
                 +----------------------------+
                 |              names   count |
                 |----------------------------|
              1. | Albert-Bob-Charles       3 |
              2. |          Mary-John       2 |
              3. |                Max       1 |
              4. |     LL Cool J-Bill       2 |
                 +----------------------------+
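
            As a cross-check, here is a sketch that puts the three expressions from this thread side by side on the same data. The variable names n_hyphen, n_words and n_naive are made up for illustration, and the sketch assumes that hyphens appear only as separators between names.

            ```stata
            clear
            input str22 names
            "Albert-Bob-Charles"
            "Mary-John"
            "Max"
            "LL Cool J-Bill"
            end

            * #3: number of hyphens plus one
            gen n_hyphen = length(names) - length(subinstr(names, "-", "", .)) + 1

            * #6: delete embedded spaces, turn hyphens into spaces, count words
            gen n_words = wordcount(subinstr(subinstr(names, " ", "", .), "-", " ", .))

            * #2, for contrast: embedded spaces are counted as extra names
            gen n_naive = wordcount(subinstr(names, "-", " ", .))

            * #3 and #6 should agree on every observation here
            assert n_hyphen == n_words

            list names n_hyphen n_words n_naive
            ```

            On the fourth observation the expression from #2 returns 4 rather than 2, which is the problem raised in #3. One difference worth knowing: for an empty string the expression from #3 returns 1, whereas wordcount() returns 0, so the two approaches would disagree if names can be missing.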



            • #7
              Andrew Musau Thanks for #6. I suggest just "often" as "always" is far too much to claim!
