  • Summation of a word mentioned

    Hello all,

    I have a panel dataset and would like to generate a new variable that sums up how often a word (here "environmental") occurs in a row, i.e. how often a company uses it in a year.

    egen sum_environmental_metric = anycount(cluster*), values(environmental)

    That was my attempt, but Stata reports that my variable list (cluster*) consists of string variables.
    I would like to keep them as strings, since it is the word mentions that should be counted. Is there an alternative that simply sums up how often a predefined word is mentioned?

    Thanks a lot
    Chris

  • #2
    Here is an example of a loop that does what you want. You will need to add some more elements if you wish to account for capitalization and for punctuation characters that may delimit words. Also note that -strpos()- matches substrings, so in your case it would also count "environmentalist" and "environmentalism", as they contain "environmental". For an example of matching a specific word, see: https://www.statalist.org/forums/for...g-observations

    Code:
    clear
    input str10(color1 color2 color3 color4)
    "orange" "blue" "yellow" "blue"
    "blue" "green" "red" "white"
    end
    
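    * ds leaves the names of all string variables in r(varlist)
    ds, has(type string)
    gen wanted = 0
    foreach var in `r(varlist)' {
        * strpos() returns the position of the first match (0 if none),
        * so test > 0 to add exactly 1 per variable that contains "blue"
        replace wanted = wanted + (strpos(`var', "blue") > 0)
    }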
    Result:

    Code:
    . l
    
         +--------------------------------------------+
         | color1   color2   color3   color4   wanted |
         |--------------------------------------------|
      1. | orange     blue   yellow     blue        2 |
      2. |   blue    green      red    white        1 |
         +--------------------------------------------+
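
The caveats about capitalization and partial matches could be handled along the following lines. This is only a sketch under stated assumptions: the toy data, the 60-character string width, and the variable names n_vars and n_hits are made up for illustration, and -ustrregexm()- requires Stata 14 or later. n_vars counts the variables in a row that contain "environmental" as a whole word, ignoring case; n_hits counts every occurrence of the substring, so it still includes words such as "environmentalist".

```stata
clear
* hypothetical toy data standing in for the cluster* variables
input str60(cluster1 cluster2 cluster3)
"Environmental policy" "environmentalist groups" "the environmental, social report"
"governance" "environmental risk, environmental cost" "none"
end

local word "environmental"
gen n_vars = 0   // variables per row containing the whole word
gen n_hits = 0   // substring occurrences per row (partial matches included)

foreach var of varlist cluster* {
    * \b marks a word boundary; lower() makes the match case-insensitive
    replace n_vars = n_vars + ustrregexm(lower(`var'), "\b`word'\b")
    * occurrences = characters removed by deleting the word / word length
    replace n_hits = n_hits + ///
        (strlen(`var') - strlen(subinstr(lower(`var'), "`word'", "", .))) / strlen("`word'")
}

list n_vars n_hits
```

On this toy data, n_vars should be 2 and 1, while n_hits should be 3 and 2. Which of the two is appropriate depends on whether a cell that repeats the word should count once or several times.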
    Last edited by Andrew Musau; 25 Mar 2022, 08:55.
