Summation of a word mentioned

albert master

Join Date: Mar 2022

Posts: 23
#1

25 Mar 2022, 08:21

Hello all,

I have a panel dataset and would like to generate a variable that counts how often a word (here "environmental") occurs in a row, i.e. how often a company uses it in a year.

egen sum_environmental_metric = anycount(cluster*), values(environmental)

That was my attempt, but Stata tells me that my variable list (cluster*) consists of string variables.
I would like to keep them as strings, since it is the word mentions that should be counted. Is there an alternative? I just want to sum up how often a predefined word is mentioned.

Thanks a lot
Chris
Andrew Musau

Join Date: Oct 2014

Posts: 9957
#2

25 Mar 2022, 08:51

Here is an example of a loop that does what you want. You will need to add some more elements if you wish to account for capitalization and for punctuation characters that may delimit words. Also note that -strpos()- matches substrings, so in this setup it will also count "environmentalist" and "environmentalism", since both contain "environmental". Here is an example of matching a specific word: https://www.statalist.org/forums/for...g-observations

Code:

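clear
input str10(color1 color2 color3 color4)
"orange" "blue" "yellow" "blue"
"blue" "green" "red" "white"
end

* collect all string variables in r(varlist)
ds, has(type string)
gen wanted = 0
foreach var in `r(varlist)' {
    * strpos() returns the position of the match, so test > 0 to add 1 per variable
    replace wanted = wanted + (strpos(`var', "blue") > 0)
}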

Res.:

Code:

. l

     +--------------------------------------------+
     | color1   color2   color3   color4   wanted |
     |--------------------------------------------|
  1. | orange     blue   yellow     blue        2 |
  2. |   blue    green      red    white        1 |
     +--------------------------------------------+
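
If the cells hold longer text rather than single words, the capitalization and word-delimiter caveats start to matter. Below is a rough sketch of one way to handle them, not a definitive solution. It assumes Stata 14 or later (for the Unicode regex functions -ustrregexm()- and -ustrregexra()-), and the toy data plus the names cluster1, cluster2, n_vars and n_hits are made up for illustration. n_vars counts how many variables in a row contain the whole word, and n_hits counts every occurrence, including repeats within one cell.

```stata
clear
* hypothetical toy data: two string variables per row
input str60(cluster1 cluster2)
"Environmental policy review" "the environmentalist spoke"
"no match here" "environmental, social and environmental goals"
end

ds, has(type string)
gen n_vars = 0
gen n_hits = 0
foreach var in `r(varlist)' {
    * lower() removes case differences; \b marks a word boundary,
    * so "environmentalist" is not treated as a match
    replace n_vars = n_vars + ustrregexm(lower(`var'), "\benvironmental\b")

    * delete every whole-word match and divide the lost length
    * by the length of the word to get the number of occurrences
    replace n_hits = n_hits + (strlen(lower(`var')) - ///
        strlen(ustrregexra(lower(`var'), "\benvironmental\b", ""))) / strlen("environmental")
}
list n_vars n_hits
```

Whether n_vars or n_hits is the right measure depends on whether a cell that repeats the word should count once or several times.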

Last edited by Andrew Musau; 25 Mar 2022, 08:55.