Hi everyone,
I have a question about online review analysis. For each episode of a show, I want to count how many reviews mention each character. For instance, one show has 5 characters named Jack, Lisa, Kyle, Frank, and Mandy, and for each episode I want to count how many online reviews include the name "Jack", and likewise for the others. An example of the original data structure is below:
In the end, I want a data structure like the one below:
This is the code I'm currently using, but I don't think it's very efficient. Is there a way to improve it? I have many shows, each with a different set of characters (sometimes up to 20), so it's very hard to code them all by hand.
Please let me know if you have any thoughts. Thank you, and I look forward to your reply.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 show_name byte episode str27 reviews
"a" 1 "Jack did well!"
"a" 1 "I like this one"
"a" 1 "Good"
"a" 1 "What's this?"
"a" 1 "Lisa is angry loll"
"a" 1 "Tired of the show"
"a" 1 "Not as good as the last one"
"a" 2 "Lisa killed"
"a" 2 "Kyle is upset"
"a" 2 "Jannifer looks good"
"a" 2 "Lisa looks young"
"a" 2 "Starving"
"a" 2 "Kyle is back!"
"a" 2 "Like Jack "
"a" 2 "Lisa!!!!"
end
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 show_name byte episode str5 id byte comment
"a" 1 "Jack" 1
"a" 2 "Jack" 1
"a" 1 "Lisa" 1
"a" 2 "Lisa" 3
"a" 1 "Kyle" 0
"a" 2 "Kyle" 2
"a" 1 "Frank" 0
"a" 2 "Frank" 0
"a" 1 "Mandy" 0
"a" 2 "Mandy" 0
end
Code:
* Flag reviews that mention each character (1 = name appears in the text)
gen Count_Jack=0
gen Count_Lisa=0
gen Count_Kyle=0
gen Count_Frank=0
gen Count_Mandy=0
replace Count_Jack=1 if ustrpos(reviews, "Jack")>0
replace Count_Lisa=1 if ustrpos(reviews, "Lisa")>0
replace Count_Kyle=1 if ustrpos(reviews, "Kyle")>0
replace Count_Frank=1 if ustrpos(reviews, "Frank")>0
replace Count_Mandy=1 if ustrpos(reviews, "Mandy")>0
* Total the flags within each show and episode (not episode alone, since there are many shows)
bysort show_name episode: egen comment_Jack=total(Count_Jack)
bysort show_name episode: egen comment_Lisa=total(Count_Lisa)
bysort show_name episode: egen comment_Kyle=total(Count_Kyle)
bysort show_name episode: egen comment_Frank=total(Count_Frank)
bysort show_name episode: egen comment_Mandy=total(Count_Mandy)
* Keep one row per show and episode, then reshape to long
bysort show_name episode: gen nvals = _n == 1
keep if nvals==1
keep show_name episode comment*
reshape long comment_, i(show_name episode) j(ID) string
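One possible way to avoid writing a line per character is sketched below. It rests on a few assumptions: the character names are single words kept in a local macro (called `names` here purely for illustration), a plain substring match with ustrpos() is good enough, and the example data are reloaded so that the block runs by itself. It loops over the names with foreach, sums the indicators with collapse, and then reshapes to long.

```stata
* Reload the example data from above so this sketch runs on its own
clear
input str1 show_name byte episode str27 reviews
"a" 1 "Jack did well!"
"a" 1 "I like this one"
"a" 1 "Good"
"a" 1 "What's this?"
"a" 1 "Lisa is angry loll"
"a" 1 "Tired of the show"
"a" 1 "Not as good as the last one"
"a" 2 "Lisa killed"
"a" 2 "Kyle is upset"
"a" 2 "Jannifer looks good"
"a" 2 "Lisa looks young"
"a" 2 "Starving"
"a" 2 "Kyle is back!"
"a" 2 "Like Jack "
"a" 2 "Lisa!!!!"
end

* Assumed: the character list for this show, kept in a local macro
local names Jack Lisa Kyle Frank Mandy

* One 0/1 indicator per character: 1 if the review text contains the name
foreach n of local names {
    generate byte comment`n' = ustrpos(reviews, "`n'") > 0
}

* Sum the indicators within each show and episode
collapse (sum) comment*, by(show_name episode)

* Go long: one row per show, episode, and character
reshape long comment, i(show_name episode) j(id) string
list, sepby(episode)
```

For a different show, only the `names` line would need to change. Note that a substring match also counts a review containing, say, "Jackson" as a mention of Jack, so a stricter word-boundary match may be needed for some names.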