I have a string variable that, for each person, contains a short phrase, for example: "new video post". For each word that appears in any phrase, I need to create a binary variable indicating whether the person's phrase contains (1) or does not contain (0) that word. Each binary variable would be named after its word. You can assume that the words in each phrase are cleanly separated by a blank space.
So, for this person, new=1, video=1, post=1, and all other binary variables=0. If there were, say, 1000 unique words over all the phrases from all the people, there would be a total of 1000 binary variables, each corresponding to a unique word.
I'm trying to figure out the most efficient way to do this in Stata.
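To make the goal concrete, here is a rough brute-force sketch of the kind of thing I have in mind, not necessarily the efficient way. The variable name phrase and the toy data are made up for illustration, and it assumes every word is a legal Stata variable name that does not clash with an existing variable.

```stata
* Toy data; the variable name "phrase" is an assumption for illustration
clear
input str40 phrase
"new video post"
"new photo"
"video call"
end

* Split each phrase on blanks into temporary variables w1, w2, ...
split phrase, generate(w)
local wvars `r(varlist)'

* Build the list of unique words across all the split variables
local words
foreach v of local wvars {
    quietly levelsof `v', local(lv) clean
    local words : list words | lv
}
drop `wvars'

* One 0/1 indicator per unique word, named after the word.
* Padding with blanks makes the match whole-word only.
foreach word of local words {
    generate byte `word' = strpos(" " + phrase + " ", " `word' ") > 0
}

list
```

With real data, words that are not legal names (for example, ones starting with a digit, or reserved words such as "in" or "if") would need to be renamed first, perhaps with strtoname(), and the match is case-sensitive as written.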