Hi, I have a data set that describes the different kinds of graffiti identified at various business locations. The graffiti types are "pen", "spray paint", "pencil", "paint", "etched", and "chalk".
I figure that if I want to identify each of these so that I can quantify the occurrence of each type of graffiti, I'd want to use the split command.
In some observations, multiple types of graffiti may be listed, in any order, separated by a comma (" , "). Some observations have no data at all.
For example:
1 | Etched |
2 | Etched |
3 | Pen |
4 | Etched , Pen |
5 | Etched |
6 | Chalk , Paint |
7 | |
8 | Spray Paint |
Code:
split
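To make the intended splitting concrete, here is a minimal sketch of the logic in Python, not the split command itself. The rows are the eight example entries above; `parse_entry` is a hypothetical helper name introduced only for illustration. It splits each entry on the comma and trims the surrounding spaces, so an empty entry yields an empty list.

```python
# The eight example entries from the listing above (entry 7 is empty).
rows = ["Etched", "Etched", "Pen", "Etched , Pen", "Etched",
        "Chalk , Paint", "", "Spray Paint"]

def parse_entry(entry):
    """Split one entry on commas and trim spaces; empty entries give []."""
    return [part.strip() for part in entry.split(",") if part.strip()]

parsed = [parse_entry(r) for r in rows]

# Show each entry number next to the list of types found in it.
for number, types in enumerate(parsed, start=1):
    print(number, types)
```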
I have been using the following link as a reference.
This resource is helpful to an extent, but it differs from what I need. In its example about court cases, the case variable is split on different variations of "versus", and the phrases on either side of "versus" vary by case. The phrases on either side of my commas also differ, but they repeat across entries.
And in some cases, one term is distinct from another even though they share a word: 'Paint' is different from 'Spray Paint'.
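The 'Paint' versus 'Spray Paint' distinction matters for how the matching is done. A small Python sketch (illustrative only, with a made-up entry and hypothetical variable names) shows that a substring test confuses the two, while comparing each trimmed piece as a whole does not.

```python
# A made-up entry for illustration; it contains "Spray Paint" but not "Paint".
entry = "Spray Paint , Pen"

# Substring test: wrongly reports "Paint" because it occurs inside "Spray Paint".
substring_hit = "Paint" in entry

# Exact test: split on commas, trim, then compare whole pieces.
pieces = [p.strip() for p in entry.split(",")]
exact_hit = "Paint" in pieces

print(substring_hit, exact_hit)  # True False
```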
I want to assign each type of graffiti a categorical number, where 1 = "Pen", 2 = "Spray Paint", 3 = "Pencil", and so on. Then I want to create a variable, or variables, that let me measure each type of graffiti in order to quantify its occurrence.
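The intended end result can be sketched in Python. This only illustrates the target output, not how to get there with the split command. Codes 1 to 3 come from the description above; codes 4 to 6 are an assumption that simply continues the same order as the list of types. The names `CODES` and `codes_for` are hypothetical.

```python
from collections import Counter

# Codes 1-3 are from the post; 4-6 are an assumption following the same list order.
CODES = {"Pen": 1, "Spray Paint": 2, "Pencil": 3,
         "Paint": 4, "Etched": 5, "Chalk": 6}

# The eight example entries (entry 7 is empty).
rows = ["Etched", "Etched", "Pen", "Etched , Pen", "Etched",
        "Chalk , Paint", "", "Spray Paint"]

def codes_for(entry):
    """Return the set of category codes present in one entry."""
    return {CODES[p.strip()] for p in entry.split(",") if p.strip()}

coded = [codes_for(r) for r in rows]

# One 0/1 indicator per graffiti type for every entry.
indicators = [{name: int(code in row) for name, code in CODES.items()}
              for row in coded]

# Number of entries in which each type occurs.
counts = Counter(name for row in indicators
                 for name, flag in row.items() if flag)
print(counts)
```

Note that a label not listed in `CODES` (a misspelling or different capitalization, say) would raise a `KeyError` in this sketch, so real data would need cleaning first.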
But I'm confused, and the example in the link above doesn't quite help me do that.
So basically, I need to quantify the occurrence of each type of graffiti by splitting the graffiti types variable, but the example I have is doing something quite different from what I need and is not as helpful.
I'm not quite sure how to do this. Please help?