Hello Stata users,
I have a dataset created from a Google Form that can be filled out in either English or Spanish. As a result, several variables contain both English and Spanish values that mean the same thing. They are all string variables. For example:
. tab employmentstatus

Which of these options best describes   |
your current employment situation?      |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
Ama/o de Casa                           |         57        1.48        1.48
Deshabilitado                           |         26        0.67        2.15
Disabled                                |        202        5.24        7.40
Empleado tiempo completo                |         86        2.23        9.63
Empleado tiempo parcial                 |        199        5.16       14.79
Employed (full-time)                    |        396       10.28       25.07
Employed (part-time)                    |        427       11.08       36.15
Estudiante                              |          1        0.03       36.18
Homemaker                               |         60        1.56       37.74
Jubilado                                |         14        0.36       38.10
Out of work or unable to work due to .. |      1,504       39.03       77.13
Retired                                 |         53        1.38       78.51
Self-employed                           |        256        6.64       85.15
Sin trabajo por razones relacionadas .. |        416       10.80       95.95
Student                                 |        100        2.60       98.55
Trabajo/a por propia cuenta             |         56        1.45      100.00
----------------------------------------+-----------------------------------
Total                                   |      3,853      100.00
In this example, "Ama/o de Casa" is synonymous with "Homemaker", "Deshabilitado" is synonymous with "Disabled", "Empleado tiempo completo" is synonymous with "Employed (full-time)", etc. I want to combine each pair of synonymous responses into a single response with a numeric value.
How do I create a numeric categorical variable called "emp_status" that gives all "Ama/o de Casa" AND "Homemaker" responses a value of 1, all "Deshabilitado" AND "Disabled" responses a value of 2, and so forth?
I tried something like:
rename employmentstatus emp_stat
replace emp_status = 1 if emp_status == "Employed (full-time)" | emp_status == "Empleado tiempo completo"
replace emp_status = 2 if emp_status == "Employed (part-time)" | emp_status == "Empleado tiempo parcial"
but that doesn't work because they are string variables.
I also tried the -encode- command:
encode employmentstatus, gen(emp_stat) label(1,2)
but that creates separate labels for each unique value and doesn't allow me to define which value receives which label.
Any ideas?
Thanks,
Ian Gabriel
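One possible approach, offered as a sketch rather than a tested solution: leave the string variable alone, generate a new numeric variable, and fill it with -replace- and -inlist()- so each English/Spanish pair gets the same code. This assumes the string variable is still named employmentstatus (i.e. the -rename- above has not been run) and that the stored strings match the tabulated values exactly, with no stray leading or trailing spaces. Codes 1 and 2 follow the question; codes 3 to 8, the label name emp_status_lbl, and the label text are illustrative choices. The two "out of work" responses are truncated in the -tab- output, so the sketch matches them by their visible prefix instead of guessing the full text.

```stata
* Sketch: collapse English/Spanish string responses into one numeric code.
* Assumes the string variable is named employmentstatus.
generate byte emp_status = .
replace emp_status = 1 if inlist(employmentstatus, "Homemaker", "Ama/o de Casa")
replace emp_status = 2 if inlist(employmentstatus, "Disabled", "Deshabilitado")
replace emp_status = 3 if inlist(employmentstatus, "Employed (full-time)", "Empleado tiempo completo")
replace emp_status = 4 if inlist(employmentstatus, "Employed (part-time)", "Empleado tiempo parcial")
replace emp_status = 5 if inlist(employmentstatus, "Student", "Estudiante")
replace emp_status = 6 if inlist(employmentstatus, "Retired", "Jubilado")
replace emp_status = 7 if inlist(employmentstatus, "Self-employed", "Trabajo/a por propia cuenta")

* These two values are truncated in the tab output, so match on the
* visible prefix (strpos() == 1 means the string starts with that text).
replace emp_status = 8 if strpos(employmentstatus, "Out of work") == 1 | ///
    strpos(employmentstatus, "Sin trabajo") == 1

* Attach value labels (label name and text are illustrative).
label define emp_status_lbl 1 "Homemaker" 2 "Disabled" ///
    3 "Employed (full-time)" 4 "Employed (part-time)" ///
    5 "Student" 6 "Retired" 7 "Self-employed" 8 "Out of work"
label values emp_status emp_status_lbl

* Check the mapping: every original string should land in exactly one code.
tabulate employmentstatus emp_status, missing
count if missing(emp_status) & employmentstatus != ""
```

If the final -count- is not zero, some strings did not match any pair (for example because of extra spaces), and those values would need to be inspected before relying on emp_status.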