Hi everyone,
I have some questions about some data I'm trying to analyse, and I'm fairly new to this.
I have collected some data (general demographics, plus some scale-based questions where respondents choose a value from 1-10 or 1-100) and have 51 observations.
For example, my ethnicity categories are White, Asian, Black, Mixed and Other. Since there are so few observations, I thought I could group the categories, but I am unsure how to do this with dummy variables.
Originally I had put:
generate ethnic_cat = .
replace ethnic_cat = 1 if ethnic == "White"
replace ethnic_cat = 2 if ethnic == "Black/African/Caribbean"
replace ethnic_cat = 3 if ethnic == "Asian (Indian, Pakistani, Bangladeshi, Chinese, any other Asian background)"
replace ethnic_cat = 4 if ethnic == "Mixed two or more ethnic groups"
replace ethnic_cat = 5 if ethnic == "Other (Arab or any others)"
I now think this is wrong, because I believe I need a separate dummy for each category and can't simply assign a number to each category.
I had done exactly the same for income:
generate income_cat = .
replace income_cat = 1 if income == "Less than 20,000"
replace income_cat = 2 if income == "20,000-39,999"
replace income_cat = 3 if income == "40,000-59,999"
replace income_cat = 4 if income == "60,000-99,999"
replace income_cat = 5 if income == "More than 100,000"
But I think I need to create a dummy equal to 1 if income is less than 20,000 and 0 otherwise, and repeat this for each category. Can someone explain the rationale for this, if it is right? I didn't want to group income brackets together because I feel the responses differ a lot between some of the groups.
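To make the one-dummy-per-category idea concrete, this is roughly what I imagine it would look like. It is only a sketch: it assumes income_cat has already been created as above, and the names inc_lt20 and inc_d are made up for illustration. I'm not claiming this is the right approach, just showing what I mean.

```stata
* Sketch only: assumes income_cat (coded 1-5) exists as created above.

* Option 1: build a single 0/1 indicator by hand.
* The if qualifier leaves the dummy missing where income_cat is missing.
generate byte inc_lt20 = (income_cat == 1) if !missing(income_cat)

* Option 2: let tabulate create one indicator per category
* (inc_d1, inc_d2, ... in the order of the income_cat codes).
tabulate income_cat, generate(inc_d)
```

With either option, I assume one of the indicators would have to be left out of a regression as the reference group.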
For the scale-based questions, one question asks, for example, how you rate the importance of clothing quality from 1-10. Can I just keep this as a numerical variable, or do I need to do something with it? I've seen something on this forum for Likert scales, e.g. a separate indicator that is switched on when the Likert response is 0, or something similar for every single number, but I'm not sure if this is needed.
Thanks for your help!