Adding Questions from a questionnaire to Stata (ethnicity, likert scale 1-10 answer questions, income brackets)

Riya Ramesh

Join Date: Mar 2025

Posts: 1
#1

12 Mar 2025, 06:00

Hi everyone,

I have some questions regarding some data I'm trying to analyse, and am a bit new to this.

I have collected some data (general demographics, plus some scale-based questions answered on 1-10 or 1-100) and have 51 observations.

For example, my ethnicity categories are white, asian, black, mixed and other. Since the observations are so few, I thought I could group the categories, but I was unsure how to do this as a dummy.

Originally I had put:

generate ethnic_cat = .
replace ethnic_cat = 1 if ethnic == "White"
replace ethnic_cat = 2 if ethnic == "Black/African/Caribbean"
replace ethnic_cat = 3 if ethnic == "Asian (Indian, Pakistani, Bangladeshi, Chinese, any other Asian background)"
replace ethnic_cat = 4 if ethnic == "Mixed two or more ethnic groups"
replace ethnic_cat = 5 if ethnic == "Other (Arab or any others)"

I now think this is wrong because I think I need to make a dummy for each variable and can't equate each category to numbers.

I had done the exact same for income too:

generate income_cat = .
replace income_cat = 1 if income == "Less than 20,000"
replace income_cat = 2 if income == "20,000-39,999"
replace income_cat = 3 if income == "40,000-59,999"
replace income_cat = 4 if income == "60,000-99,999"
replace income_cat = 5 if income == "More than 100,000"

But I think I need to create a dummy equal to 1 if income is less than 20,000 and 0 otherwise, and repeat this for each category. Can someone explain the rationale for this, if it is right? I didn't want to group income brackets together because I feel there is a very large difference in responses between some of the groups.

For the scale-based questions, one question asks, for example, how you rate the importance of clothing quality from 1 to 10. Can I just keep this as numeric, or do I need to do something with it? I've seen something on this forum for Likert scales, e.g. a variable that takes a value when the Likert scale is 0, or something similar for every single number, but I'm not sure if this is needed.

Thanks for your help!
Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

12 Mar 2025, 07:12

generate ethnic_cat = .
replace ethnic_cat = 1 if ethnic == "White"
replace ethnic_cat = 2 if ethnic == "Black/African/Caribbean"
replace ethnic_cat = 3 if ethnic == "Asian (Indian, Pakistani, Bangladeshi, Chinese, any other Asian background)"
replace ethnic_cat = 4 if ethnic == "Mixed two or more ethnic groups"
replace ethnic_cat = 5 if ethnic == "Other (Arab or any others)"

I now think this is wrong because I think I need to make a dummy for each variable and can't equate each category to numbers.

No problem. You can get a set of indicator variables (a.k.a. dummy variables) on the fly using factor variable notation. You can define your own indicators, but factor variable notation remains the top trick. More at

Code:

help fvvarlist
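
As a minimal sketch of what that notation does, assuming a numeric ethnic_cat and a made-up outcome called quality (the outcome name and all the toy values below are invented for illustration, not taken from the original data):

```stata
* Toy data: variable names and values are hypothetical
clear
input byte ethnic_cat byte quality
1 7
1 8
2 5
2 6
3 9
3 7
end

* i. expands ethnic_cat into indicators on the fly; the lowest value (1) is the base
regress quality i.ethnic_cat

* ib3. makes category 3 the base category instead
regress quality ib3.ethnic_cat

* If explicit indicator variables are really wanted, tabulate can create them
tabulate ethnic_cat, generate(eth_)
```

The i. prefix leaves the dataset unchanged, so nothing extra needs to be created or maintained; tabulate with generate() is shown only as one way to get the indicators as variables.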

https://journals.sagepub.com/doi/pdf...36867X19830921 contains a bundle of tips and tricks, including some discouragement of the term dummy variables, and conveying a strong inclination to talk about indicator variables.

All too true story: A quantitative researcher presented an analysis to a mixed audience and was asked how gender had been quantified. The reply, "Oh, gender is just a dummy variable" provoked an explosion of indignation, as it was taken to imply that the researcher did not take it seriously.

I would have done it this way:

Code:

label def ethnic_cat 1 "White" 2 "Black/African/Caribbean" 3 "Asian (Indian, Pakistani, Bangladeshi, Chinese, any other Asian background)" 4 "Mixed two or more ethnic groups" 5 "Other (Arab or any others)"
encode ethnic, gen(ethnic_cat) label(ethnic_cat)

and then you have the best of both worlds, a series of numbers and associated value labels explaining them.

Same story for income.
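
A sketch of the same approach for income, on toy data that uses the bracket text from #1 (the six observations are invented). Defining the label first matters here: left to itself, encode numbers the categories in alphabetical order of the strings, which would not match the order of the brackets.

```stata
* Toy data: the observations are hypothetical; the strings are the brackets from #1
clear
input str20 income
"Less than 20,000"
"20,000-39,999"
"40,000-59,999"
"60,000-99,999"
"More than 100,000"
"20,000-39,999"
end

* Define the mapping first so that the codes follow the order of the brackets
label define income_cat 1 "Less than 20,000" 2 "20,000-39,999" 3 "40,000-59,999" 4 "60,000-99,999" 5 "More than 100,000"

* encode uses the existing value label rather than alphabetical order
encode income, generate(income_cat) label(income_cat)

* Show the labels, then the underlying numbers
tabulate income_cat
tabulate income_cat, nolabel
```

Any string not covered by the label definition (a typo, say) would be added by encode as a new category, so a quick tabulate afterwards is worth the time.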

For Likert scales, or items for the fastidious (NB: not likert, please, as Likert was a person), use the numbers that people were allowed to choose. If they were associated with text, the text becomes the value labels (e.g. 1 "Strongly disagree").
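
A sketch of attaching text to the numbers, assuming a hypothetical 1 to 5 item called agree; only "Strongly disagree" comes from the example above, and the other label texts and the data are made up:

```stata
* Toy data: variable name, values and most label texts are hypothetical
clear
input byte agree
1
3
5
2
4
end

* The numbers stay as entered; the text becomes the value labels
label define agree_lbl 1 "Strongly disagree" 2 "Disagree" 3 "Neither" 4 "Agree" 5 "Strongly agree"
label values agree agree_lbl

* Tables show the text, while numeric summaries still use the numbers
tabulate agree
summarize agree
```

A 1-10 rating with no text attached to the points needs no value labels at all and can stay as a plain numeric variable.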