Combining three byte variables into one

Thea Black

Join Date: Mar 2020

Posts: 13
#1


01 Mar 2020, 05:56

I am currently trying to replicate some of the results of Charles and Guryan's 2008 paper 'Prejudice and Wages'. One of the variables they construct is called racpeers, which the appendix describes as 'an aggregation of three questions about whether you would object to sending your kids to a school that had few/half/most black students.'

It is constructed from the following three yes/no variables, which record whether the respondent would object to sending their kids to a school with few/half/most students of another race. All three are byte variables:

1) racfew: object? yes / no
2) rachaf: object? yes / no
3) racmost: object? yes / no

All three are byte variables, with 1 assigned to yes and 2 to no. In the paper, the lowest number is assigned to the least racist views and the highest to the most racist (starting from 1 and increasing).

From what I understand, the views from least racist to most racist should be:
LEAST RACIST:
1 - don't object to racmost
2 - don't object to rachaf
3 - don't object to racfew
4 - object to racmost
5 - object to rachaf
6 - object to racfew
MOST RACIST

How would I combine the three into a single variable as described above? The data come from multiple waves of the GSS, so not everyone in the dataset answered these questions.
Andrew Musau

Join Date: Oct 2014

Posts: 10194
#2

01 Mar 2020, 06:44

an aggregation of three questions about whether you would object to sending your kids to a school that had few/half/most black students.

As long as no question is weighted more than the others, you can sum the responses or take their mean:

Code:

gen wanted = racfew + rachaf + racmost
gen wanted2 = wanted/3

Here, wanted will vary from 3 (1+1+1) to 6 (2+2+2), and wanted2 from 1 to 2, with possible non-integer values. You could also make wanted vary from 0 to 3 by subtracting 3.
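Two details are worth checking before using the sum in #2. With the coding from #1 (1 = yes, object; 2 = no), a higher sum means fewer objections, which is the opposite direction from the paper's index. Also, Stata's + returns missing whenever any of the three answers is missing, which is usually what you want for respondents from waves where the questions were not asked. A minimal sketch, assuming the variable names and 1/2 coding from #1 (the example data and the name objections are hypothetical), that counts objections instead so that higher values mean more prejudice:

```stata
clear
* Hypothetical example data: 1 = yes (object), 2 = no, . = not asked
input byte(racfew rachaf racmost)
1 1 1
1 1 2
1 2 1
1 2 2
2 1 1
2 1 2
2 2 1
2 2 2
. 2 2
end

* Number of "yes, object" answers (0 to 3); missing unless all three were answered
gen byte objections = (racfew == 1) + (rachaf == 1) + (racmost == 1) if !missing(racfew, rachaf, racmost)

list, clean
```

Because each "yes" contributes 1 to objections and 1 (rather than 2) to the sum, objections equals 6 - wanted, so this is simply the sum from #2 reversed and shifted to start at 0.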
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

01 Mar 2020, 06:51

Generally speaking, egen with the group() function can create a categorical variable out of a series of binary variables.

That said, how the categories are ordered depends on the classification strategy itself. In the example in #1, the desired classification does not seem to map cleanly onto the combinations of the binary variables.

In short, grouping those three binary variables would give categories such as no-no-no, no-no-yes, etc.

An alternative would be to -generate- a variable according to the lowest condition, then use -replace ... if- to create the remaining categories.

Last edited by Marcos Almeida; 01 Mar 2020, 06:55.

Best regards,

Marcos
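A minimal sketch of the egen group() route suggested in #3, assuming the variable names and the 1 = yes, 2 = no coding from #1; the example data and the name combo are hypothetical. Each observed combination of answers gets its own integer code, numbered in the sort order of the three variables, and respondents with any missing answer are left missing, which is the default behaviour of group():

```stata
clear
* Hypothetical example data: 1 = yes (object), 2 = no, . = not asked
input byte(racfew rachaf racmost)
1 1 1
1 1 2
1 2 1
1 2 2
2 1 1
2 1 2
2 2 1
2 2 2
. 2 2
end

* One category per combination of answers (at most 2 x 2 x 2 = 8);
* the label option attaches a value label describing each combination
egen combo = group(racfew rachaf racmost), label

tabulate combo
```

The resulting codes identify combinations only; they are not an ordering by prejudice, so you would still need to decide how to rank them.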
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

01 Mar 2020, 10:27

I do not understand how this description works.

Code:

LEAST RACIST:
1 - don't object to racmost
2 - don't object to rachaf
3 - don't object to racfew
4 - object to racmost
5 - object to rachaf
6 - object to racfew
MOST RACIST

I assume the objective is to calculate the highest (most racist) score given these rules.

racfew  rachaf  racmost  result
yes     yes     yes      6
yes     yes     no       6
yes     no      yes      6
yes     no      no       6
no      yes     yes      5
no      yes     no       5
no      no      yes      4
no      no      no       3

As you can see, for the eight possibilities we only get four distinct results.
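One way to apply the "highest score" reading in Stata is the generate-then-replace approach from #3: start from the least prejudiced condition and move up, so each later -replace- overwrites the earlier ones and the final value is the highest rule that applies. This is a sketch assuming the 1 = yes, 2 = no coding from #1; the example data and the name score are hypothetical. Under this rule, a respondent who answers no to all three questions scores 3, because "don't object to racfew" is ranked 3.

```stata
clear
* Hypothetical example data: 1 = yes (object), 2 = no, . = not asked
input byte(racfew rachaf racmost)
1 1 1
1 1 2
1 2 1
1 2 2
2 1 1
2 1 2
2 2 1
2 2 2
. 2 2
end

* Ranking from #1, lowest (least prejudiced) first; later lines win
gen byte score = 1 if racmost == 2
replace score = 2 if rachaf == 2
replace score = 3 if racfew == 2
replace score = 4 if racmost == 1
replace score = 5 if rachaf == 1
replace score = 6 if racfew == 1

* Only score respondents who answered all three questions
replace score = . if missing(racfew, rachaf, racmost)

list racfew rachaf racmost score, clean
```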
Andrew Musau

Join Date: Oct 2014

Posts: 10194
#5

01 Mar 2020, 11:42

A quick glance at the paper, available from JSTOR, reveals that the aggregation is nothing more than averaging the scores across the questions, except that you first need to normalize the mean and standard deviation of each question, as described in the passage below. As explained in #2, any other means of aggregation implies that you assign different weights to the questions, and there is no evidence of this in the authors' description.

Much of our analysis involves comparing levels of prejudice across individuals and across geographic areas. To render these comparisons feasible, it is obviously necessary that we somehow combine the disparate prejudice responses into a unidimensional prejudice index. We do this by first creating an individual-level index for each GSS respondent and then by aggregating this individual-level index in various ways at the state and census division levels. The individual-level prejudice index is based on an average of responses to different GSS prejudice questions. To ensure that the response to each question is measured on the same scale and weighted equally in the index, we normalize the mean and standard deviation of each of the GSS prejudice questions. Then, for each GSS respondent, we compute the average of his or her normalized response to each question.
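A minimal sketch of the normalize-then-average step described in the quoted passage, restricted to the three questions from #1. Several choices here are assumptions, not taken from the paper: each answer is first recoded so that 1 = object and 0 = does not object (so higher means more prejudice), standardization uses all non-missing responses pooled across GSS waves, and each respondent is averaged over whichever of the questions they answered. The example data and the obj_, z_ and prejudice names are hypothetical.

```stata
clear
* Hypothetical example data: 1 = yes (object), 2 = no, . = not asked
input byte(racfew rachaf racmost)
1 1 1
1 1 2
1 2 1
1 2 2
2 1 1
2 1 2
2 2 1
2 2 2
. 2 2
end

foreach v of varlist racfew rachaf racmost {
    * Recode so that 1 = object, 0 = does not object; missing stays missing
    gen byte obj_`v' = (`v' == 1) if !missing(`v')
    * Normalize to mean 0, standard deviation 1 over non-missing responses
    egen z_`v' = std(obj_`v')
}

* Individual-level index: average of the normalized responses answered
egen prejudice = rowmean(z_racfew z_rachaf z_racmost)

summarize z_racfew z_rachaf z_racmost prejudice
```

If the paper normalizes within survey year or over a different sample, the summarize/std step would need to be done by group instead; check the paper's description before relying on the pooled version.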