  • Combining three byte variables into one

    I am currently trying to replicate some of the results of the Charles and Guryan 2008 paper 'Prejudice and wages'. One of the variables they construct is called racpeers and in the appendix it says it is 'an aggregation of three questions about whether you would object to sending your kids to a school that had few/half/most black students.'

    It is constructed from the following three yes/no variables, each recording whether you would object to sending your kids to a school with few/half/most students of another race. All three are byte variables.

    1) racfew
    - object?
    yes
    no

    2) rachaf
    - object?
    yes
    no

    3) racmost
    - object?
    yes
    no

    In all three, 1 is assigned to yes and 2 to no. In the paper, they assign the lowest number to the least racist views and the highest to the most racist (starting from 1 and increasing).

    From what I understand, the views from least racist to most racist should be:
    LEAST RACIST:
    1 - don't object to racmost
    2 - don't object to rachaf
    3 - don't object to racfew
    4 - object to racmost
    5 - object to rachaf
    6 - object to racfew
    MOST RACIST

    How would you combine the three variables into a single variable ordered as described above? The data are taken from multiple waves of the GSS, so not everyone in the dataset answers these questions.


  • #2
    an aggregation of three questions about whether you would object to sending your kids to a school that had few/half/most black students.
    As long as one question is not weighted more than the others, you can sum the three variables or take their mean.

    Code:
    gen wanted = racfew + rachaf + racmost
    gen wanted2 = wanted/3
    Here, wanted will vary from 3 (1+1+1) to 6 (2+2+2), and wanted2 from 1 to 2, with possible non-integer values. You could also get wanted to vary from 0 to 3 by subtracting 3.
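
    Because not every GSS respondent answers these questions, it may help to note that the + operator returns missing whenever any of the three items is missing. A minimal sketch of a variant that averages over the items actually answered, assuming the variables are named racfew, rachaf and racmost as listed in #1 and that nonresponse is stored as Stata missing values (nmiss and wanted3 are hypothetical names):

    ```stata
    * count how many of the three items are missing for each respondent
    egen nmiss = rowmiss(racfew rachaf racmost)

    * mean over the non-missing items; missing only if all three are missing
    egen wanted3 = rowmean(racfew rachaf racmost)

    * inspect how many respondents answered only some of the items
    tabulate nmiss
    ```

    Whether partially answered cases should be kept is a judgment call; dropping them reproduces the behaviour of the simple sum.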



    • #3
      Generally speaking, egen with the group() function can create a categorical variable out of a series of binary variables.

      That said, the ordering of the categories will depend on the classification strategy itself. In the example in #1, the desired classification does not seem to map cleanly onto the binary variables.

      In short, in order to group those 3 binary variables, we’d get something like no-no-no, no-no-yes, etc.

      An alternative, though, would be to generate a variable for the lowest category and then use replace with if conditions to create the remaining categories.
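
      Both approaches could be sketched roughly as follows. The variable names racfew, rachaf and racmost follow #1; pattern and objcat are hypothetical names, and the ordering used for objcat is only one possible illustration, not the coding used in the paper:

      ```stata
      * one category per observed yes/no combination (1 = yes, 2 = no);
      * by default, observations with any missing item are left missing
      egen pattern = group(racfew rachaf racmost), label
      tabulate pattern

      * alternative: start from the lowest category, then overwrite step by step
      generate byte objcat = 1 if racfew == 2 & rachaf == 2 & racmost == 2
      replace objcat = 2 if racfew == 2 & rachaf == 2 & racmost == 1
      replace objcat = 3 if racfew == 2 & rachaf == 1 & !missing(racmost)
      replace objcat = 4 if racfew == 1 & !missing(rachaf, racmost)
      ```

      Note that group() numbers the combinations in sort order of the underlying values, so its codes carry no substantive ranking unless you recode them.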
      Last edited by Marcos Almeida; 01 Mar 2020, 06:55.
      Best regards,

      Marcos



      • #4
        I do not understand how this description works.

        Code:
        LEAST RACIST:
        1 - don't object to racmost
        2 - don't object to rachaf
        3 - don't object to racfew
        4 - object to racmost
        5 - object to rachaf
        6 - object to racfew
        MOST RACIST
        I assume the objective is to assign the highest (most racist) score that applies under these rules.

        racfew  rachaf  racmost  result
        yes     yes     yes      6
        yes     yes     no       6
        yes     no      yes      6
        yes     no      no       6
        no      yes     yes      5
        no      yes     no       5
        no      no      yes      4
        no      no      no       1

        As you can see, the eight possible combinations yield only four distinct results.
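
        For anyone who wants to check the table, the "highest applicable score" rule can be written with nested cond() calls. This is only a sketch of the rule as I read it, assuming 1 = yes and 2 = no and the variable names from #1 (result is a hypothetical name):

        ```stata
        * highest score that applies: 6 if objects to few, else 5 if objects
        * to half, else 4 if objects to most, else 1; missing if any item is missing
        generate byte result = cond(racfew == 1, 6, ///
            cond(rachaf == 1, 5, ///
            cond(racmost == 1, 4, 1))) if !missing(racfew, rachaf, racmost)

        * only the values 1, 4, 5 and 6 should appear
        tabulate result
        ```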



        • #5
          A quick glance at the paper, available from JSTOR, reveals that the aggregation is nothing more than averaging the scores across the questions, except that you first need to normalize the mean and standard deviation of each question in the way the authors describe. As explained in #2, any other means of aggregation implies that you assign different weights to the questions, and there is no evidence of this in the authors' description:

          Much of our analysis involves comparing levels of prejudice across individuals and across geographic areas. To render these comparisons feasible, it is obviously necessary that we somehow combine the disparate prejudice responses into a unidimensional prejudice index. We do this by first creating an individual-level index for each GSS respondent and then by aggregating this individual-level index in various ways at the state and census division levels. The individual-level prejudice index is based on an average of responses to different GSS prejudice questions. To ensure that the response to each question is measured on the same scale and weighted equally in the index, we normalize the mean and standard deviation of each of the GSS prejudice questions. Then, for each GSS respondent, we compute the average of his or her normalized response to each question.
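
          A rough sketch of that procedure for the three schooling questions, under several assumptions that are mine rather than the authors': the variables are named racfew, rachaf and racmost as in #1, 1 = yes and 2 = no, each item is standardized over the full sample of respondents who answered it, and the names obj_*, z_* and racpeers_idx are hypothetical.

          ```stata
          foreach v of varlist racfew rachaf racmost {
              * recode so that higher values mean more prejudice (1 = objects, 0 = does not)
              generate byte obj_`v' = (`v' == 1) if !missing(`v')
              * standardize to mean 0 and standard deviation 1
              egen z_`v' = std(obj_`v')
          }

          * average the standardized items each respondent actually answered
          egen racpeers_idx = rowmean(z_racfew z_rachaf z_racmost)
          summarize racpeers_idx
          ```

          If the paper normalizes over a particular reference sample, the std() step would need to be restricted accordingly; check the authors' description before relying on this.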

