  • Recoding

    Hello,

    I have the variable gender, which was originally coded as 1 = male and 2 = female. I changed the coding with the following: recode gender (1=0) (2=1). The result is that 1 now shows as male and 0 has no label attached to it (I made this mistake permanently).

    My goal is to code male = 0 and female = 1 with the label "male and female" attached to each value and was wondering how I can achieve this.

  • #2
    Code:
    recode gender (1=0 "male") (2=1 "female"), generate(newgender)
    drop newgender
    rename newgender gender


    • #3
      Originally posted by William Lisowski
      Code:
      recode gender (1=0 "male") (2=1 "female"), generate(newgender)
      drop newgender
      rename newgender gender
      Thank you for the response. I ran into an issue with this, though. I followed your code exactly but got the following error: variable newgender not found. I guess it makes sense, since I dropped the variable "newgender".


      • #4
        Sorry, I obviously meant to drop gender so I could rename newgender to gender.

        That's the sort of error that creeps in when you can't test your code because the question didn't come with example data and "along with answering this question I want to make up data to test my answer" said nobody ever. Take a look at the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It is particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

        The more you help others understand your problem, the more likely others are to be able to help you solve your problem. And to give answers free of easily caught errors.
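
        A minimal sketch of what such a self-contained example could look like, using the auto dataset shipped with Stata as a stand-in for the poster's data (the choice of variables and observations is only an illustration; in older Stata versions dataex may first need to be installed as described in the FAQ):
        Code:
        sysuse auto, clear
        * print a copy-and-paste data example for two variables, first five observations
        dataex make foreign in 1/5

        The resulting output can be pasted into a post so that others can recreate the data exactly.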


        • #5
          To #2: Recoding first into a new variable, then dropping the old variable and renaming the new variable to the old name, has the advantage that you can attach labels to the recoded values directly within the -recode- command. But using -elabel- (from SSC), assuming the original variable already has value labels attached, makes it much simpler.

          An additional comment concerns the naming of a variable as "gender" or "sex": to my mind this is bad style. It would be better to always give variables names that point to the higher value, e.g. if female is coded higher than male, the variable should be named "female" (and if you like you can label the variable as "gender"). If, for example, you correlate female with size and the correlation is positive, you can immediately interpret it as "females tend to be larger than males", which would not be possible if gender correlates positively with size. To avoid a lengthy discussion: of course, all this is no longer so easy if you have 3 categories, such as "male", "female", "non-binary".

          Here is code showing three variants, of which I would prefer the last:
          Code:
          cap which elabel
          if _rc ssc install elabel  // install -elabel- from SSC if necessary
          
          * ------------------------------------------------------------------------------
          * Version 1 using the recode option gen(), drop, and rename:
          clear
          input gender
            1
            2
            2
          end
          lab def gender 1 "male" 2 "female"
          lab val gender gender
          
          list if gender==2
          
          recode gender (1=0 "male") (2=1 "female"), gen(newgender)
          drop gender
          rename newgender gender
          
          list if gender==1
          
          * ------------------------------------------------------------------------------
          * Version 2 using -elabel-:
          clear
          input gender
            1
            2
            2
          end
          lab def gender 1 "male" 2 "female"
          lab val gender gender
          
          list if gender==2
          
          recode gender (1=0) (2=1)
          elabel recode gender (1=0) (2=1)
          
          list if gender==1
          
          * ------------------------------------------------------------------------------
          * Version 3 (even better): name the indicator after the state coded 1
          
          clear
          input female
            1
            2
            2
          end
          lab def female 1 "male" 2 "female"
          lab val female female
          list if female==2
          
          recode female (1=0) (2=1)
          elabel recode female (1=0) (2=1)
          
          list if female  // simpler and more readable


          • #6
            On #3 I can add horror stories of presenters being asked which way round their gender variable was coded and not being able to remember.

            It's worth underlining the Stata stock example that in the auto dataset the indicator variable present is called foreign. So, the convention for naming an indicator for the state coded 1 has been around a long time. (Early textbook or paper invocations would be welcome.)
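
            A short sketch of how that stock example reads in practice. Only the auto dataset and its variable foreign come from the point above; the choice of price as the summarized variable is an arbitrary illustration:
            Code:
            sysuse auto, clear
            describe foreign
            tabulate foreign
            * because 1 means "foreign", the bare condition reads naturally
            summarize price if foreign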

            Not completely unrelated is an uphill struggle that some of us have -- perhaps futile in the case of economics where this habit may be far too socialised -- to get people to avoid the term dummy variable. Again, I have too many horror stories of presentations to a mixed audience (here meaning a range from strongly quantitative types to the opposite) in which puzzled questions lead to puzzled answers of the form "oh, X is just a dummy variable" leading in turn to angry explosions of the form that X is not being taken seriously. The understanding of what a dummy variable is may not be shared. I've not heard any stories involving the term indicator variable. This may sound a small, indeed pedantic, point but anyone who has witnessed the anger or incredulity this terminology can produce -- accidentally to be sure -- won't need much persuasion.

            The use of (0, 1) indicators for states that can't easily or comfortably be reduced to binary opposites is an even larger question I stop short of.


            • #7
              If you prefer the notation [0, 1] or {0, 1} for indicating that the possible values are 0 or 1 you have good arguments on your side.


              • #8
                I have already made a permanent change by renaming my variable.
                Code:
                rename BIO_SEX2 gender
                So when I went to change it to
                Code:
                recode gender (1=0 "male") (2=1 "female"), generate(gender)
                drop gender
                rename gender newgender
                it would not let me since my variable for gender was already defined.

                Any suggestions?


                • #9
                  Code:
                  recode gender (1=0 "male") (2=1 "female"), generate(newgender)
                  drop gender
                  rename gender newgender


                  • #10
                    The biological sex is sex, not gender (I know that these concepts are hotly discussed, but personally I insist on differentiating between sex and the concept of gender role). If you rename your variable once you can rename it a second time (see my suggestion in #5, version 3).

                    The -gen()- option of -recode- requires the name of a new variable, i.e. one that does not yet exist. Have a look at -help recode- (reading the help for a command should always be the first thing to do if you encounter errors or unexpected results and you are sure that there is no typo).
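
                    A minimal sketch of guarding against that error, assuming toy data in place of the real dataset and the name newgender from the earlier posts. The check uses -confirm new variable-, which fails if the name is already taken:
                    Code:
                    clear
                    input gender
                      1
                      2
                    end
                    * only recode if the target name is still free
                    capture confirm new variable newgender
                    if _rc {
                        display as error "newgender already exists; choose another name"
                    }
                    else {
                        recode gender (1=0 "male") (2=1 "female"), generate(newgender)
                        drop gender
                        rename newgender gender
                    }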


                    • #11
                      I think #9 is the wrong way round.

                      Code:
                      recode gender (1=0 "male") (2=1 "female"), generate(newgender)  
                      drop gender  
                      rename newgender gender


                      • #12
                        In the absence of any example data that unambiguously shows us exactly what was intended to be the input that the answer was to be applied to, my posts were intended to use as input the original coding of the variable that had "1 = male and 2 = female" for which "the goal is to code male = 0 and female = 1 with the label "male and female" attached to each value". In essence, I addressed the question "what should I have done instead of what I did" expecting the task would be redone. I'm not much inclined to "having headed down the wrong path, how do I get from where I'm at to where I want to be" since in this case the answer would have been the trivially obvious "define a new value label, or edit the one you have".
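
                        For illustration only, a sketch of that "define a new value label" route. It assumes toy data and hypothetical label names (sexlbl, gender01), since the actual data and label names were never posted:
                        Code:
                        clear
                        input gender
                          1
                          2
                          2
                        end
                        label define sexlbl 1 "male" 2 "female"
                        label values gender sexlbl

                        * reproduce the situation from #1: the values change, the label mapping does not
                        recode gender (1=0) (2=1)
                        tabulate gender    // 0 has no label; 1 is still labelled "male"

                        * repair: attach a value label that matches the new coding
                        label define gender01 0 "male" 1 "female"
                        label values gender gender01
                        tabulate gender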

                        To be honest, at this point I don't have any idea what is to be transformed into the desired result. My mistake in post #2 lay in giving an answer with insufficient information from post #1. At this point I'm bailing out, with the following advice for subsequent topics.

                        kyle coran - Please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It is particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

                        The more you help others understand your problem, the more likely others are to be able to help you solve your problem.


                        • #13
                          #12: Nick could answer himself, but as far as I can see he did not mean that the coding is the wrong way round, but the renaming.
