
  • Encode string to numeric variable

    Hi everyone,
    I have a string variable Gender in my data. I ran the encode command to convert Gender to a numeric variable:
    encode Gender, gen(sex)

    input str1 Gender
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "M"
    "F"
    "M"
    "M"
    "M"
    "M"
    "F"
    "F"

    As you can see from the dataex output below, sex is no longer a string, but when I browse my data it shows 0 for "M" and F for "F". I can't change F to 1: when I use the replace command, it gives me a mismatch error. Could you please help me find the problem? Why is F still shown as F after encode?
    * Example generated by -dataex-.
    clear
    input str1 Gender long sex
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "F" 1
    "M" 0
    "M" 0
    "M" 0
    "M" 0
    "F" 1
    "F" 1

  • #2
    For the second dataex command, can you repost the whole chunk of output rather than cutting it off at about 28 cases?

    Also, can you run "codebook sex" and post the output as well?

    My guess is that something else happened after "encode", because encode does not code categories starting from 0. It starts from 1 instead. So something else must have been done in between, and it would be great if you could show all the code you ran. Otherwise it is impossible for us to guess.
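
    As a minimal sketch of that default, assuming a toy three-row dataset rather than your real data: encode numbers the categories from 1 in alphabetical order of the strings, so "F" becomes 1 and "M" becomes 2.

    ```stata
    clear
    input str1 Gender
    "M"
    "F"
    "M"
    end

    * encode creates a value label named after the new variable (here: sex)
    encode Gender, generate(sex)

    label list sex          // mapping created by encode: 1 F, 2 M
    tabulate sex, nolabel   // underlying numeric codes, without labels
    ```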

    To directly get to what you wanted, try:
    Code:
    generate gender2 = 0 if Gender == "M"
    replace gender2 = 1 if Gender == "F"
    Lastly, gender and sex are two different concepts. It's undesirable to use these terms as if they were interchangeable. On top of being inaccurate, it may also be offensive to some audiences.
    Last edited by Ken Chui; 31 Jul 2023, 11:26.



    • #3
      Code:
      label define gender 0 "M" 1 "F"
      encode Gender , generate(gender2) label(gender)



      • #4
        Yes, sure.

        input str1 Gender
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "F"
        "M"
        "M"
        "M"
        "M"
        "F"
        "F"
        "M"
        "M"
        "M"
        "M"
        "M"
        "F"
        "M"
        "M"
        "M"
        "F"
        "M"
        "F"
        "M"
        "M"
        "F"
        "M"
        "M"
        "F"
        "M"
        "M"
        "M"
        "F"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "F"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "F"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        "M"
        end

        Listed 100 out of 268186 observations


        I ran encode Gender, gen(sex)
        codebook sex

        ----------------------------------------------------------------------
        sex                                                             Gender
        ----------------------------------------------------------------------
                          Type: Numeric (long)
                         Label: sex

                         Range: [1,2]                        Units: 1
                 Unique values: 2                        Missing .: 0/268,186

                    Tabulation: Freq.   Numeric  Label
                               30,136         1  F
                              238,050         2  M


        replace sex=0 if sex==2

        input str1 Gender long sex
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "F" 1
        "M" 0
        "M" 0
        "F" 1
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "F" 1
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "F" 1
        "M" 0
        "M" 0
        "M" 0
        "F" 1
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "F" 1
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "F" 1
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        "M" 0
        end
        label values sex sex
        label def sex 1 "F", modify

        Listed 100 out of 268186 observations

        codebook sex
        ----------------------------------------------------------------------
        sex                                                             Gender
        ----------------------------------------------------------------------
                          Type: Numeric (long)
                         Label: sex, but 1 nonmissing value is not labeled

                         Range: [0,1]                        Units: 1
                 Unique values: 2                        Missing .: 0/268,186

                    Tabulation: Freq.   Numeric  Label
                              238,050         0
                               30,136         1  F





        • #5
          Thanks.

          That "replace sex=0 if sex==2" broke the variable.

          The original coding scheme is to label 1 as "F" and 2 as "M". Your replace command replaced all "2" with "0". Because 0 does not have a corresponding label, it's showing up as "0". However, "1" is still labeled as "F", due to the labeling scheme being active.
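
          A minimal sketch of that situation on toy data (three rows, not your dataset): after the replace, the value label sex still maps 1 to "F" and 2 to "M", so 0 is displayed as a bare number until a label for 0 is added.

          ```stata
          clear
          input str1 Gender
          "M"
          "F"
          "M"
          end

          encode Gender, generate(sex)      // value label sex: 1 "F", 2 "M"
          replace sex = 0 if sex == 2       // 0 has no entry in label sex
          tabulate sex                      // shows 0 and F

          * add the missing entry so that 0 is displayed as M
          label define sex 0 "M", modify
          tabulate sex                      // shows M and F
          ```

          The leftover entry for 2 in the label does no harm once no observation holds the value 2.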

          For your case, I'd suggest doing away with "encode" because you just want to use the numeric 0 and 1. Adding another numeric label "0" and "1" over it would be useless. It is better to create the variable with generate and replace. Here are a few methods:

          Code:
          clear
          input str1 v1
          "F"
          "M"
          end
          
          encode v1, gen(v2)
          codebook v2
          
          * Method 1
          generate v3 = 1 if v1 == "F"
          replace v3 = 0 if v1 == "M"
          
          * Method 2: predefine the labeling scheme (Suggested in #3)
          label define gender 0 "M" 1 "F"
          encode v1 , generate(v4) label(gender)
          
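          * Method 2.1: If you don't need the label, copy the values without it.
          * generate does not carry the value label over; label values with no
          * label name detaches any label that is attached.
          gen v5 = v4
          label values v5
          
          * Method 3: Generate with condition
          generate v6 = (v1 == "F") if inlist(v1, "F", "M")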
          Last edited by Ken Chui; 31 Jul 2023, 13:25.



          • #6
            Thank you so much for your help.



            • #7
              Originally posted by Ken Chui
              For your case, I'd suggest doing away with "encode" because you just want to use the numeric 0 and 1. Adding another numeric label "0" and "1" over it would be useless.
              I agree that numeric value labels are not useful. In fact, they can sometimes cause problems. However, not having value labels at all is just as bad an idea. How long do you think you will remember that 0 means "M" (male, I suppose) and 1 means "F"? A week? A month? A year? It does not matter. I guarantee that one day you will come back to your unlabeled dataset and will not know what those numeric values mean. With binary variables, you might get away with choosing more telling names, e.g., female instead of gender (or sex, or even worse, v5). It is still much better to have value labels.
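
              As a sketch of that advice, assuming the string variable is called Gender as in #1 (the toy data and the variable label text are made up for illustration), an indicator can carry both a telling name and value labels:

              ```stata
              clear
              input str1 Gender
              "M"
              "F"
              "M"
              end

              * 1 if "F", 0 if "M", missing for anything else
              generate byte female = (Gender == "F") if inlist(Gender, "F", "M")

              * attach value labels so the codes stay self-explanatory
              label define female 0 "M" 1 "F"
              label values female female
              label variable female "Gender is F"

              tabulate Gender female
              ```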

