  • Replacing binary variables with values No(1)/Yes(2) or No(2)/Yes(1) to equal No(0)/Yes(1)

    Hello,

    I used sencode (https://ideas.repec.org/c/boc/bocode/s417701.html) to encode and replace many string variables that take the values "Yes", "No", or "Unknown". First I changed "Unknown" to "", and then I used sencode to convert the Yes/No responses to numeric values with labels. I used sencode so that I could replace the variables in place instead of generating new variables and then dropping the old ones, which would be required with encode.

    However, unlike encode where "No" will always equal 1 and "Yes" will always equal 2 based on their alphabetical order, sencode assigns the values 1 and 2 in order of appearance in the data. As a result, in some of my variables No = 1 and Yes = 2, and in others No = 2 and Yes = 1. It would be a great feature if sencode had an "alphabetical" option to mimic the way encode works.

    I'd like to change all of these variables to have values/labels of No = 0 and Yes = 1. I planned to use the code below once I finished, but now it would mislabel the variables where 1 = Yes and 2 = No. Does anyone have a trick for applying a version of the code below, or something similar, that accounts for the different values for Yes and No across variables (either 1 and 2 or 2 and 1, respectively)?

    Code:
    findname, all(inlist(@, ., 1, 2)) local(valchange) 
    label define yesno 0 "No" 1 "Yes" 
    foreach v of local valchange { 
        replace `v' = 0 if `v' == 2
        label val `v' yesno
    }
    Thank you very much for your time and help.

    Tom
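
    For variables that have already been converted, one label-aware alternative is to decode each variable and rebuild it from the label text, so that the result does not depend on whether "Yes" was coded 1 or 2. What follows is only a sketch using official commands plus findname as in the code above; the label name noyes01 and the temporary variable are assumptions made for illustration. It replaces values in place, so it is safer to try it on a copy of the data first. Note that the loop above maps 2 to 0 and therefore gives correct labels only where Yes = 1 and No = 2.

    Code:
    * noyes01 is a hypothetical label name; pick one not already in use
    label define noyes01 0 "No" 1 "Yes"
    findname, all(inlist(@, ., 1, 2)) local(valchange)
    foreach v of local valchange {
        * skip variables that carry no value label
        local lbl : value label `v'
        if "`lbl'" == "" continue
        * recover the label text for each observation
        tempvar s
        decode `v', generate(`s')
        * only touch variables whose labels are exactly No/Yes
        quietly count if !inlist(`s', "No", "Yes", "")
        if r(N) == 0 {
            replace `v' = (`s' == "Yes") if !missing(`v')
            label values `v' noyes01
        }
        drop `s'
    }

    Because the new value comes from the decoded text rather than from the old code, variables with No = 1/Yes = 2 and variables with No = 2/Yes = 1 both end up as No = 0/Yes = 1, and missing values stay missing.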

  • #2
    -sencode- actually has the option you are asking for. Try the gsort() option; in the code below, stringvariable is the variable you want to manipulate:
    Code:
    sencode stringvariable, replace gsort(stringvariable)


    • #3
      Oh, ok, thanks Chen Samulsion. I tried it and it worked. I thought gsort sorted in ascending or descending order by frequency of observations, not by the order of the string values. Thanks so much for your help!


      • #4
        There is also an extension that lets you skip the code in #1 for changing the value labels.
        Code:
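        foreach v of varlist those_stringvariables {
         * gsort(`v') assigns codes in sorted order of the strings: "No" = 1, "Yes" = 2
         sencode `v', replace gsort(`v') label(yesno)
         * shift the codes from 1/2 to 0/1
         replace `v'=`v'-1
        }
        * realign the shared label with the shifted codes; 2 "" removes the old entry for 2
        label define yesno 0 "No" 1 "Yes" 2 "", modify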


        • #5
          Quoting #1:

              encode where "No" will always equal 1 and "Yes" will always equal 2 based on their alphabetical order,
          That is only true if you use the default. You have to go beyond the help to the manual entry for this to become clear. But let's be fair: encode nowhere claims to have inbuilt intelligence to let it discern what you really need. Faced with variables that are Yes or No, an experienced researcher will (should) advise that you're better off with a (0, 1) indicator variable. Faced with scales from Strongly disagree to Strongly agree, beginning researchers should see that encoding by alphabetical order is a terrible idea. That is why the label() option exists.
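
          As an illustration of the label() route for an ordered scale, here is a minimal sketch. The variable names opinion and opinion_n, the label name agree5, and the exact wording of the categories are made up for the example. The label is defined first so that encode uses its numbering instead of alphabetical order; noextend makes encode stop with an error if the string variable holds a value that is not in the label.

          Code:
          * hypothetical label: the numbering follows the scale, not the alphabet
          label define agree5 1 "Strongly disagree" 2 "Disagree" 3 "Neutral" 4 "Agree" 5 "Strongly agree"
          * opinion is an assumed string variable; opinion_n is the new numeric copy
          encode opinion, generate(opinion_n) label(agree5) noextend

          Without noextend, encode would add any unlisted string (a misspelling, say) to the label as a new category, which is easy to overlook.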

          StataCorp don't like writing negative documentation, and readers tend to like it even less: they would be the first to complain that the documentation is shouting at them or patronising them or treating them like an idiot. The help for encode already has a stern warning not to use it when destring is the solution. Adding a warning that the default of using alphabetical order can really mess you up would, I imagine, be unpopular too.

          An even more general point is about replacing your original variables. That is a dangerous thing to do unless you're really sure you want it. encode and decode won't let you replace. When I was involved in the early days of destring and tostring, we were intensely mindful of making it difficult for users to make a bad decision. sencode is a command written by a very smart user for smart users, but its replace option is to be used circumspectly.

          See also multencode from SSC.
          Last edited by Nick Cox; 10 Jan 2023, 03:23.


          • #6
            Quoting #5:

                encode and decode won't let you replace. When I was involved in the early days of destring and tostring, we were intensely mindful of making it difficult for users to make a bad decision.

            Now I finally understand why encode and decode have no replace option. Thank you for the explanation.


            • #7
              daniel klein's -encoder- and -encoderall- commands can also do what the O.P. requests. They are available from SSC, and using them also requires the -elabel- suite (also by Daniel Klein and also available at SSC). -encoder-, when the -label()- option is left unspecified, uses alphabetical order, just like -encode-, and it automatically replaces the original string variable. On top of that, it has a -setzero- option that causes the lowest value in the label to be 0, not 1. So it is ideal for no/yes variables.

              I normally endorse a cautious approach to replacing the original variables.* But the operation of encoding and replacing (which is what -encoder- and -encoderall- do) is simple, does not lose information, and is easy to reverse if one later has regrets. Moreover, the string versions of categorical variables are seldom useful for anything. So I have, in recent years, taken to using -encoder- and -encoderall- instead of -encode- nearly all the time.

              *I also have license to be a bit less cautious because my routine practice is to preserve a read-only copy of the original dataset(s) in every project. So even in the worst case scenario where an early error in the workflow is discovered late, everything can be fixed and redone from scratch.


              • #8
                Slight correction to Clyde's post: encoder and encoderall (both SSC) are not mine; they are written by David Tannenbaum. The commands do use my elabel (SSC) though.

                With the warnings about replacing variables in mind, a basic approach that does not rely on community-contributed commands could be

                Code:
                label define yesno 0 "No" 1 "Yes"
                
                tempvar tmp
                
                * "varlist" is a placeholder: list the names of your string variables here
                foreach var in varlist {
                    
                    encode `var' , generate(`tmp') label(yesno)
                    order `tmp' , after(`var')
                    drop `var'
                    rename `tmp' `var'
                
                }
                Last edited by daniel klein; 11 Jan 2023, 08:57.
