  • encode a string variable according to substr

    Hi, I have a string variable, vq9_breakupdivorce, which has only 2 string values.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str23 vq9_breakupdivorce
    "NO TO: Break up/divorce"
    "NO TO: Break up/divorce"
    "Break up/divorce"       
    "Break up/divorce"       
    "NO TO: Break up/divorce"
    end
    I want to encode it as 0 and 1, where 0 denotes "NO TO:" and 1 denotes otherwise. How can I use substr() to do this? I have multiple such variables, and all of them are in this form (NO TO: = 0).

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str23 vq9_breakupdivorce
    "NO TO: Break up/divorce"
    "NO TO: Break up/divorce"
    "Break up/divorce"      
    "Break up/divorce"      
    "NO TO: Break up/divorce"
    end
    
    label define    no_to   0   "NO TO:"    1   "Otherwise"
    
    foreach v of varlist vq9_breakupdivorce {
        gen byte n`v':no_to = substr(`v', 1, 6) == "NO TO:"
    }
    And if you have several such variables, just add them to the variable list in the -foreach- command.

    • #3
      After running your code, I find that 1 (labeled "Otherwise") corresponds to "NO TO: Break up/divorce" in the original variable. I want those who report no breakup to be 0 and those who report a breakup to be 1.

      Also, I'm interested in your code here since you don't use encode. Could you give me a tutorial on "gen byte nvar"? When I google it, I can only find gen _n and gen _N.

      • #4
        Oh, sorry, I got it backwards. Change the -gen- command to:
        Code:
        gen byte n`v':no_to = substr(`v', 1, 6) != "NO TO:"
        As for an explanation of the command, let's break it into pieces. I assume you are familiar with -gen-. If not, you really need to stop everything and go read the [GS] Getting Started and [U] User's Guide portions of the PDF manuals that are part of your Stata installation. That will give you an overview of the most commonly used commands, without which nothing productive can be accomplished in Stata.

        The specification of byte as a storage type is optional. It's a habit of mine that I acquired decades back when memory was scarce and expensive. I was vigilant about always assigning the smallest possible amount of storage to every variable. In the modern world, this isn't usually necessary unless you are working with very large datasets that push the memory limits of your computer. All that -byte- does here is tell Stata to use only a single byte per observation to store this variable. As I say, it's an economy that's probably unnecessary and you can omit it if you like.
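A minimal sketch of what -byte- changes, reusing the example data from the question. The names flag_byte and flag_default are hypothetical. -describe- reports each variable's storage type; a plain -gen- uses Stata's default type (float, unless the default has been changed with -set type-). -compress- can shrink such a variable after the fact.

```stata
* Example data from the question (two observations are enough here)
clear
input str23 vq9_breakupdivorce
"NO TO: Break up/divorce"
"Break up/divorce"
end

* Same 0/1 expression stored two ways (hypothetical variable names)
gen byte flag_byte = substr(vq9_breakupdivorce, 1, 6) != "NO TO:"
gen flag_default   = substr(vq9_breakupdivorce, 1, 6) != "NO TO:"

* -describe- shows byte vs. the default storage type
describe flag_byte flag_default

* -compress- demotes flag_default to the smallest type that holds its values
compress flag_default
describe flag_default
```

Either way the stored values are identical; only the memory used per observation differs.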

        Evidently, nvar is the name of the variable to be created (in the loop it is n`v', that is, the original variable's name with an n prefixed, such as nvq9_breakupdivorce). The :no_to part tells Stata, once the variable has been created, to apply the no_to value label to it. The = is self-explanatory.
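The :no_to shorthand does the same thing as generating the variable and then attaching the label with -label values-. A minimal sketch of the two-step form, using the corrected != expression from this post and the example data from the question:

```stata
clear
input str23 vq9_breakupdivorce
"NO TO: Break up/divorce"
"Break up/divorce"
end

label define no_to 0 "NO TO:" 1 "Otherwise"

* Step 1: create the 0/1 variable
gen byte nvq9_breakupdivorce = substr(vq9_breakupdivorce, 1, 6) != "NO TO:"

* Step 2: attach the value label (what :no_to does in the same line as -gen-)
label values nvq9_breakupdivorce no_to

* Tabulate with the labels, then with the underlying numeric codes
tabulate nvq9_breakupdivorce
tabulate nvq9_breakupdivorce, nolabel
```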

        The expression to the right of the =, -substr(var, 1, 6) != "NO TO:"- is a logical expression. The != operator in the middle asks Stata to compare substr(var, 1, 6) with "NO TO:". If they are unequal, the result is 1, and if they are equal the result is 0.
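To see the two pieces of the expression separately, one can store the first six characters in a helper variable (first6 and unequal are hypothetical names) and list them next to the original. A sketch with the example data:

```stata
clear
input str23 vq9_breakupdivorce
"NO TO: Break up/divorce"
"Break up/divorce"
end

* substr(s, 1, 6) returns the first 6 characters of s
gen str6 first6 = substr(vq9_breakupdivorce, 1, 6)

* A logical expression evaluates to 1 (true) or 0 (false)
gen byte unequal = first6 != "NO TO:"

list vq9_breakupdivorce first6 unequal
```

For "Break up/divorce" the first six characters are "Break " (with a trailing space), which differs from "NO TO:", so the comparison yields 1.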

        So, putting it all together, -gen byte nvar:no_to = substr(var, 1, 6) != "NO TO:"- tells Stata to compare the first 6 characters of var to "NO TO:", create a new variable, nvar, whose value is 1 if they are unequal and 0 if they are equal, and then, finally, apply the no_to value label to nvar.
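Assembled into the loop from #2 with the corrected comparison, a minimal sketch for several variables could look like the following. The second variable, vq10_jobloss, and its values are hypothetical, standing in for the other "NO TO:" variables mentioned in the question.

```stata
* Example data; vq10_jobloss is a HYPOTHETICAL second variable in the same form
clear
input str23 vq9_breakupdivorce str15 vq10_jobloss
"NO TO: Break up/divorce" "NO TO: Job loss"
"NO TO: Break up/divorce" "Job loss"
"Break up/divorce"        "NO TO: Job loss"
"Break up/divorce"        "Job loss"
"NO TO: Break up/divorce" "Job loss"
end

* Define the label once; -capture- lets the do-file be re-run without error
capture label drop no_to
label define no_to 0 "NO TO:" 1 "Otherwise"

* One new 0/1 variable per original, named with an n prefix, labeled with no_to
foreach v of varlist vq9_breakupdivorce vq10_jobloss {
    gen byte n`v':no_to = substr(`v', 1, 6) != "NO TO:"
}

list, nolabel
```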

        The reason I did not use -encode- for this task is that, by default, -encode- assigns the numeric values 1, 2, and so on to the distinct strings in alphabetical order. To force -encode- to start with 0, you have to first define the value label accordingly and tell -encode- to use that value label. Well, at that point, since this is just a dichotomous variable, all that remains is to create the 0/1 variable based on the logical condition, so using -encode- doesn't really save any time or effort. (If we were creating a many-valued variable, there would be some advantage to using -encode-.)
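For comparison, a minimal sketch of the -encode- route described above; enc_default, enc_01, and breakup01 are hypothetical names. The predefined label has to spell out each full string value, so every variable with different wording would need its own label, whereas the substr() comparison works unchanged across variables.

```stata
clear
input str23 vq9_breakupdivorce
"NO TO: Break up/divorce"
"Break up/divorce"
"NO TO: Break up/divorce"
end

* Default -encode-: codes start at 1 in alphabetical order of the strings,
* so "Break up/divorce" -> 1 and "NO TO: Break up/divorce" -> 2
encode vq9_breakupdivorce, generate(enc_default)

* For 0/1 codes, define the label first with each full string, then pass it in;
* -noextend- refuses to encode if a string is missing from the label
label define breakup01 0 "NO TO: Break up/divorce" 1 "Break up/divorce"
encode vq9_breakupdivorce, generate(enc_01) label(breakup01) noextend

list, nolabel
```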
