
  • Customizing value labels with encode or related commands

    Hello:
    I am trying to encode a string variable var into a numeric var_code. That is typically a simple command, if you let Stata choose which number goes with which value label. However, I have been trying to customize this and define which number represents which string, and have had no luck other than doing it manually, as below.

    Where var contains observations Truck, Car, Boat, Plane:

    Code:
    gen var_code=.
    replace var_code=1 if var == "Truck"
    replace var_code=2 if var == "Boat"
    replace var_code=3 if var == "Plane"
    replace var_code=4 if var == "Car"
    Then I manually add value labels to the new numbers so that they show the original strings in var.

    So I can do this and it's fine. However, if I have 250 distinct values in the original var, telling encode up front which number goes with which label would avoid first converting to numbers and then recreating each string as a value label. Manually entering 250 value labels is laborious.

    Thanks for assistance.

  • #2
    Install labmask from the Stata Journal, authored by Nick Cox.

    Code:
    findit labmask
    After installing:

    Code:
    help labmask



    • #3
      I am not sure what you are asking and what you want. But did you see that there is a -label- option with -encode-?

      Here is an example of how it can be used:

      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . keep in 1/3
      (71 observations deleted)
      
      . keep make
      
      . label define mylabel 10 "AMC Concord" 20 "AMC Pacer" 30 "AMC Spirit"
      
      . encode make, gen(nummake) label(mylabel)
      
      . list
      
           +---------------------------+
           | make              nummake |
           |---------------------------|
        1. | AMC Concord   AMC Concord |
        2. | AMC Pacer       AMC Pacer |
        3. | AMC Spirit     AMC Spirit |
           +---------------------------+
      
      . list, nol
      
           +-----------------------+
           | make          nummake |
           |-----------------------|
        1. | AMC Concord        10 |
        2. | AMC Pacer          20 |
        3. | AMC Spirit         30 |
           +-----------------------+
      
      .
      If I did not use the label option, -encode- would do the following:

      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . keep in 1/3
      (71 observations deleted)
      
      . keep make
      
      . encode make, gen(nummake)
      
      . list
      
           +---------------------------+
           | make              nummake |
           |---------------------------|
        1. | AMC Concord   AMC Concord |
        2. | AMC Pacer       AMC Pacer |
        3. | AMC Spirit     AMC Spirit |
           +---------------------------+
      
      . list, nol
      
           +-----------------------+
           | make          nummake |
           |-----------------------|
        1. | AMC Concord         1 |
        2. | AMC Pacer           2 |
        3. | AMC Spirit          3 |
           +-----------------------+



      • #4
        Users (other than Joro Kolev) often overlook that encode lets you define value labels in advance that are then applied. The code in #1 then becomes:

        Code:
        label define mylabel ///
            1 "Truck"        ///
            2 "Boat"         ///
            3 "Plane"        ///
            4 "Car"
            
        encode str_var , generate(var_code) label(mylabel)
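
        With many distinct strings, typing the label define by hand gets tedious. A minimal sketch, assuming you can list the strings in the order you want them numbered: build the value label in a loop and then hand it to encode. The local macro names wanted, i and s are made up for illustration, and the four strings are placeholders for your own list.

        ```stata
        * Sketch only: start from a clean label so that -add- cannot clash
        capture label drop mylabel

        * Strings in the order they should be numbered (placeholder list)
        local wanted `""Truck" "Boat" "Plane" "Car""'

        local i = 0
        foreach s of local wanted {
            local ++i
            * add one number-to-string mapping per pass
            label define mylabel `i' "`s'", add
        }

        encode str_var , generate(var_code) label(mylabel)
        ```

        The numbering follows the order of the list, so reordering the list is all it takes to change the codes.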

        labmask (SJ, 8-2) is great; it will only help here if you already have the numeric codes aligned with the string values or the string values happen to be sorted in the order in which you want them. In the latter situation, you need

        Code:
        generate  `c(obs_t)' code_var = _n
        labmask code_var , values(str_var)
        Alternatives in this situation include sencode and elabel (both SSC).
        Last edited by daniel klein; 06 Feb 2021, 04:47. Reason: crossed with #3



        • #5
          Here's one way to do it.

          You can always reduce var to a separate data set, e.g. by

          Code:
          contract var
          and then either ignore _freq or

          Code:
          drop _freq

          Now manually enter 1 2 3 4 ... according to taste as a corresponding new numeric variable. Now merge back with the original dataset and then use labmask (Stata Journal) to associate the two.


          Code:
          . webuse census, clear
          (1980 Census data by state)
          
          . 
          . * I need a string variable to show the principle 
          . decode region, gen(str_region)
          
          . 
          . save mycensus, replace 
          file mycensus.dta saved
          
          . 
          . contract str_region
          
          . drop _freq
          
          . list 
          
               +----------+
               | str_re~n |
               |----------|
            1. |  N Cntrl |
            2. |       NE |
            3. |    South |
            4. |     West |
               +----------+
          
          . 
          . gen wanted = 1 in 4
          (3 missing values generated)
          
          . replace wanted = 2 in 3
          (1 real change made)
          
          . replace wanted = 3 in 2
          (1 real change made)
          
          . replace wanted = 4 in 1
          (1 real change made)
          
          . 
          . list 
          
               +-------------------+
               | str_re~n   wanted |
               |-------------------|
            1. |  N Cntrl        4 |
            2. |       NE        3 |
            3. |    South        2 |
            4. |     West        1 |
               +-------------------+
          
          . merge 1:m str_region using mycensus
          (label cenreg already defined)
          
              Result                           # of obs.
              -----------------------------------------
              not matched                             0
              matched                                50  (_merge==3)
              -----------------------------------------
          
          . labmask wanted, values(str_region)
          
          . 
          . tab wanted 
          
               wanted |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                 West |         13       26.00       26.00
                South |         16       32.00       58.00
                   NE |          9       18.00       76.00
              N Cntrl |         12       24.00      100.00
          ------------+-----------------------------------
                Total |         50      100.00
          Although what you see above is code for entering the new variable, you can do it in the Editor instead and Stata will echo the corresponding commands.



          • #6
            #5 was long in the writing, given some distractions. Hence I didn't see #2 #3 #4 until afterwards. But what we tell you three times is true https://en.wikiquote.org/wiki/The_Hunting_of_the_Snark



            • #7
              Thank you all!!! I had not realized that I could create value labels ahead of time and tell encode to use them! This and the other techniques described worked here.
              In my sample set, I had 5 out of 25 strings that I wanted represented by specific numbers (1-5) in the new code variable. I was able to create those value labels ahead of time; encode then assigned them as specified and numbered the remaining strings from 6 through 25 in alphabetical order, as is its default.
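
              For anyone who wants to try this partial-label behaviour on a small scale, here is a minimal sketch using the four strings from #1 as toy data. Only Truck and Boat are labelled in advance; the names var, var_code and mylabel follow the earlier posts, and the data are made up for illustration.

              ```stata
              * Toy data: the four strings from #1
              clear
              input str5 var
              "Truck"
              "Car"
              "Boat"
              "Plane"
              end

              * Fix the codes for two of the strings only
              label define mylabel 1 "Truck" 2 "Boat"

              * encode uses mylabel and adds mappings for strings not yet in it
              encode var, generate(var_code) label(mylabel)

              * Inspect which numbers the remaining strings received
              label list mylabel
              list, nolabel
              ```

              The label list output shows the mappings that encode added for Car and Plane alongside the two defined in advance.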

