  • Changing the order of values using encode

    Like a number of others, I'm trying to change the order of the values when converting a string variable to a numeric variable using encode. I've read the manual entries for encode and sencode, and a number of the tutorials and Q/A pairs on Statalist (e.g., this one) about how to do this using encode varname, label(labelname). I seem to have a fundamental misunderstanding of how encode with the label() option works.

    Encode without the label option converts the string in alphabetic order into integers as expected.

    Code:
    . encode(primary_utility), generate(purpose_1)
    . tab purpose_1
    
                  purpose_1 |      Freq.     Percent        Cum.
    ------------------------+-----------------------------------
    build_new_relationships |        502       11.24       11.24
               coordination |        353        7.90       19.14
          emotional_support |        234        5.24       24.37
                  entertain |        659       14.75       39.12
      existing_relationship |        493       11.03       50.16
        information_support |      1,240       27.75       77.91
      others_please_specify |        155        3.47       81.38
                  promotion |        337        7.54       88.92
                transaction |        495       11.08      100.00
    ------------------------+-----------------------------------
                      Total |      4,468      100.00
    
    . tab purpose_1, nolabel
    
      purpose_1 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |        502       11.24       11.24
              2 |        353        7.90       19.14
              3 |        234        5.24       24.37
              4 |        659       14.75       39.12
              5 |        493       11.03       50.16
              6 |      1,240       27.75       77.91
              7 |        155        3.47       81.38
              8 |        337        7.54       88.92
              9 |        495       11.08      100.00
    ------------+-----------------------------------
          Total |      4,468      100.00
    However, when I use encode with the label() option, I don't understand the resulting order. I enter the value labels in alphabetical order, à la the raw encode command, and preface each value label with a number to indicate the position in which it should appear. But the resulting order is not what I expected. In the example below, I don't understand why the integer 1 wasn't used, or why the integers 10 - 12 were created with no text labels attached to them even though there are only 9 values. I'd appreciate any help in understanding how encode with the label() option works and what I'm doing wrong.


    Code:
    label define lbl_purpose 3 "new_relationships"  4 "coordination" 2 "emotional_support" ///
    8 "entertain" 1 "existing_relationship" 5 "information_support" 9 "other"  6  "promotion" ///
    7  "transaction"
    
    encode  primary_utility, gen(purpose_2)  label(lbl_purpose)
    
    . tab purpose_2
    
                purpose_2 |      Freq.     Percent        Cum.
    ----------------------+-----------------------------------
        emotional_support |        234        5.24        5.24
             coordination |        493       11.03       16.27
      information_support |        659       14.75       31.02
                promotion |      1,240       27.75       58.77
              transaction |        337        7.54       66.32
                entertain |        495       11.08       77.39
                       10 |        502       11.24       88.63
                       11 |        353        7.90       96.53
                       12 |        155        3.47      100.00
    ----------------------+-----------------------------------
                    Total |      4,468      100.00
    . tab purpose_2, nolabel
    
      purpose_2 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              2 |        234        5.24        5.24
              4 |        493       11.03       16.27
              5 |        659       14.75       31.02
              6 |      1,240       27.75       58.77
              7 |        337        7.54       66.32
              8 |        495       11.08       77.39
             10 |        502       11.24       88.63
             11 |        353        7.90       96.53
             12 |        155        3.47      100.00
    ------------+-----------------------------------
          Total |      4,468      100.00
    This is the order I wanted:

    Code:
    1 "existing_relationship"
    2 "emotional_support"
    3 "new_relationships"
    4 "coordination"
    5 "information_support"
    6  "promotion"
    7  "transaction"
    8 "entertain"
    9 "other"
    Last edited by robertekraut; 07 Apr 2022, 16:13.

  • #2
    Well, that may be the order you want, but according to the tabulation you show in #1 there are no observations with "new_relationships" or "other". The data has "build_new_relationships," but you cannot expect Stata to treat that as the same as "new_relationships." Similarly, if you think that Stata will see "other" and aggregate all heretofore unmentioned values of purpose_1 into a single category and label it "other," that is not going to happen. You will have to do that yourself first.
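A minimal sketch of that point, using made-up toy data rather than the poster's dataset. The label names lbl_demo and lbl_fixed and the variables purpose_ext and purpose_ok are hypothetical. It first shows encode appending new codes for strings that have no exact match in the label, then one possible way to harmonize the strings beforehand.

```stata
* Toy data (hypothetical), standing in for the real primary_utility variable
clear
input str25 primary_utility
"build_new_relationships"
"coordination"
"others_please_specify"
"coordination"
end

* Label text that does not exactly match two of the strings
label define lbl_demo 3 "new_relationships" 4 "coordination" 9 "other"

* Unmatched strings get new codes appended to lbl_demo
encode primary_utility, generate(purpose_ext) label(lbl_demo)
tabulate purpose_ext, nolabel

* Harmonize the strings with the intended label text, then encode again
replace primary_utility = "new_relationships" if primary_utility == "build_new_relationships"
replace primary_utility = "other" if primary_utility == "others_please_specify"
label define lbl_fixed 3 "new_relationships" 4 "coordination" 9 "other"
encode primary_utility, generate(purpose_ok) label(lbl_fixed)
tabulate purpose_ok
tabulate purpose_ok, nolabel
```

Whether to rename the strings (as here) or rewrite the label to match the strings is a judgment call; either way the two must agree character for character.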

    So that explains where two things went wrong. But you have a third problem because Stata added three distinct values, 10, 11, and 12, to the numeric variable, and we have only accounted for two so far. I notice that in your tabulation of purpose_2, "existing_relationship" does not appear, even though it is associated with 1 in your label and we see it in the output of tab purpose_1. My best guess here is that in purpose_1, the value "existing_relationship" either has some blank padding (e.g. maybe Stata has it as " existing_relationship" or "existing_relationship "), or has some non-printing character embedded somewhere in the visible existing_relationship displayed. You would not be able to see that with your eyes. Had you provided example data using the -dataex- command, it would be possible to pursue this aspect of things further. Suffice it to say, what looks like "existing_relationship" in purpose_1 is, somehow, not actually that, and you need to chase that down. (Less likely, it may be that in creating the label you introduced a non-printing character into "existing_relationship." If you are using some exotic editor for your code, or if you copied it from some other file, that could happen. But if you were just typing it into Stata's do-file editor, it's hard to see how that could be the case.)
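One possible way to chase that down is sketched below on toy data (hypothetical; the variable names has_odd_char and len_bytes are made up for illustration). It assumes the intended values contain only lowercase letters and underscores, so any other character, including a blank or a tab, is flagged.

```stata
* Toy data (hypothetical): observation 2 carries an invisible leading blank
clear
set obs 3
generate str25 primary_utility = "existing_relationship"
replace primary_utility = " existing_relationship" in 2
replace primary_utility = "coordination" in 3

* Flag any value containing a character other than a-z or underscore
generate byte has_odd_char = regexm(primary_utility, "[^a-z_]")

* The stored length also gives the padding away: 22 bytes instead of 21
generate int len_bytes = strlen(primary_utility)

* Inspect the suspicious observations
list primary_utility has_odd_char len_bytes if has_odd_char
```

If the real data legitimately contain other characters (capitals, digits, spaces), the character class in regexm() would need to be widened accordingly.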

    In the future, when asking for help with code, show example data. And when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
    Last edited by Clyde Schechter; 07 Apr 2022, 17:51.



    • #3
      Probably some of the strings are different from the labels that you specify. Spaces and capitalization are usually the main culprits. You can start addressing these using

      Code:
      replace primary_utility= lower(trim(primary_utility))
      Otherwise, if everything is OK, a variable with the specified order should be created.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str25 primary_utility
      "coordination"        
      "other"                
      "transaction"          
      "transaction"          
      "coordination"        
      "new_relationships"    
      "existing_relationship"
      "emotional_support"    
      "entertain"            
      "entertain"            
      "existing_relationship"
      "promotion"            
      "new_relationships"    
      "coordination"        
      "transaction"          
      end
      
      label define lbl_purpose 1 "existing_relationship" 2 "emotional_support" 3 "new_relationships"  ///
      4 "coordination" 5 "information_support" 6 "promotion"  7 "transaction" 8 "entertain" 9 "other", modify
      
      encode primary_utility, gen(purpose_2)  label(lbl_purpose)
      tab purpose_2
      tab purpose_2, nolab
      Results:

      Code:
      . tab purpose_2
      
                  purpose_2 |      Freq.     Percent        Cum.
      ----------------------+-----------------------------------
      existing_relationship |          2       13.33       13.33
          emotional_support |          1        6.67       20.00
          new_relationships |          2       13.33       33.33
               coordination |          3       20.00       53.33
                  promotion |          1        6.67       60.00
                transaction |          3       20.00       80.00
                  entertain |          2       13.33       93.33
                      other |          1        6.67      100.00
      ----------------------+-----------------------------------
                      Total |         15      100.00
      
      . tab purpose_2, nolab
      
        purpose_2 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                1 |          2       13.33       13.33
                2 |          1        6.67       20.00
                3 |          2       13.33       33.33
                4 |          3       20.00       53.33
                6 |          1        6.67       60.00
                7 |          3       20.00       80.00
                8 |          2       13.33       93.33
                9 |          1        6.67      100.00
      ------------+-----------------------------------
            Total |         15      100.00
      N.b. Crossed with #2.



      • #4
        Thank you both for your replies. They helped me understand that using label with encode maps the output order based on a match to the original text in the string variable rather than the alphabetic serial position generated from encode. Changing the label definition to exactly match the original text solved my problem. Sorry to be so slow in understanding.

        Code:
        label define lbl_purpose 3 "build_new_relationships"  4 "coordination" 2 "emotional_support" ///
        8 "entertain" 1 "existing_relationship" 5 "information_support" ///
        9 "others_please_specify"  6  "promotion" 7  "transaction"
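A possible safeguard against this kind of mismatch, not used in the thread above, is encode's noextend option, which makes encode stop with an error instead of silently adding new codes when a string has no match in the label. The sketch below uses toy data, and the names lbl_strict, purpose_strict and rc_strict are hypothetical.

```stata
* Toy data (hypothetical): the first string has no exact match in the label
clear
input str25 primary_utility
"build_new_relationships"
"coordination"
end

label define lbl_strict 3 "new_relationships" 4 "coordination"

* With noextend, encode refuses to run rather than extending lbl_strict
capture noisily encode primary_utility, generate(purpose_strict) label(lbl_strict) noextend
scalar rc_strict = _rc
display "encode return code: " rc_strict
```

A nonzero return code here signals that the label and the strings disagree, which is easier to notice than unexpected codes showing up later in a tabulation.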
