  • converting alphanumeric values into numbers

    Hi there
    I have an alphanumeric string variable where the first character is always a letter while the remaining characters are always numerals (e.g. C10, G198, Z109).
    I want to convert these values into purely numeric values, simply by converting the first character into a number so that A=1, B=2, C=3 etc.
    So, for example,
    C10 becomes 310
    G198 becomes 7198
    Z109 becomes 26109
    etc.
    Any tips on an efficient way to do this would be hugely appreciated!

    Thanks

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 have
    "C10"
    "G198"
    "Z109"
    end
    
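    * c(ALPHA) is "A B C ... Z" (space-separated), so letter number k sits at position 2k - 1
    gen int num_prefix = (strpos("`c(ALPHA)'", substr(have, 1, 1)) + 1)/2
    * prepend the letter's number to the remaining digits and convert to numeric;
    * long keeps larger codes exact (the default float is exact only up to 16,777,216)
    gen long want = real(string(num_prefix, "%2.0f") + substr(have, 2, .))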
    
    list
    Note: Requires that there be only a single letter at the beginning, with everything else numeric. And that letter has to be upper case.
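
    If lower-case letters or stray spaces are a possibility, one way to normalize and check the variable first might look like the sketch below. The variable name have_uc and the sample values are hypothetical, and the pattern assumes exactly one letter followed only by digits. upper() and trim() are used here; newer releases document them as strupper() and strtrim().

    ```stata
    clear
    input str5 have
    "c10"
    " G198"
    "Z109"
    end

    * have_uc is a hypothetical name: trimmed, upper-case copy of the code
    gen have_uc = upper(trim(have))

    * stop with an error if any value is not one letter followed only by digits
    assert regexm(have_uc, "^[A-Z][0-9]+$")

    * same letter-to-number step as above, applied to the cleaned copy
    gen int num_prefix = (strpos("`c(ALPHA)'", substr(have_uc, 1, 1)) + 1)/2
    list
    ```

    If the assert fails, listing the observations where regexm() returns 0 shows which values break the assumption.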

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Last edited by Clyde Schechter; 01 May 2018, 11:05.


    • #3
      The coding scheme proposed in post #1 places no constraint on the letters and digits, so it can produce ambiguous coded values. In the listing below, A123 and K23 both become 1123.
      Code:
           +-------------------------+
           | have   num_pr~x    want |
           |-------------------------|
        1. |  C10          3     310 |
        2. | G198          7    7198 |
        3. | Z109         26   26109 |
        4. | A123          1    1123 |
        5. |  K23         11    1123 |
           +-------------------------+
      I would code A as 11, B as 12, and so forth, so that the first two digits of the number always represent the letter.
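
      A minimal sketch of that two-digit scheme, adapting the code from #2 by adding 10 to the letter number (the variable name want2 is hypothetical, and the same single-upper-case-letter assumption applies):

      ```stata
      clear
      input str4 have
      "C10"
      "G198"
      "Z109"
      "A123"
      "K23"
      end

      * shift the letter number by 10 so every prefix has two digits (A=11, ..., Z=36)
      gen int num_prefix = (strpos("`c(ALPHA)'", substr(have, 1, 1)) + 1)/2 + 10

      * want2 is a hypothetical name for the recoded value
      gen long want2 = real(string(num_prefix) + substr(have, 2, .))

      * isid exits with an error unless want2 uniquely identifies the observations
      isid want2
      list
      ```

      Here A123 becomes 11123 and K23 becomes 2123, so the two codes no longer collide.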
      Last edited by William Lisowski; 01 May 2018, 13:48.


      • #4
        Many thanks for this!


        • #5
          I have to be curious. These look like identifiers. If so, why is an all numeric identifier an improvement? If not, what are they?


          • #6
            Hi Nick
            They're ICD codes - I need them to be numeric to be compatible with other pre-existing code.
            Many thanks again for everyone's input.


            • #7
              So, if these are ICD codes (ICD-10, it appears), I would be more inclined to modify the pre-existing code to take advantage of Stata's very useful -icd10- suite of commands, rather than mess with the codes themselves. As William points out in #3, what you asked for is not a one-to-one mapping and will collapse some distinct codes into the same value. Rather than deal with homebrew codes, it is probably better to update the program to deal with modern ICD coding.
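
              As a rough starting point, the -icd10- commands can be applied to the string variable directly. This is only a sketch: it assumes a Stata release that ships -icd10-, the variable names problem and cat3 are hypothetical, and the exact options should be confirmed with -help icd10- for your version.

              ```stata
              clear
              input str4 have
              "C10"
              "G198"
              "Z109"
              end

              * flag values that -icd10- does not accept; problem is a hypothetical name
              icd10 check have, generate(problem)

              * category code for each value; cat3 is a hypothetical name
              icd10 generate cat3 = have, category
              list
              ```

              Working from the string codes this way keeps each ICD code distinct, which the letter-to-number recoding does not guarantee.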
