Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to convert alphanumeric values into numbers

    Hello,

    I have an alphanumeric string variable that have different characters for each observation. Eg: 678-xyz, 678-qrs, 677-abcd, 20, 100 . There are only these 3 types of suffixes. Some values have no suffix. Values without any suffix run from 1 to 620. Values with suffix run from 621 to 1200.

    I want to convert these values to a numeric format as follows:
    • 10678 (for values that have -xyz suffix) i.e. add a prefix of 1 or 10 in place of -xyz suffix -to make it a 5 digit number.
    • 20678 (for values with -qrs suffix) i.e. add a prefix of 2 or 20 in place of -qrs suffix -to make it a 5 digit number.
    • 30677 (for vallues with -abcd suffix) i.e. add a prefix of 3 or 30 in place of -qrs suffix -to make it a 5 digit number.
    • 40020, 40100 (for values without any suffix such as 20, 100 etc)- add a prefix of 4 or 40 or 400 or 4000 to make it a 5-digit number.
    Any tips on an efficient way to do this would be hugely appreciated!

    Thanks

  • #2
    The Statalist FAQ give good avice on posting, including giving data examples
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9 strvar
    "1"        
    "620"      
    "678-xyz"  
    "678-qrs"  
    "677-abcd"
    "1200-abcd"
    end
    one way can be to use nested cond():
    Code:
    gen numpart = real(substr(strvar, 1, -1 + strpos(strvar, "-")))
    
    gen new2 = ///
        cond(!mi(real(strvar)) , real(strvar) + 4*10^4, ///
        cond(strpos(strvar, "xyz")  ,  1*10^4 + numpart, ///
        cond(strpos(strvar, "qrs")  ,  2*10^4 + numpart, ///  
        cond(strpos(strvar, "abcd") ,  3*10^4 + numpart, . ))))
    
    drop numpart
    or without making a variable for the numeric part:
    Code:
    gen new1 = ///
        cond(!mi(real(strvar)) , real(strvar) + 4*10^4, ///
        cond(strpos(strvar, "xyz")  ,  1*10^4 + real(substr(strvar, 1, -1 + strpos(strvar, "-"))), ///
        cond(strpos(strvar, "qrs")  ,  2*10^4 + real(substr(strvar, 1, -1 + strpos(strvar, "-"))), ///  
        cond(strpos(strvar, "abcd") ,  3*10^4 + real(substr(strvar, 1, -1 + strpos(strvar, "-"))),. ))))
    Code:
    . list , sep(10)
    
         +---------------------------+
         |    strvar    new2    new1 |
         |---------------------------|
      1. |         1   40001   40001 |
      2. |       620   40620   40620 |
      3. |   678-xyz   10678   10678 |
      4. |   678-qrs   20678   20678 |
      5. |  677-abcd   30677   30677 |
      6. | 1200-abcd   31200   31200 |
         +---------------------------+
    Last edited by Bjarte Aagnes; 11 Apr 2020, 09:52.

    Comment


    • #3
      Thank you for replying Bjarte Aagnes . When I use this command:

      gen numpart = real(substr(strvar, 1, -1 + strpos(strvar, "-")))

      2 missing values are generated for the observations without the alphabetical values. I am not sure if I am unable to proceed to the second command (gen new2=///...) due to the missing values, but it says invalid syntax r(198): command cond is unrecognized r(199);. Same problem encountered when I typed the gen new1= /// command. Is it because I am using stata 14? Not sure what the problem is. I am pretty new to STATA so I am a bit lost.
      Last edited by Sanjay Sharma; 13 Apr 2020, 11:47.

      Comment


      • #4
        2 missing values are generated for the observations without the alphabetical values. Bjarte Aagnes Can this be because the missing variables did not have "-"?

        Comment


        • #5
          Copy the code to a do file and mark and run all lines, not line by line.
          Code:
          gen numpart = real(substr(strvar, 1, -1 + strpos(strvar, "-")))
          
          gen new2 = ///
              cond(!mi(real(strvar)) , real(strvar) + 4*10^4, ///
              cond(strpos(strvar, "xyz")  ,  1*10^4 + numpart, ///
              cond(strpos(strvar, "qrs")  ,  2*10^4 + numpart, ///  
              cond(strpos(strvar, "abcd") ,  3*10^4 + numpart, . ))))
          
          drop numpart
          The /// split long lines, here one generate command, across multiple lines in the do-file, see -help comments-

          Comment


          • #6
            Thank you for the codes. I did as you said. Ran all lines at once. but it is still showing invalid syntax.

            Click image for larger version

Name:	Screen Shot 2020-04-13 at 7.27.25 PM.png
Views:	1
Size:	448.4 KB
ID:	1546537

            Comment


            • #7
              Sorry for the confusion. I did copy the code and entered it in the do file, instead of typing it in the command box. And it worked! Thank you very Bjarte Aagnes very much.

              Comment


              • #8
                Hi! I have a similar question with Sanjay Sharma 's. I want to create numeric values from alphanumeric string observations shown below:

                0111A
                0112B2
                0112B2
                0123C9

                What code could I use?

                Thank you very much in advance.

                Comment


                • #9
                  #8 What numbers do you want to see -- and to use?

                  Code:
                  encode strvar, gen(numvar)
                  might be enough, but otherwise the answer depends on the goal.

                  Comment

                  Working...
                  X