  • Concatenating string variables in alphabetical order.

    I have a string variable that contains drug names. I've cleaned these names and split each drug into its own string variable:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str20(drug1 drug2)
    "CISPLATIN" "VINCRISTINE"
    "VINCRISTINE" "CISPLATIN"
    "VINCRISTINE" "AFATINIB"
    end
    Some of the rows are identical but the drugs are in different orders. I'm looking for a way of concatenating these separated strings such that the drugs on each row are ordered alphabetically.

    Any idea?
    Last edited by Craig Knott; 03 Oct 2018, 07:41.

  • #2
    concat() doesn't offer the hook you need, so you must sort the variables rowwise first. rowsort from the Stata Journal is a candidate.

    SJ-9-1 pr0046 . . . . . . . . . . . . . . . . . . . Speaking Stata: Rowwise
    (help rowsort, rowranks if installed) . . . . . . . . . . . N. J. Cox
    Q1/09 SJ 9(1):137--157
    shows how to exploit functions, egen functions, and Mata
    for working rowwise; rowsort and rowranks are introduced

    Code:
    . * Example generated by -dataex-. To install: ssc install dataex
    . clear
    
    . input str20(drug1 drug2)
    
                        drug1                 drug2
      1. "CISPLATIN" "VINCRISTINE"
      2. "VINCRISTINE" "CISPLATIN"
      3. "VINCRISTINE" "AFATINIB"
      4. end
    
    . 
    . rowsort drug?, gen(new1 new2) 
    
    . 
    . egen all = concat(new?), p(" ")  
    
    . 
    . list new? all
    
         +-------------------------------------------------+
         |      new1          new2                     all |
         |-------------------------------------------------|
      1. | CISPLATIN   VINCRISTINE   CISPLATIN VINCRISTINE |
      2. | CISPLATIN   VINCRISTINE   CISPLATIN VINCRISTINE |
      3. |  AFATINIB   VINCRISTINE    AFATINIB VINCRISTINE |
         +-------------------------------------------------+
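
    If installing rowsort is not an option, the same rowwise ordering can be sketched with built-in commands only: reshape to long, sort within each original row, build the string cumulatively, and reshape back. This is a minimal sketch rather than the method shown above; the names id, which and all are made up for the illustration, and it assumes the dataset has no variables called id or which already.

    ```stata
    clear
    input str20(drug1 drug2)
    "CISPLATIN" "VINCRISTINE"
    "VINCRISTINE" "CISPLATIN"
    "VINCRISTINE" "AFATINIB"
    end

    * one identifier per original row (hypothetical name)
    gen long id = _n

    * one observation per row-drug pair; which holds the suffix 1, 2, ...
    reshape long drug, i(id) j(which)

    * sort drugs alphabetically within each row, then build the string cumulatively
    bysort id (drug): gen all = drug if _n == 1
    by id: replace all = all[_n-1] + " " + drug if _n > 1

    * copy the complete string to every observation of the row
    by id: replace all = all[_N]

    * back to the original layout; all is constant within id, so it is carried along
    reshape wide drug, i(id) j(which)
    list drug? all
    ```

    Empty drug slots sort before any name, so with a ragged number of drugs per row the result may need its leading spaces trimmed.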



    • #3
      While I was creating the example below, I saw Nick's code in #2, which already gives the rowsort solution that I arrived at after contemplating how unwieldy the -cond()- approach would get with more than 2 drugs. The only thing my example adds is something that was out of scope in the OP: capitalization matters when sorting alphabetically, because strings are compared by their ASCII codes, starting from the first letter (so 'a' is larger than 'A'). This example might still be useful to some, though Nick's provides the right solution.


      Code:
      clear
      input str20(drug1 drug2)
      "CISPLATIN" "VINCRISTINE" 
      "VINCRISTINE" "CISPLATIN"
      "VINCRISTINE" "AFATINIB"
      "Test" "answer"
      "test" "Answer"
      "ci1" " ciz "
      end
      
      g x = cond(drug1>drug2, 1, 0, .) //simple example
       
      g combined = cond(drug1>drug2, drug2+" "+drug1, cond(drug2>drug1, drug1+" "+drug2, "", ""))
      
      g combined2 = cond(upper(drug1)>upper(drug2), drug2+" "+drug1, cond(upper(drug2)>upper(drug1), drug1+" "+drug2, "", ""))
      **but this is complicated with more than 2 drugs
      
      **again beware the upper vs. lowercase
      rowsort drug1 drug2, g(rowsorted1 rowsorted2)  //string variables need the Stata Journal version (pr0046), not the SSC one; see #6 and #7
      g combined3 = rowsorted1+" "+rowsorted2
      
      l comb*
      Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX
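
      The case and spacing caveats can be checked on two of the rows above. The following is a minimal sketch using only built-in functions; key1, key2, raw_swap and norm_swap are hypothetical names, and strtrim() is assumed to be available (older Stata versions call it trim()).

      ```stata
      clear
      input str20(drug1 drug2)
      "Test" "answer"
      "ci1" " ciz "
      end

      * normalized comparison keys: surrounding spaces removed, all uppercase
      gen key1 = upper(strtrim(drug1))
      gen key2 = upper(strtrim(drug2))

      * 1 if drug1 would be placed after drug2, 0 otherwise
      gen byte raw_swap  = drug1 > drug2    // compares raw character codes
      gen byte norm_swap = key1 > key2      // compares the normalized keys

      list drug1 drug2 raw_swap norm_swap
      ```

      In the first row the raw comparison leaves "Test" before "answer" because "T" has a lower code than "a", while the normalized keys put "answer" first. In the second row the leading space in " ciz " makes it sort first until it is trimmed.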



      • #4
        Worked a charm after getting rid of some pesky spaces. Much appreciated.



        • #5
          I apologize in advance if I've missed something in the documentation that would explain this, but when I run Nick Cox's code above, I get a "string variables not allowed in varlist" error. The rowsort help file likewise suggests that only numeric values be used, so I don't understand how the code above can work. Is there some other ado file I need to install first in order for rowsort to function as it does above?



          • #6
            Never mind. I tried installing the pr0046 package instead of the rowsort package, and now the code above works as expected. Sorry for the interruption.



            • #7
              As explained in #2, you need the version from the Stata Journal, not the one from SSC.
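
              For anyone hitting the same error as in #5, switching versions might look like the sketch below. It assumes the SSC copy was installed under the package name rowsort, as #6 suggests, and uses the pr0046 package from SJ 9(1) cited in #2.

              ```stata
              * show which rowsort.ado Stata currently finds
              which rowsort

              * remove the SSC package if present (capture ignores the error if it is not)
              capture ado uninstall rowsort

              * point Stata at the Stata Journal 9(1) software and install pr0046
              net sj 9-1 pr0046
              net install pr0046

              * confirm that the newly installed file is the one now found
              which rowsort
              ```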
