  • Recommendation for code to pull parts of string out of a long string field

    Hi Stata folks,

    I have a question, and am not even sure how to search on it (Google and the search function here haven't helped, most likely because I don't know how to articulate my question concisely!).

    I have a long text field of author names from manuscripts, a variable called 'author'. My data set has about 600 records (each record is one manuscript). The names within 'author' are separated by semicolons ';' and I'd like to create X new variables, one holding the first author, one the second author, one the third author, etc., where X is the number of authors in the longest list (for example, generate variables named author_number_`x'). The longest list in my data set is one paper with 73 authors, a large scientific network publication.

    Then, from these new variables, I would like to create another variable, "last_author", that takes the name from the last non-missing author_number_`x' in each record.

    Perhaps my coffee was inadequate today, but I'm stuck. Any suggestions? Much thanks in advance.

  • #2
    Code:
    help split
    Provide a dataex example for code suggestions.
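
For readers who have not used split before, here is a minimal sketch of how it could be applied to a semicolon-separated field like the one described in #1. The two records are made-up toy data, and the stub author_number_ is taken from the naming in the question; neither comes from a real run on the poster's data set.

```stata
clear
input str60 author
"A. Abaasa; G. Asiki; J. Todd"
"M. M. Addo; B. D. Walker"
end

* split on the semicolon; the new variables are named
* author_number_1, author_number_2, ... up to the longest list
split author, parse(";") generate(author_number_)

* every piece after the first keeps the space that followed the semicolon
foreach v of varlist author_number_* {
    replace `v' = strtrim(`v')
}

list author_number_*
```

Records with fewer authors than the longest list are left with empty strings in the trailing variables.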


    • #3
      Fantastic! That was the command I was looking for:

      . split author, p(

      Worked great.

      Any idea how to find the last author? The author number varies a fair bit.

      Here's a sample of my data (apologies, I'm not sure the best way to post data as an example, I hope this is helpful?)
      author
      A. Abaasa; G. Asiki; J. Mpendo; J. Levin; J. Seeley; L. Nielsen; A. Ssetaala; A. Nanvubya; J. De Bont; P. Kaleebu; A. Kamali
      A. Abaasa; G. Asiki; A. Obuku Ekii; J. Wanyenze; P. Pala; J. v. D. G; P. Corstjens; P. Hughes; S. Ding; G. Pantaleo; P. Kaleebu; A. M. Elliott; A. Kamali
      A. Abaasa; G. Asiki; M. A. Price; E. Ruzagira; F. Kibengo; U. Bahemuka; P. E. Fast; A. Kamali
      A. Abaasa; C. Hendrix; M. Gandhi; P. Anderson; A. Kamali; F. Kibengo; E. J. Sanders; G. Mutua; N. N. Bumpus; F. Priddy; J. E. Haberer
      A. Abaasa; Y. Mayanja; G. Asiki; M. A. Price; P. E. Fast; E. Ruzagira; P. Kaleebu; J. Todd
      A. Abaasa; S. Nash; Y. Mayanja; M. Price; P. E. Fast; A. Kamali; P. Kaleebu; J. Todd
      A. Abaasa; S. Nash; Y. Mayanja; M. A. Price; P. E. Fast; P. Kaleebu; J. Todd
      A. Abaasa; J. Todd; Y. Mayanja; M. Price; P. E. Fast; P. Kaleebu; S. Nash
      A. Abaasa; J. Todd; S. Nash; Y. Mayanja; P. Kaleebu; P. E. Fast; M. Price
      A. M. Abaasa; G. Asiki; J. Levin; U. Bahemuka; E. Ruzagira; F. M. Kibengo; J. Mulondo; J. Ndibazza; M. A. Price; P. Fast; A. Kamali
      R. Abujaber; P. R. Shea; P. J. McLaren; S. Lakhi; J. Gilmour; S. Allen; J. Fellay; E. J. Hollox; S. H. I. V. C. S. Iavi Africa Hiv Prevention Partnership
      M. M. Addo; M. Altfeld; D. M. Brainard; A. Rathod; A. Piechocka-Trocha; U. Fideli; J. Mulenga; E. Shutes; D. M. Alvino; E. Hunter; S. A. Allen; B. D. Walker


      • #4
        Ha. I like how the forum transformed my code into an emoji, but hopefully it's clear from context.


        • #5
          Code:
          clear
          input strL author
          "A. Abaasa; G. Asiki; J. Mpendo; J. Levin; J. Seeley; L. Nielsen; A. Ssetaala; A. Nanvubya; J. De Bont; P. Kaleebu; A. Kamali"
          "A. Abaasa; C. Hendrix; M. Gandhi; P. Anderson; A. Kamali; F. Kibengo; E. J. Sanders; G. Mutua; N. N. Bumpus; F. Priddy; J. E. Haberer"
          "A. Abaasa; Y. Mayanja; G. Asiki; M. A. Price; P. E. Fast; E. Ruzagira; P. Kaleebu; J. Todd"
          end
          
          * the greedy .* runs up to the last semicolon; $1 keeps whatever follows it
          gen wanted = ustrregexra(author, ".*;\s+(.*)$", "$1")
          Result:

          Code:
          . l
          
               +-------------------------------------------------------------------------------------------------------------------------------------------------------+
               |                                                                                                                                author          wanted |
               |-------------------------------------------------------------------------------------------------------------------------------------------------------|
            1. |          A. Abaasa; G. Asiki; J. Mpendo; J. Levin; J. Seeley; L. Nielsen; A. Ssetaala; A. Nanvubya; J. De Bont; P. Kaleebu; A. Kamali       A. Kamali |
            2. | A. Abaasa; C. Hendrix; M. Gandhi; P. Anderson; A. Kamali; F. Kibengo; E. J. Sanders; G. Mutua; N. N. Bumpus; F. Priddy; J. E. Haberer   J. E. Haberer |
            3. |                                            A. Abaasa; Y. Mayanja; G. Asiki; M. A. Price; P. E. Fast; E. Ruzagira; P. Kaleebu; J. Todd         J. Todd |
               +-------------------------------------------------------------------------------------------------------------------------------------------------------+
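
As a hedged alternative to the regular expression above, the last author could also be picked out without regex. The sketch below shows two options on shortened toy records (the third, single-author record is made up to show that edge case): taking everything after the last semicolon with strrpos(), and looping over the variables that split creates, as described in #1. The names last_author, last_author2 and the stub author_number_ are illustrative choices.

```stata
clear
input str200 author
"A. Abaasa; G. Asiki; J. Mpendo; A. Kamali"
"A. Abaasa; Y. Mayanja; J. Todd"
"B. D. Walker"
end

* Option 1: keep everything after the last semicolon.
* strrpos() returns 0 when there is no semicolon, so a
* single-author record comes back whole.
gen last_author = strtrim(substr(author, strrpos(author, ";") + 1, .))

* Option 2: split first, then overwrite with each non-empty piece in
* turn, so the last non-empty author_number_* is the one that remains.
split author, parse(";") generate(author_number_)
gen last_author2 = ""
foreach v of varlist author_number_* {
    replace last_author2 = strtrim(`v') if strtrim(`v') != ""
}

list last_author last_author2
```

Option 1 avoids creating up to 73 intermediate variables; Option 2 is useful if the split variables are wanted anyway.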


          • #6
            Thank you so much, this is perfect!
