Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extracting date string from a string variable containing other misc. text

    Hi

    I have a string variable that sometimes contains a date in the same format (DD Mon YY). The rest of the string differs.

    Do you have any tips on how to extract this date? I was thinking along the lines of finding the first number & then extracting the number of characters, or using regular expressions for the date format, but I haven't managed to succeed yet.

    In the dataex example below, "status" shows examples of the string variable & "desiredvar" shows what I would like to extract.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str40 status str9 desiredvar float date
    "Ongoing: stopped on 03 Feb 21"          "03 Feb 21" 22314
    "N/A"                                    ""              .
    "Discontinued 06 Apr 21 - LTFU"          "06 Apr 21" 22376
    "N/A"                                    ""              .
    "N/A"                                    ""              .
    "Ongoing: stopped on 12 May 21"          "12 May 21" 22412
    "N/A"                                    ""              .
    "Discontinued 16 Apr 21 - ran out"       "16 Apr 21" 22386
    "Ongoing: stopped on 08 Apr 21"          "08 Apr 21" 22378
    "Ongoing: stopped on 06 Jul 21"          "06 Jul 21" 22467
    "Discontinued 06 Apr 21 (adverse event)" "06 Apr 21" 22376
    "Ongoing: stopped on 23 Nov 21 (EOS)"    "23 Nov 21" 22607
    end
    format %td date
    Any pointers would be greatly appreciated!



  • #2
    This works with your data example using moss from SSC:


    Code:
    moss status, match("([0-9][0-9])") regex prefix(num)
    drop numcount numpos? 
    moss status, match("(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)") regex prefix(month)
    drop monthcount monthpos?

    Comment


    • #3
      Works perfectly - thank you, Nick

      Comment


      • #4
        Also:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str40 status str9 desiredvar float date
        "Ongoing: stopped on 03 Feb 21"          "03 Feb 21" 22314
        "N/A"                                    ""              .
        "Discontinued 06 Apr 21 - LTFU"          "06 Apr 21" 22376
        "N/A"                                    ""              .
        "N/A"                                    ""              .
        "Ongoing: stopped on 12 May 21"          "12 May 21" 22412
        "N/A"                                    ""              .
        "Discontinued 16 Apr 21 - ran out"       "16 Apr 21" 22386
        "Ongoing: stopped on 08 Apr 21"          "08 Apr 21" 22378
        "Ongoing: stopped on 06 Jul 21"          "06 Jul 21" 22467
        "Discontinued 06 Apr 21 (adverse event)" "06 Apr 21" 22376
        "Ongoing: stopped on 23 Nov 21 (EOS)"    "23 Nov 21" 22607
        end
        format date %td
        
        gen Date= date(ustrregexra(status, ".*(\d{2}\s\w{3}\s\d{2}).*", "$1"), "DM20Y")
        format Date %td
        Res.:

        Code:
        . l, sep(0)
        
             +----------------------------------------------------------------------------+
             |                                 status   desired~r        date        Date |
             |----------------------------------------------------------------------------|
          1. |          Ongoing: stopped on 03 Feb 21   03 Feb 21   03feb2021   03feb2021 |
          2. |                                    N/A                       .           . |
          3. |          Discontinued 06 Apr 21 - LTFU   06 Apr 21   06apr2021   06apr2021 |
          4. |                                    N/A                       .           . |
          5. |                                    N/A                       .           . |
          6. |          Ongoing: stopped on 12 May 21   12 May 21   12may2021   12may2021 |
          7. |                                    N/A                       .           . |
          8. |       Discontinued 16 Apr 21 - ran out   16 Apr 21   16apr2021   16apr2021 |
          9. |          Ongoing: stopped on 08 Apr 21   08 Apr 21   08apr2021   08apr2021 |
         10. |          Ongoing: stopped on 06 Jul 21   06 Jul 21   06jul2021   06jul2021 |
         11. | Discontinued 06 Apr 21 (adverse event)   06 Apr 21   06apr2021   06apr2021 |
         12. |    Ongoing: stopped on 23 Nov 21 (EOS)   23 Nov 21   23nov2021   23nov2021 |
             +----------------------------------------------------------------------------+

        Comment

        Working...
        X