Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Flagging where values in strings change between rows.

    I'm currently playing around with a longitudinal pharmaceutical dataset. Within the dataset, patient drug regimens are listed sequentially by row. Each row includes a string variable that describes all of the drugs in each regimen (e.g. 'PACLITAXEL + CARBOPLATIN").

    In instances where a patient has multiple regimens, I'm attempting to flag various changes in treatment. By way of an example, these include where:
    • One or more drugs from regimen1 are continued at regimen2, with a new drug also initiated at regimen2.
    Patient Regimen number Drug string
    1 1 PACLITAXEL + CARBOPLATIN
    1 2 CARBOPLATIN + MITOMYCIN
    Any ideas on the best way to create such a flag? I'm struggling.

  • #2
    Please use dataex as requested in FAQ Advice #12 to show data. Something like

    Code:
    bysort Patient (Regimen) : gen flag = Drug != Drug[_n-1]
    answers the title question, but it's entirely literal and will (for example) flag "A + B" as different from "B + A" and will also be sensitive to differences between upper and lower case, in leading, trailing and embedded spaces and in any small differences in abbreviation or spelling, whether correct or incorrect.

    Comment


    • #3
      Sadly, as you state, that will only flag any difference as opposed to a specific type of difference.

      If it helps, I add some example data below, with one of the regimens to be flagged highlighted in bold:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float makeid str50 regimen_drugs str24 regimen_drugs_1 str21 regimen_drugs_2 str12 regimen_drugs_3
       1 "DOCETAXEL + NINTEDANIB"            "DOCETAXEL"   "NINTEDANIB"  ""           
       2 "CARBOPLATIN + PACLITAXEL"          "CARBOPLATIN" "PACLITAXEL"  ""           
       2 "BEVACIZUMAB + CARBO + GEMCITABINE" "BEVACIZUMAB" "CARBOPLATIN" "GEMCITABINE"
       2 "PACLITAXEL"                        "PACLITAXEL"  ""            ""           
       3 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
       3 "VINFLUNINE"                        "VINFLUNINE"  ""            ""           
       4 "CARBOPLATIN + PEMETREXED"          "CARBOPLATIN" "PEMETREXED"  ""           
       5 "CARBOPLATIN + PEMETREXED"          "CARBOPLATIN" "PEMETREXED"  ""           
       6 "CARBOPLATIN + PEMETREXED"          "CARBOPLATIN" "PEMETREXED"  ""           
       6 "ERLOTINIB"                         "ERLOTINIB"   ""            ""           
       7 "CARBOPLATIN + PEMETREXED"          "CARBOPLATIN" "PEMETREXED"  ""           
       8 "CISPLATIN + PEMETREXED"            "CISPLATIN"   "PEMETREXED"  ""           
       8 "DOCETAXEL"                         "DOCETAXEL"   ""            ""           
       8 "ERLOTINIB"                         "ERLOTINIB"   ""            ""           
       9 "CISPLATIN + PEMETREXED"            "CISPLATIN"   "PEMETREXED"  ""           
      10 "CISPLATIN + VINORELBINE"           "CISPLATIN"   "VINORELBINE" ""           
      11 "CISPLATIN + GEMCITABINE"           "CISPLATIN"   "GEMCITABINE" ""           
      12 "CARBOPLATIN + PACLITAXEL"          "CARBOPLATIN" "PACLITAXEL"  ""           
      12 "CARBOPLATIN + PACLITAXEL"          "CARBOPLATIN" "PACLITAXEL"  ""           
      13 "CISPLATIN + VINORELBINE"           "CISPLATIN"   "VINORELBINE" ""           
      14 "CARBOPLATIN + ETOPOSIDE"           "CARBOPLATIN" "ETOPOSIDE"   ""           
      14 "CARBOPLATIN + ETOPOSIDE"           "CARBOPLATIN" "ETOPOSIDE"   ""           
      15 "CISPLATIN + PEMETREXED"            "CISPLATIN"   "PEMETREXED"  ""           
      16 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
      17 "CARBOPLATIN + PACLITAXEL"          "CARBOPLATIN" "PACLITAXEL"  ""           
      18 "CARBOPLATIN + PACLITAXEL"          "CARBOPLATIN" "PACLITAXEL"  ""           
      19 "PAZOPANIB"                         "PAZOPANIB"   ""            ""           
      19 "AXITINIB"                          "AXITINIB"    ""            ""           
      20 "CISPLATIN + VINORELBINE"           "CISPLATIN"   "VINORELBINE" ""           
      21 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
      22 "CARBOPLATIN + PEMETREXED"          "CARBOPLATIN" "PEMETREXED"  ""           
      23 "CISPLATIN + GEMCITABINE"           "CISPLATIN"   "GEMCITABINE" ""           
      23 "CISPLATIN + GEMCITABINE"           "CISPLATIN"   "GEMCITABINE" ""           
      24 "CISPLATIN + PEMETREXED"            "CISPLATIN"   "PEMETREXED"  ""           
      25 "CARBOPLATIN + PACLITAXEL"          "CARBOPLATIN" "PACLITAXEL"  ""           
      26 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
      27 "CISPLATIN + VINORELBINE"           "CISPLATIN"   "VINORELBINE" ""           
      28 "CARBOPLATIN + VINORELBINE"         "CARBOPLATIN" "VINORELBINE" ""           
      29 "CISPLATIN + ETOPOSIDE"             "CISPLATIN"   "ETOPOSIDE"   ""           
      30 "CARBOPLATIN + PEMETREXED"          "CARBOPLATIN" "PEMETREXED"  ""           
      31 "GEFITINIB"                         "GEFITINIB"   ""            ""           
      32 "CISPLATIN + ETOPOSIDE"             "CISPLATIN"   "ETOPOSIDE"   ""           
      33 "CARBOPLATIN"                       "CARBOPLATIN" ""            ""           
      33 "CARBOPLATIN"                       "CARBOPLATIN" ""            ""           
      34 "CARBOPLATIN + VINORELBINE"         "CARBOPLATIN" "VINORELBINE" ""           
      35 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
      35 "DOCETAXEL"                         "DOCETAXEL"   ""            ""           
      36 "CARBOPLATIN + PEMETREXED"          "CARBOPLATIN" "PEMETREXED"  ""           
      37 "CISPLATIN + VINORELBINE"           "CISPLATIN"   "VINORELBINE" ""           
      38 "CISPLATIN + VINORELBINE"           "CISPLATIN"   "VINORELBINE" ""           
      38 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
      39 "CISPLATIN + VINORELBINE + RT"      "CISPLATIN"   "VINORELBINE" ""           
      39 "CISPLATIN + VINORELBINE + RT"      "CISPLATIN"   "VINORELBINE" ""           
      39 "CISPLATIN + VINORELBINE + RT"      "CISPLATIN"   "VINORELBINE" ""           
      39 "CISPLATIN + VINORELBINE + RT"      "CISPLATIN"   "VINORELBINE" ""           
      40 "CISPLATIN + VINORELBINE"           "CISPLATIN"   "VINORELBINE" ""           
      41 "CISPLATIN + VINORELBINE"           "CISPLATIN"   "VINORELBINE" ""           
      41 "CARBOPLATIN + PEMETREXED"          "CARBOPLATIN" "PEMETREXED"  ""           
      42 "CARBOPLATIN + PACLITAXEL"          "CARBOPLATIN" "PACLITAXEL"  ""           
      43 "CARBOPLATIN + VINORELBINE"         "CARBOPLATIN" "VINORELBINE" ""           
      44 "CARBOPLATIN + PEMETREXED"          "CARBOPLATIN" "PEMETREXED"  ""           
      45 "CARBOPLATIN + VINORELBINE"         "CARBOPLATIN" "VINORELBINE" ""           
      46 "CISPLATIN + GEMCITABINE"           "CISPLATIN"   "GEMCITABINE" ""           
      46 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
      47 "CISPLATIN + GEMCITABINE"           "CISPLATIN"   "GEMCITABINE" ""           
      48 "CARBOPLATIN + VINORELBINE"         "CARBOPLATIN" "VINORELBINE" ""           
      49 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
      50 "CARBOPLATIN + PACLITAXEL"          "CARBOPLATIN" "PACLITAXEL"  ""           
      51 "CARBOPLATIN + ETOPOSIDE"           "CARBOPLATIN" "ETOPOSIDE"   ""           
      52 "CISPLATIN + PEMETREXED"            "CISPLATIN"   "PEMETREXED"  ""           
      53 "CISPLATIN + PEMETREXED"            "CISPLATIN"   "PEMETREXED"  ""           
      54 "CARBOPLATIN + VINORELBINE"         "CARBOPLATIN" "VINORELBINE" ""           
      55 "CARBOPLATIN + PACLITAXEL"          "CARBOPLATIN" "PACLITAXEL"  ""           
      55 "CARBOPLATIN"                       "CARBOPLATIN" ""            ""           
      56 "GEFITINIB"                         "GEFITINIB"   ""            ""           
      57 "CISPLATIN + VINORELBINE"           "CISPLATIN"   "VINORELBINE" ""           
      58 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
      59 "CISPLATIN + VINORELBINE"           "CISPLATIN"   "VINORELBINE" ""           
      60 "CISPLATIN + ETOPOSIDE"             "CISPLATIN"   "ETOPOSIDE"   ""           
      61 "PEMETREXED"                        "PEMETREXED"  ""            ""           
      62 "CISPLATIN + PEMETREXED"            "CISPLATIN"   "PEMETREXED"  ""           
      62 "CARBOPLATIN + PEMETREXED"          "CARBOPLATIN" "PEMETREXED"  ""           
      63 "CARBOPLATIN + ETOPOSIDE"           "CARBOPLATIN" "ETOPOSIDE"   ""           
      64 "CARBOPLATIN + VINORELBINE"         "CARBOPLATIN" "VINORELBINE" ""           
      64 "CARBOPLATIN + PEMETREXED"          "CARBOPLATIN" "PEMETREXED"  ""           
      64 "DOCETAXEL"                         "DOCETAXEL"   ""            ""           
      65 "CISPLATIN + PEMETREXED"            "CISPLATIN"   "PEMETREXED"  ""           
      66 "CARBOPLATIN + VINORELBINE"         "CARBOPLATIN" "VINORELBINE" ""           
      67 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
      68 "CARBOPLATIN + VINORELBINE"         "CARBOPLATIN" "VINORELBINE" ""           
      69 "CISPLATIN + PEMETREXED"            "CISPLATIN"   "PEMETREXED"  ""           
      70 "CISPLATIN + GEMCITABINE"           "CISPLATIN"   "GEMCITABINE" ""           
      70 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
      71 "CISPLATIN + GEMCITABINE"           "CISPLATIN"   "GEMCITABINE" ""           
      72 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
      72 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
      73 "GEMCITABINE + CARBOPLATIN"         "GEMCITABINE" "CARBOPLATIN" ""           
      74 "CARBOPLATIN + PEMETREXED"          "CARBOPLATIN" "PEMETREXED"  ""           
      75 "CARBOPLATIN + PACLITAXEL"          "CARBOPLATIN" "PACLITAXEL"  ""           
      75 "CARBOPLATIN"                       "CARBOPLATIN" ""            ""           
      end

      Comment


      • #4
        Thanks for the example, but unfortunately I have no idea what to suggest here. You can check specifically for exactly those strings before and after, but otherwise I don't know what you're seeking more generally.

        Comment


        • #5
          Here's a solution with a fairly roundabout way of sorting the drugs alphabetically and then comparing the names of each number of the drugs.
          This might fail in case of leading or trailing spaces, misspelling of drug names, and possibly capitalization (I'm not sure how Stata orders lower case vs upper case letters).
          I should also note your 'split' did not seem to have worked perfectly for id's 39, with one of the drugs not listed separately.

          Code:
          bysort makeid: gen regimen_no = _n
          reshape long regimen_drugs_, i(makeid regimen_no) j(drug_no)
          drop if regimen_drugs_==""
          sort makeid regimen_no regimen_drugs_
          bysort makeid regimen_no: replace drug_no = _n
          reshape wide regimen_drugs_, i(makeid regimen_no) j(drug_no)
          
          bysort makeid (regimen_no): gen flag = 0
          bysort makeid (regimen_no): replace flag = 1 if regimen_drugs_1 != regimen_drugs_1[_n-1]
          bysort makeid (regimen_no): replace flag = 1 if regimen_drugs_2 != regimen_drugs_2[_n-1]
          bysort makeid (regimen_no): replace flag = 1 if regimen_drugs_3 != regimen_drugs_3[_n-1]
          bysort makeid (regimen_no): replace flag = 0 if _n==1

          Comment


          • #6
            Given the explanation in #1, I think Jorrit's solution is not gonna work for Craig’s expectation. For example, it would give out (unexpected) flags for the cases wherein all regimens are different from the last time (new: d e vs old: a b c) or the cases wherein no new regimen(s) are initiated (new: b c vs old: a b c).

            If this understanding of mine is correct, the below code might be working. It should also be noted that the code is still processing on the assumption that there are no space(s) in the names of any single regiment.

            Code:
            egen regimens_now =concat(regimen_drugs_*), p(" ")
            bys makeid: gen regimens_before=regimens_now[_n-1]
            gen flag=0
            forval i=1/`=_N' {
                local now  =regimens_now[`i']
                local before = regimens_before[`i']
                local dup: list now & before
                local newadded: list now - dup
                qui replace flag= ("`dup'"!="") & ("`newadded'"!="") in `i'
            }

            Comment


            • #7
              Originally posted by Romalpa Akzo View Post
              Given the explanation in #1, I think Jorrit's solution is not gonna work for Craig’s expectation. For example, it would give out (unexpected) flags for the cases wherein all regimens are different from the last time (new: d e vs old: a b c) or the cases wherein no new regimen(s) are initiated (new: b c vs old: a b c).

              If this understanding of mine is correct, the below code might be working. It should also be noted that the code is still processing on the assumption that there are no space(s) in the names of any single regiment.

              Code:
              egen regimens_now =concat(regimen_drugs_*), p(" ")
              bys makeid: gen regimens_before=regimens_now[_n-1]
              gen flag=0
              forval i=1/`=_N' {
              local now =regimens_now[`i']
              local before = regimens_before[`i']
              local dup: list now & before
              local newadded: list now - dup
              qui replace flag= ("`dup'"!="") & ("`newadded'"!="") in `i'
              }
              This seems to have worked like a charm.

              I've never used -local- before (macros?). Do you mind explaining what each line is doing? Particularly highlighting where it's distinguishing whether or not there's a match between individual drugs contained in regimens_now and regimens_before.

              I'm keen to modify your code to next capture instances where (i) all drugs are still prescribed at the next regimen, and (ii) one or more new drugs are initiated.
              Last edited by Craig Knott; 10 Jul 2018, 02:32.

              Comment


              • #8
                You should try and go through the instruction:
                Code:
                help macrolists

                Comment


                • #9
                  It seems the macro above doesn't flag instances where there's a complete change in treatment between regimens, such as a switch from ‘CARBOPLATIN + PEMETREXED’ to ‘DOCETAXEL + NINTEDANIB’.

                  I'm currently reading through the help file and thought I might be able to have the code create variables at each step so I can better follow what the macro is doing, but can't seem to manage it. Replacing 'local' with global' didn't work.

                  I'll keep beavering away, but any tips or suggestions in the meantime would be a huge help.

                  Comment


                  • #10
                    1. The flag generated by my suggestion at #6 is serving for your target quoted in #1:
                    One or more drugs from regimen1 are continued at regimen2, with a new drug also initiated at regimen2.
                    2. Since you might need some other kinds of flag, it would be necessary for you to read over the instruction of -macro lists- to make appropriate selection for what you need.

                    3. The below code just gives you some hint, wherein, flag2 is tag for the complete changes, flag3 is for your mentioning at #7
                    where (i) all drugs are still prescribed at the next regimen, and (ii) one or more new drugs are initiated.
                    Code:
                    egen regimens_now =concat(regimen_drugs_*), p(" ")
                    bys makeid: gen regimens_before=regimens_now[_n-1]
                    
                    gen flag=0 //keepsome + newadded
                    gen flag2=0 //all change
                    gen flag3=0 //keepall + newadded
                    
                    forval i=1/`=_N' {
                    local now =regimens_now[`i']
                    local before = regimens_before[`i']
                    
                    local dup: list now & before 
                    local newadded: list now - dup
                    local keepall: list before===dup
                    
                    qui replace flag= ("`dup'" != "") & ("`newadded'" != "") in `i'
                    qui replace flag2=("`dup'" == "") in `i'
                    qui replace flag3 = (`keepall'==1) & ("`dup'" != "") & ("`newadded'" != "") in `i'
                    }

                    Comment

                    Working...
                    X