
  • Collapse vs drop

    I am trying to collapse this data set of all treatment lines so that it keeps only the first treatments for the CLL and RT time points. Each patient is one id, with the first-treatment drugs, the CLL and RT treatment time points, and the time to first treatment (cllttft). The string drug columns are preventing the collapse. I need to keep only one row per id, containing id and clltxn through cllttft.
    I tried variations of drop and collapse but I can't get it right. I guess I have to encode the strings to get this to work?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id str5 dis str10 drug str12 linedate float linedate2 long dis2 float(N n clltxn) str9 clltx1drug float(clldotx1 rttxn) str6 rttx1drug float rtdotx1 str10 clldodx float(clldodxfinal cllttft)
    1 "CLL" "BR"        "1/1/2010"   18263 1 2 1 . "BR"        18263 . ""           . "5/5/2007"   17291        81
    1 "CLL" "RCHOP"     "1/23/2011"  18650 1 2 2 2 ""              . . ""           . "5/5/2007"   17291         .
    1 "RT"  "RCHOP"     "12/1/2012"  19328 2 2 1 . ""              . . "RCHOP"  19328 "5/5/2007"   17291         .
    1 "RT"  "SCT"       "5/9/2015"   20217 2 2 2 . ""              . 2 ""           . "5/5/2007"   17291         .
    2 "CLL" "Ibrutinib" "5/5/2015"   20213 1 1 1 1 "Ibrutinib" 20213 . ""           . "12/11/2010" 18607 133.83333
    2 "RT"  "R-CHOP"    "5/5/2015"   20213 2 6 1 . ""              . . "R-CHOP" 20213 "12/11/2010" 18607         .
    2 "RT"  "BR"        "8/23/2015"  20323 2 6 2 . ""              . . ""           . "12/11/2010" 18607         .
    2 "RT"  "Ibri-CHOP" "5/15/2015"  20223 2 6 3 . ""              . . ""           . "12/11/2010" 18607         .
    2 "RT"  "SCT"       "3/5/2017"   20883 2 6 4 . ""              . . ""           . "12/11/2010" 18607         .
    2 "RT"  "CART"      "4/4/2009"   17991 2 6 5 . ""              . . ""           . "12/11/2010" 18607         .
    2 "RT"  "CART-allo" "12/23/2007" 17523 2 6 6 . ""              . 6 ""           . "12/11/2010" 18607         .
    end
    format %td linedate2
    format %td clldodxfinal
    label values dis2 dis2
    label def dis2 1 "CLL", modify
    label def dis2 2 "RT", modify
    What I tried:

    gen linedate2 = date(linedate, "MD20Y")
    format linedate2 %td
    encode dis, gen(dis2)
    bysort id dis2: generate N = _N
    bysort id dis2: generate n = _n


    gen clltxn = n if (dis2 == 1) & (n == N)
    gen clltx1drug = drug if (dis2 == 1) & (n == 1)
    gen clldotx1 = linedate2 if (dis2 == 1) & (n == 1)

    gen rttxn = n if (dis2 == 2) & (n == N)
    gen rttx1drug = drug if (dis2 == 2) & (n == 1)
    gen rtdotx1 = linedate2 if (dis2 == 2) & (n == 1)

    gen clldodx = "5/5/2007" if id == 1
    replace clldodx = "12/11/2010" if id == 2
    gen clldodxfinal = date(clldodx, "MD20Y")
    format clldodxfinal %td

    gen cllttft = (clldotx1 - clldodxfinal)/12 if !missing(clldotx1)

    collapse clltxn clltx1drug rttxn rttx1drug rtdotx1 clldodx clldodxfinal cllttft, by(id)

  • #2
    I need keep only one row per id including id and clltxn through cllttft
    I'm not sure I understand what you want. But maybe it's this:

    Code:
    collapse (firstnm) clltxn-cllttft, by(id)
    This is possible because, at least in your example, each of the variables clltxn-cllttft has only a single non-missing value among all the observations (rows) with the same id.
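
    A minimal sketch of why the original collapse call may have failed, using a made-up toy data set (the variable names drugname and txn are hypothetical, not from the post). With no statistic specified, collapse computes means, which are not defined for string variables, whereas (firstnm) keeps the first non-missing value in each group and accepts strings.

    ```stata
    * Hypothetical toy data: one string and one numeric variable,
    * each with a single non-missing value per id.
    clear
    input float id str9 drugname float txn
    1 "BR"     .
    1 ""       2
    2 ""       1
    2 "R-CHOP" .
    end

    * No statistic given, so collapse defaults to (mean); this is expected
    * to stop with an error because drugname is a string variable.
    capture noisily collapse drugname txn, by(id)
    display "return code: " _rc

    * (firstnm) takes the first non-missing value within each id;
    * for string variables the empty string "" counts as missing.
    collapse (firstnm) drugname-txn, by(id)
    list, noobs
    ```

    The range drugname-txn follows the order of the variables in the data set, which is why clltxn-cllttft works in the answer above without encoding the strings first.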



    • #3
      Originally posted by Clyde Schechter
      I'm not sure I understand what you want. But maybe it's this:

      Code:
      collapse (firstnm) clltxn-cllttft, by(id)
      This is possible because, at least in your example, each of the variables clltxn-cllttft has only a single non-missing value among all the observations (rows) with the same id.
      Yes, that's exactly what I wanted. I'm not sure why my collapse command did not run, but your code did it. Thanks. I looked up the options for collapse and noted firstnm and max. I did not know you could give collapse a range of variables. It makes the code small and neat.
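
      As a rough illustration of combining the firstnm and max statistics mentioned here in a single collapse call, the sketch below uses a made-up toy data set (drugname, txn and startdate are hypothetical names, not the variables in this thread).

      ```stata
      * Hypothetical toy data: two rows per id, values scattered across rows.
      clear
      input float id str9 drugname float(txn startdate)
      1 "BR"     . 18263
      1 ""       2 .
      2 ""       1 .
      2 "R-CHOP" . 20213
      end
      format %td startdate

      * Each statistic in parentheses applies to the variables that follow it,
      * until the next statistic is named.
      collapse (firstnm) drugname startdate (max) txn, by(id)
      list, noobs
      ```

      When every variable has only one non-missing value per id, as in the example data, (firstnm) and (max) return the same result for numeric variables; they differ only when a group holds several non-missing values.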
