  • Creating start and stop time variables for a continuous time-dependent exposure and running a Cox model

    Hi All

    I hope you can help me with this query. I have attached a simplified dataset with all the variables needed (entrydate and exitdate are the start and end of the follow-up period, and time is the time at risk, i.e. exitdate - entrydate). Three individuals had the outcome and three didn't. Exposure is a time-varying covariate (repeated measure), and I want to create sequential start and stop time variables corresponding to the exposures for each individual before running a Cox model. However, I am completely stuck and need help with the code to create these additional variables and to run the Cox model afterwards. I have searched the forum and other sites, but I didn't find what I needed. The data look like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte id float(entrydate exitdate exposure exposuredate time outcome outcomedate)
    1 20490 21185 51 20643 695 0     .
    1 20490 21185 53 20951 695 0     .
    2 20402 21185 50 20426 783 0     .
    2 20402 21185 52 20646 783 0     .
    2 20402 21185 47 20667 783 0     .
    3 20573 21185 47 20612 612 0     .
    3 20573 21185 48 20955 612 0     .
    3 20573 21185 47 21062 612 0     .
    4 20430 21111 50 20579 681 1 21111
    4 20430 21111 49 20726 681 1 21111
    4 20430 21111 47 20950 681 1 21111
    5 19640 20473 42 19950 833 1 20473
    5 19640 20473 39 20048 833 1 20473
    5 19640 20473 43 20431 833 1 20473
    6 18441 18483 79 18445  42 1 18483
    6 18441 18483 78 18445  42 1 18483
    end
    format %tdDD/NN/CCYY entrydate
    format %tdDD/NN/CCYY exitdate
    format %tdDD/NN/CCYY exposuredate
    format %tdDD/NN/CCYY outcomedate

    Please help
    Thanks

  • #2
    Hi again

    I realised I would need a baseline exposure reading to be carried forward as a starting point before the first repeated exposure. I have added this baseline measure (exposure_0), taken on the entrydate, to the new dataset below. So the question remains as above, but with the added baseline exposure (perhaps an extra row per individual will be needed to incorporate this when creating the start and stop variables?)

    The new data look like this:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte id float(exposure_0 entrydate exitdate exposure exposuredate time outcome outcomedate)
    1 43 20490 21185 51 20643 695 0     .
    1 43 20490 21185 53 20951 695 0     .
    2 66 20402 21185 50 20426 783 0     .
    2 66 20402 21185 52 20646 783 0     .
    2 66 20402 21185 47 20667 783 0     .
    3 77 20573 21185 47 20612 612 0     .
    3 77 20573 21185 48 20955 612 0     .
    3 77 20573 21185 47 21062 612 0     .
    4 60 20430 21111 50 20579 681 1 21111
    4 60 20430 21111 49 20726 681 1 21111
    4 60 20430 21111 47 20950 681 1 21111
    5 56 19640 20473 42 19950 833 1 20473
    5 56 19640 20473 39 20048 833 1 20473
    5 56 19640 20473 43 20431 833 1 20473
    6 45 18441 18483 79 18445  42 1 18483
    6 45 18441 18483 78 18445  42 1 18483
    end
    format %tdDD/NN/CCYY entrydate
    format %tdDD/NN/CCYY exitdate
    format %tdDD/NN/CCYY exposuredate
    format %tdDD/NN/CCYY outcomedate

    Am grateful for any help.
    Last edited by David Allen; 09 Nov 2019, 16:44.



    • #3
      So this is a complicated organization of this data that does not lend itself readily to use in Stata's survival analysis programs. It requires some major revisions.

      First, let me call your attention to an error in your data, which the code I show does not attempt to fix. For id 6, there are two observations with exposuredate = 18445, and, on top of that, the exposure values are different. If this person actually had exposure measured twice on the same date, you need to combine the results in some way. It is more likely, I guess, that one of the two exposure dates is an error, however.
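
      Before restructuring, it may help to find every such clash. The sketch below, run on the example data from #2, flags observations that share an id and an exposuredate. The averaging step after it is only one possible way to combine same-day readings, an assumption for illustration rather than a recommendation; if one of the dates is simply wrong, correcting it is the better fix. The variable names dup_date and mean_exposure are hypothetical.

      Code:
      ```stata
      * flag observations that share both id and exposuredate
      duplicates tag id exposuredate, generate(dup_date)
      list id exposure exposuredate if dup_date > 0, sepby(id) noobs

      * ONE possible fix (an assumption, not a recommendation):
      * average same-day readings and keep one observation per id and date
      bysort id exposuredate: egen mean_exposure = mean(exposure)
      bysort id exposuredate: keep if _n == 1
      replace exposure = mean_exposure
      drop mean_exposure dup_date
      ```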

      You have multiple exposure measurements for each person. So the layout that Stata needs is one observation for each time interval during which everything remains constant, plus separate observations for entering and exiting the study. In this layout, the outcome should be designated as occurring only in the final observation. For those who never experienced an outcome, the variable should be coded 0 in all of his/her observations. I think the code below does what you want:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte id float(exposure_0 entrydate exitdate exposure exposuredate time outcome outcomedate)
      1 43 20490 21185 51 20643 695 0     .
      1 43 20490 21185 53 20951 695 0     .
      2 66 20402 21185 50 20426 783 0     .
      2 66 20402 21185 52 20646 783 0     .
      2 66 20402 21185 47 20667 783 0     .
      3 77 20573 21185 47 20612 612 0     .
      3 77 20573 21185 48 20955 612 0     .
      3 77 20573 21185 47 21062 612 0     .
      4 60 20430 21111 50 20579 681 1 21111
      4 60 20430 21111 49 20726 681 1 21111
      4 60 20430 21111 47 20950 681 1 21111
      5 56 19640 20473 42 19950 833 1 20473
      5 56 19640 20473 39 20048 833 1 20473
      5 56 19640 20473 43 20431 833 1 20473
      6 45 18441 18483 79 18445  42 1 18483
      6 45 18441 18483 78 18445  42 1 18483
      end
      format %tdDD/NN/CCYY entrydate
      format %tdDD/NN/CCYY exitdate
      format %tdDD/NN/CCYY exposuredate
      format %tdDD/NN/CCYY outcomedate
      
      assert outcomedate == exitdate if outcome
      assert missing(outcomedate) if !outcome
      drop outcomedate
      
      
      by id (exposuredate), sort: gen expander = cond(_n == 1 | _n == _N, 2, 1)
      expand expander
      drop expander
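      by id (exposuredate), sort: replace exposure = exposure_0 if _n == 1
      * -stset- applies the values on each observation to the interval that ends
      * on that observation's date, so shift each reading onto the next observation
      * (otherwise exposure_0 is never used and each interval gets the later reading)
      by id (exposuredate): gen prior_exposure = exposure[_n-1]
      by id (exposuredate): replace exposure = prior_exposure if _n > 1
      drop prior_exposure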
      gen date = exposuredate
      format date %tdDD/NN/CCYY
      by id (exposuredate): replace date = entrydate if _n == 1
      by id (exposuredate): replace date = exitdate if _n == _N
      by id (exposuredate): replace outcome = 0 if _n < _N
      by id (exposuredate): gen elapsed_days = date - date[1]
      
      stset elapsed_days, id(id) fail(outcome==1)
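
      The original question asked for explicit start and stop variables. A minimal sketch of that layout follows, assuming the #2 example data are freshly loaded (rerun the -dataex- block) and that no person has two readings on the same date. Each observation is one interval: it starts at a reading (or at entry, carrying exposure_0), ends at the next reading (or at exit), and carries the reading taken at its start. The names start and stop are illustrative choices. With time0(), -stset- uses these spans directly, and origin(time entrydate) makes analysis time the number of days since entry.

      Code:
      ```stata
      * sketch: one observation per interval with explicit start/stop dates
      assert outcomedate == exitdate if outcome
      drop outcomedate

      * add one row per person to carry the baseline reading (exposure_0)
      by id (exposuredate), sort: gen byte expander = cond(_n == 1, 2, 1)
      expand expander
      drop expander
      by id (exposuredate), sort: gen start = cond(_n == 1, entrydate, exposuredate)
      by id (exposuredate): replace exposure = exposure_0 if _n == 1

      * each interval ends at the next reading, or at exit for the last one
      by id (exposuredate): gen stop = cond(_n == _N, exitdate, start[_n+1])
      format start stop %tdDD/NN/CCYY

      * the outcome can only occur at the end of the last interval
      by id (exposuredate): replace outcome = 0 if _n < _N

      stset stop, id(id) failure(outcome == 1) time0(start) origin(time entrydate)
      ```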



      • #4
        Thank you so much, Clyde. This definitely seems to do the trick, and the start and stop times follow on as they should (I need to go through each line carefully to make sure that I understand what it does!). I am assuming that after stset, the stcox specification doesn't change at all, i.e., stcox independent_vars (age, gender, deprivation, etc.)

        Best regards
        David



        • #5
          Yes, just use -stcox- in the usual way.
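
          For example, once the data are -stset- as above, a quick check of the spans Stata is actually analyzing, followed by the model, might look like the sketch below. Only exposure is in the example data; age, gender and deprivation in the commented line are hypothetical placeholders for your own covariates.

          Code:
          ```stata
          * summarize the st data and list the spans Stata will analyze
          stdescribe
          list id _t0 _t _d exposure if _st, sepby(id) noobs

          * the time-varying exposure enters like any other covariate
          stcox exposure

          * with further covariates (hypothetical names, not in the example data):
          * stcox exposure age i.gender deprivation
          ```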
