Cox regression with adjustment for time varying covariates

pierre martin

Join Date: Nov 2017

Posts: 63
#1


24 Feb 2021, 12:05

Hi all,

I am trying to figure out how to implement a Cox survival analysis with time-varying covariates. I am studying the association between a binary covariate X and death (Y). My dataset has one line per patient, with columns for the binary covariate X, the time to death (0, 6, 12, 18 or 24 months), death status, and several other covariates that do not vary over time.

However, I would also like to adjust for another binary (yes/no) variable that can take a different value at each time point: for example, 1 at baseline, 0 at 6 months, and 1 again at 18 months. It can change only at the visits, which are also the time points at which death is recorded.

I am not sure how to do this, especially what my dataset should look like (long or wide?) and which code to use (-tvc()-?).

Can anyone help?

Many many thanks!

PM

pierre martin

Join Date: Nov 2017
Posts: 63

24 Feb 2021, 12:47

I just wanted to clarify the structure of the dataset I am working on (see the table below).

Patients are followed up every 6 months. I am studying the association between a specific pathology at baseline and death (recorded every 6 months, at each clinical visit). Some adjustment covariates do not change over time (gender, race, etc.); another one, use of a specific drug, does: patients may take it at every visit, only at baseline, or start only at 1 year, etc.

Without considering drug use, my standard Cox model was:
stset timetoevent, failure(event)

stcox pathology age gender race

Now that I want to account for drug use that varies over time, I don't know how to prepare my dataset or how to write the code. -tvc()-? -stsplit-?

Thank you so much,
I hope this is clearer.

Best
PM

patient id	time to event	event	pathology	age	gender	race	drug at baseline	drug at 6 months	drug at 12 months	drug at 18 months
1	0	0	1	25	male	white	1	0	1	1
2	12	1	0	36	female	white	0	0	1	1
3	12	0	0	48	female	black	0	0	0	0
4	18	0	0	41	female	black	1	1	1	1
5	36	1	0	25	male	white	0	1	1	0
6	12	1	1	66	male	white	1	1	1	0


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#3

25 Feb 2021, 00:27

Pierre:
usually, the -tvc()- option is used to estimate how the effect of the same variable (usually a continuous one) changes over time: an example is the decay in plasma concentration of a given drug after its intake, which follows a known theoretical function (e.g., log) (see the -stcox- entry in the Stata .pdf manual for more details).
In your case, it seems that patients are administered the same or different therapies at different points in time. Hence, I would consider a categorical predictor for each drug/time combination.
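For reference, the kind of -tvc()- model described above can be sketched on the Stata manual's drugtr example dataset (loaded here with -webuse-, which needs an internet connection). With tvc(drug) texp(ln(_t)), the log hazard ratio for drug is allowed to change as b + c*ln(_t), where _t is analysis time; ln(_t) is only an illustrative assumption (the default multiplier is _t itself). Note that this models a changing effect of one variable; it does not by itself handle a covariate whose value is re-recorded at each visit.

```stata
* Stata manual example dataset: survival in a drug trial
webuse drugtr, clear

* One record per patient: follow-up time and death indicator
stset studytime, failure(died)

* Effect of drug allowed to vary with log(analysis time); the
* drug x ln(_t) term is reported separately from the main effects
stcox drug age, tvc(drug) texp(ln(_t))
```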

Kind regards,
Carlo
(Stata 19.0)
pierre martin

Join Date: Nov 2017

Posts: 63
#4

25 Feb 2021, 01:22

Hi Carlo,

Thank you so much for your answer! Yes, at each clinical visit patients either are or are not given one specific drug (always the same one). Some of them receive it every 6 months, while others receive it only at baseline and at year 1, for example. I did read about -tvc()-, but I do not know how to properly handle a binary (yes/no) variable that changes over time.

Maybe I could just work on a wide dataset instead of long, with several lines per patient (one for each clinical visit) and a column for this drug that is 0 or 1 depending on the line (visit); the other variables such as age, gender, etc. would then be repeated within each patient. Would running a Cox model with something like id(patient) work?
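A dataset with one row per patient and visit, with time-invariant variables repeated, is what Stata's -reshape- command calls long format. A minimal sketch, assuming the drug status is stored in variables named drug0, drug6, drug12 and drug18 (hypothetical names) and using values from the first two patients of the example table above:

```stata
* Hypothetical wide data: one row per patient, drug status at each visit
clear
input id pathology age drug0 drug6 drug12 drug18
1 1 25 1 0 1 1
2 0 36 0 0 1 1
end

* One row per patient-visit; -month- takes the suffixes 0, 6, 12, 18
reshape long drug, i(id) j(month)

* Time-invariant variables are copied to every row of the patient
list, sepby(id)
```

Each row still needs the start and end of the time segment it covers, and an outcome specific to that segment, before -stset- can use it.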

Thank you again!
Best,
Pierre
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#5

25 Feb 2021, 01:38

Pierre:
I would forget the -tvc()- option altogether and follow the layout of the example dataset loaded with -use https://www.stata-press.com/data/r16/drugtr-.

Kind regards,
Carlo
(Stata 19.0)
pierre martin

Join Date: Nov 2017

Posts: 63
#6

25 Feb 2021, 03:17

Thank you, Carlo, but it looks like the link does not work (page not found). Could you please send it to me again?

Thanks a lot!

Pierre
Paul Dickman

Join Date: Apr 2014

Posts: 294
#7

25 Feb 2021, 03:31

The link in #5 doesn't work for me.

I find your example hard to follow. It appears ID=1 is censored at time 0 (timetoevent=0 and event=0) yet took the drug at 12 and 18 months. ID=2 was dead at 12 months but took the drug at 18 months.

The way to structure such data is to have one observation for each 6 month period. That is, potentially 3 observations for each individual.
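Since -stsplit- was mentioned earlier in the thread, here is one possible way to obtain one record per 6-month period starting from one row per patient. It is a sketch with hypothetical data and variable names (time in months, event, drug0 to drug18), and it assumes the drug status recorded at a visit applies to the 6 months that follow that visit.

```stata
* Hypothetical one-row-per-patient data (time in months)
clear
input id time event pathology age drug0 drug6 drug12 drug18
1 12 0 1 52 1 0 0 1
2 18 1 0 48 0 0 1 1
end

* -stsplit- works on stset data; id() links the split records
stset time, failure(event) id(id)

* Split follow-up at the visits; -visit- holds the lower bound
* of each interval (0, 6, 12, 18)
stsplit visit, at(6 12 18)

* Drug recorded at a visit applies to the segment starting there
generate drug = cond(visit == 0, drug0, cond(visit == 6, drug6, ///
    cond(visit == 12, drug12, drug18)))
drop drug0 drug6 drug12 drug18

* _t0/_t are segment start/end; _d = 1 only in the failure segment
list id _t0 _t _d visit drug, sepby(id)
```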

You have discrete time. That is, if I understand correctly, you will know whether a patient has died in a 6-month interval but not the exact time of death within that interval. The Cox model assumes continuous time, so it may not be the most suitable analytic approach.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#8

25 Feb 2021, 03:47

Sorry for the previous broken link.
The following one works for me:

Code:

use http://www.stata-press.com/data/r16/drugtr.dta

Kind regards,
Carlo
(Stata 19.0)
pierre martin

Join Date: Nov 2017

Posts: 63
#9

25 Feb 2021, 03:49

Hi Paul,

It was just an example of the structure of my data; in fact, my event is not death but incident stroke, so don't worry about that!

Yes, I think the only way is to have my dataset in wide format with several observations per patient, one for each clinical visit (every 6 months). And yes, time is discrete: incident stroke is assessed at every clinical visit (every 6 months), as is drug use, which is the only variable I will treat as time-varying.

I found an example in the Stata manual on Cox regression with discrete time-varying covariates that might be helpful; it looks like this:

stset t1, failure(died) id(id)

I think I just have to specify id(patient) if I work in a wide dataset with several lines per patient.

What do you think? The link does not work for me either.

Thank you!
Best,
Pierre
Paul Dickman

Join Date: Apr 2014

Posts: 294
#10

25 Feb 2021, 04:19

Yes, that's the general approach.

This is called long, not wide.

I suggest you talk to someone familiar with survival analysis about the implications of fitting a continuous-time model to discrete-time data. My suggestion would be to use a discrete-time model (e.g., complementary log-log). If you google "Stata discrete time survival models", you will no doubt find help not only on the models but also on how to set up the data.
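A discrete-time model of this kind can be sketched with Stata's -cloglog- command on person-period data: one row per patient and 6-month period, kept up to and including the period of the first stroke, with the outcome equal to 1 only in that period. The data below are simulated purely for illustration, and the coefficients used to generate them are arbitrary assumptions. With the eform option, the exponentiated coefficients can be read as hazard ratios under a grouped-time proportional hazards model.

```stata
* Simulated person-period data (hypothetical values, illustration only)
clear
set seed 2021
set obs 500
generate id = _n
generate pathology = runiform() < 0.3
generate age = 40 + floor(30 * runiform())

* Four 6-month periods per patient: 1 = 0-6, ..., 4 = 18-24 months
expand 4
bysort id: generate period = _n
generate treat = runiform() < 0.5

* Per-period stroke probability from an assumed cloglog model
* (age has no true effect in this simulation)
generate p = 1 - exp(-exp(-3 + 0.5 * pathology + 0.3 * treat))
generate stroke = runiform() < p

* Keep periods up to and including the first stroke
bysort id (period): generate prior = sum(stroke) - stroke
drop if prior > 0

* Period dummies give a flexible baseline hazard
cloglog stroke i.period pathology age treat, eform
```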
pierre martin

Join Date: Nov 2017

Posts: 63
#11

25 Feb 2021, 05:18

Thank you so much, Paul. I really appreciate your time.

I will do that!

Best,
Pierre
pierre martin

Join Date: Nov 2017

Posts: 63
#12

25 Feb 2021, 09:22

Thank you too, Carlo, for the link!
Thank you all!
Best,

Pierre

pierre martin

Join Date: Nov 2017
Posts: 63

#13

25 Feb 2021, 11:30

Hi everyone,

So I have worked on this again, and my data now look like the table below. Patients have a clinical visit every 6 months. We study the association between pathology X at baseline and incident stroke. At each clinical visit, a specific treatment is recorded (yes/no: whether the patient takes it at that visit). We also have age, and our outcome is incident stroke, which can occur at any visit. For example, patient 2 has a stroke at the 18-month visit. He/she does not have pathology X at baseline and is not on the treatment at baseline or month 6, but starts taking it at month 12 and is still taking it at month 18.

I want to study the association between pathology X at baseline and the risk of incident stroke, after adjusting for confounders that do not vary over time (such as age) and one that does (treatment).

I am having a really hard time with stset to prepare the data for the cox regression.

How would you do this? I understand that at some point I have to use id(patient), but besides my time to event, which is clear, there is another "time": the month of the clinical visit at which the time-varying treatment is assessed, and at which the presence of stroke is also assessed.

What do you think?

Thank you ever so much for your help!

Pierre

patient id	month of the visit	pathology X at baseline	treatment	age	time to incident stroke	incident stroke
1	0	1	1	52	12	0
1	6	1	0	52	12	0
1	12	1	0	52	12	0
1	18	1	1	52	12	0
2	0	0	0	48	18	1
2	6	0	0	48	18	1
2	12	0	1	48	18	1
2	18	0	1	48	18	1


Paul Dickman

Join Date: Apr 2014

Posts: 294
#14

25 Feb 2021, 23:33

You need to have an enter and exit time for each observation (i.e., each segment of time). The outcome variable should be specific to that segment of time (0 if no stroke during that segment and 1 if stroke). Similarly, the treatment variable will be specific to the observation. The records for a patient who had a stroke at 18 months will be something like this. Note that the outcome variable is 0 (no stroke) at the end of the first two intervals since there was no stroke during these intervals.

Code:

id  enter  exit  stroke  treat
 1      0     6       0      1
 1      6    12       0      0
 1     12    18       1      0

Typically, the first segment of time would be 0-6 months. The treatment for this observation will be the treatment to which the patient was exposed during these 6 months. For example, the treatment at baseline might be assumed to apply to the period 0-6 months and the treatment allocated at the 6-month visit applies to the period 6-12 months. How you set this up for your data depends on the clinical considerations (e.g., how the drug acts) but from a statistical perspective you should think of segments of time and the treatment that applies to each segment of time. In the above example, the patient only received the treatment at baseline. Time-invariant variables (e.g., pathology at baseline) will be constant for all observations within a patient. There are plenty of examples online of how to do this.
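A minimal sketch of this layout, starting from the visit-level data shown in #13 (variable names are hypothetical). It assumes, as in the example above, that the treatment recorded at a visit applies to the 6 months after it, and it drops segments that start at or after the end of follow-up (the stroke, or censoring at the last known status).

```stata
* Visit-level data as in #13 (ttstroke = time to stroke or censoring)
clear
input id month pathology treat age ttstroke stroke_pt
1  0 1 1 52 12 0
1  6 1 0 52 12 0
1 12 1 0 52 12 0
1 18 1 1 52 12 0
2  0 0 0 48 18 1
2  6 0 0 48 18 1
2 12 0 1 48 18 1
2 18 0 1 48 18 1
end

* Each visit opens a 6-month segment
generate enter = month
generate exit  = month + 6

* Keep only segments that start before the end of follow-up
drop if enter >= ttstroke
* Truncate a segment at end of follow-up (no-op for these data)
replace exit = ttstroke if exit > ttstroke

* Outcome specific to the segment: 1 only where the stroke occurs
generate stroke = stroke_pt == 1 & exit == ttstroke

* Multiple records per patient: time0() gives the segment start
stset exit, id(id) failure(stroke) time0(enter)

* On the full data, the model would then be, e.g.:
* stcox pathology age treat
```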
pierre martin

Join Date: Nov 2017

Posts: 63
#15

27 Feb 2021, 07:28

Thank you so much, Paul. This is incredibly helpful and I am preparing my data this way.

Just one last question: patients who were lost to follow-up (for example at month 12) without developing the outcome, will they have missing values? How do I distinguish a subject who is followed up to month 18 without developing the outcome from a subject who is lost to follow-up at month 6 without developing it either? I need segments of time covering the whole follow-up! Sometimes the outcome status is unknown but the status of my time-varying variable is known.

So I guess the outcome variable should be missing after the last visit at which their status is known?

Sorry, this must be a basic question, but I would like to be sure I am doing it the right way.

Best,
Pierre