Question about cumulative incidence plot

Zihan Dong

Join Date: Feb 2021

Posts: 44
#1

16 Mar 2021, 18:41

Dear Stata specialists,

I have some questions about how to draw a cumulative incidence plot using Stata.

What I have is typical survival data:
outcome: development of CVD (0/1)

exposure: RA (0/1)

censoring: death during follow-up, emigration during follow-up or end of study (0/1)

follow-up time in years

I want to draw a cumulative incidence plot and calculate the cumulative incidence at years 1, 5 and 10. I have seen people use code like this to draw the plot:

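* Naive (Kaplan-Meier) estimates: each event analysed with the other treated as censored
stset time, f(status == 2)
sts gen si_s = s
gen si_ci = 1 - si_s
stset time, f(status == 1)
sts gen aids_s = s
* Corrected estimates: cumulative incidence with status == 2 as the competing event
stcompet ci = ci, compet1(2)
gen si_ci2 = ci if status == 2
gen aids_ci2 = ci if status == 1
gen aids_s2 = 1 - aids_ci2
label variable aids_s2 "Aids-free survival (corrected)"
label variable aids_s "Aids-free survival (naive)"
label variable si_ci2 "Cum. Inc. of SI (corrected)"
label variable si_ci "Cum. Inc. of SI (naive)"
graph tw line aids_s2 si_ci2 aids_s si_ci time if time <= 13, sort ///
    lcolor(navy maroon ltblue erose)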

What confuses me is that in my study death during follow-up is the only competing event, and I have coded it as censored. What should I then specify in -stcompet-? Or does this code not apply to my data?

Does anyone have any suggestions about this? If possible, I would also like to know how to generate a life table along with the cumulative incidence plot.

Thank you in advance!

Best,
Z.
Paul Dickman

Join Date: Apr 2014

Posts: 294
#2

17 Mar 2021, 04:25

I suggest you clarify (and understand the assumptions and interpretation of) the quantity you wish to estimate before moving to how to do it in Stata.

Using non-technical language, -stcompet- gives you "real world probabilities" of the outcome in the presence of competing risks. If death is a competing risk, for example, then death due to other causes will prevent CVD. If those with RA are more likely to die of other causes then, all else equal, those with RA will be less likely to develop CVD. This may be what you want if your goal is to know how many individuals will develop CVD so you can allocate resources to their care.
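As a rough sketch of what the crude ("real world") estimate could look like for data like those described in #1: the variable names below (time, status, RA) and the coding of status are assumptions for illustration, not taken from the original data. Death has to be coded as its own event rather than lumped in with censoring.

```stata
* Assumed (hypothetical) variables:
*   time    follow-up time in years
*   status  0 = censored (emigration or end of study), 1 = CVD, 2 = death
*   RA      exposure (0/1)
* stcompet is user-written: ssc install stcompet

stset time, failure(status == 1)

* Cumulative incidence in the presence of the competing event (death), by RA
stcompet cuminc = ci, compet1(2) by(RA)

* stcompet fills in the estimate for the event each subject experienced,
* so keep the rows belonging to CVD events
gen cvd_ci_ra0 = cuminc if status == 1 & RA == 0
gen cvd_ci_ra1 = cuminc if status == 1 & RA == 1
label variable cvd_ci_ra0 "No RA"
label variable cvd_ci_ra1 "RA"

twoway line cvd_ci_ra0 cvd_ci_ra1 _t, sort connect(stairstep stairstep) ///
    ytitle("Cumulative incidence of CVD") xtitle("Years of follow-up")
```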

If, however, you censor the survival times of those who die then you are estimating the probability of developing CVD in each group under the hypothetical scenario that nobody dies or emigrates. This would be more relevant if you are interested in studying if there is a direct association between RA and CVD. Be aware that there are assumptions involved in censoring those who die and you should discuss them with a statistician/epidemiologist. I'm guessing this is what you want. To get probabilities of event you can use:

Code:

sts graph, failure by(RA)
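
For the probabilities at years 1, 5 and 10 asked about in #1, something along these lines may work. It is only a sketch: the variable names (time, cvd, RA) are assumed, and cvd is taken to be 1 for a CVD event and 0 for anyone censored, including those who died or emigrated.

```stata
* Assumed (hypothetical) variables:
*   time  follow-up time in years
*   cvd   1 = developed CVD, 0 = censored (death, emigration, end of study)
*   RA    exposure (0/1)
stset time, failure(cvd == 1)

* 1 - Kaplan-Meier estimate at 1, 5 and 10 years, separately by RA
sts list, failure by(RA) at(1 5 10)

* Same curve as above, with the numbers at risk printed under the plot
sts graph, failure by(RA) risktable
```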
Zihan Dong

Join Date: Feb 2021

Posts: 44
#3

17 Mar 2021, 14:05

Hi Paul,

Thank you very much for your thorough answer. Based on your explanation, I think what I need to estimate in my case is the "net survival".

One thing I'd like to confirm: by giving that code, do you mean that the complement of the KM curve is equal to the cumulative incidence plot?

Another thing I'd like to ask about is my main analysis. Because the proportional hazards assumption is violated, I was advised to use a flexible parametric model to explore the hazard function over time for each RA group, as well as the HR over time. To do that, I was told to use time since start of follow-up as a time-varying confounder, which confuses me. Could you please explain that a little as well?

As I'm totally new to flexible parametric models, I'm sorry if the question is too basic... Thank you in advance!
Paul Dickman

Join Date: Apr 2014

Posts: 294
#4

18 Mar 2021, 05:51

Originally posted by Zihan Dong

One thing I'd like to confirm: by giving that code, do you mean that the complement of the KM curve is equal to the cumulative incidence plot?

Yes, except I suggest calling it the "net probability of CVD" or "cumulative net probability of CVD". "cumulative incidence" refers to the crude, rather than net, probability.

Originally posted by Zihan Dong

Another thing I'd like to ask about is my main analysis. Because the proportional hazards assumption is violated, I was advised to use a flexible parametric model to explore the hazard function over time for each RA group, as well as the HR over time. To do that, I was told to use time since start of follow-up as a time-varying confounder, which confuses me. Could you please explain that a little as well?

I would assume that what they meant was "model a particular confounder (e.g., age) as a time-varying confounder". If they were referring to RA (an exposure rather than a confounder) then I would call it "allowing for a time-varying effect of RA". Conceptually, you are modelling an interaction between time and RA. You can model such interactions in a Cox model, but it's easier to do with flexible parametric models. Here's a tutorial where I estimate the HR for sex as a function of time. Paul Lambert has some good tutorials on his page and his book is a great reference.
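
To make the "interaction between time and RA" concrete, here is a minimal sketch using -stpm2-. The variable names (time, cvd, RA) are assumed, and the degrees of freedom are arbitrary illustrative choices rather than recommendations; in practice they would be chosen and checked with sensitivity analyses.

```stata
* stpm2 is user-written: ssc install stpm2
* Assumed (hypothetical) variables: time (years), cvd (1 = CVD, 0 = censored), RA (0/1)
stset time, failure(cvd == 1)

* Flexible parametric model on the log cumulative hazard scale.
* df(5): spline for the baseline; tvc(RA) dftvc(3): lets the effect of RA vary over time
stpm2 RA, scale(hazard) df(5) tvc(RA) dftvc(3)

* Time-dependent hazard ratio (RA vs no RA) with confidence limits (hr_lci, hr_uci)
predict hr, hrnumerator(RA 1) hrdenominator(RA 0) ci

twoway (rarea hr_lci hr_uci _t, sort) (line hr _t, sort), ///
    yline(1) legend(off) ///
    ytitle("Hazard ratio (RA vs no RA)") xtitle("Years of follow-up")
```

Without tvc(RA) the same model assumes proportional hazards, so comparing the two fits is one way to see how much the HR actually changes over follow-up.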

Choice of timescale is not obvious here. Attained age and time since RA diagnosis are both of interest.

As an aside, the net/crude terminology is not used universally. If any of your colleagues question it, I suggest referring them to the article on competing risks in the Encyclopedia of Biostatistics (Wiley).

The Crude Probability: the probability of death from a specific cause in the presence of all other risks acting on the population. This is also referred to as absolute risk. An example of a crude probability is the answer to the question: What is the chance that a woman will die of breast cancer between ages 40 and 60?

The Net Probability: the probability of death if a specific risk is the only risk acting on a population, or conversely, the probability of death if a specific cause is eliminated from the population. For example, what is the chance of surviving to age 60 if cancer were the only cause of death?
Zihan Dong

Join Date: Feb 2021

Posts: 44
#5

18 Mar 2021, 10:49

Thank you very much!

It makes more sense to say "treating confounders/exposure as time-varying variables".

Your blog and the books you recommended are very helpful as well. It's quite refreshing to learn about flexible parametric models at a time when the Cox regression model is dominant, and to have a clear distinction between net and crude probability. I will definitely learn more about them!