Cumulative incidence estimation in the presence of competing risks

Vittorio Fasulo

Join Date: Jan 2020

Posts: 10
#1

10 Apr 2020, 17:53

Good evening,

For the last two days I have been working on "Cumulative incidence estimation in the presence of competing risks", but I'm having a hard time figuring it out.

I found some Stata code, but I still have problems interpreting the results.

stset timecompetingAGVHD, f( statuscompetingAGVHD==1) scale(1)
stcompet CIF1 = ci , compet1(2) compet2(3)

statuscompetingAGVHD                                   AGVHD status competing
----------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  statuscompetingAGVHD

                 range:  [1,3]                        units:  1
         unique values:  3                        missing .:  0/99

            tabulation:  Freq.   Numeric  Label
                            22         1  AGVHD
                             7         2  Death
                            70         3  None

The population is children who underwent a transplant. I need to calculate the incidence of a complication, but I have to account for patients who died before the event could happen.

- So the straight percentage is not correct. In theory, I should exclude (censor?) those who died before the end of the follow-up time, is that correct? Or do I have to exclude those who did not have any event?

- Even if my code is correct, I still do not understand how to interpret the output, because Stata gives me a cumulative incidence value for each patient, but I want the overall cumulative incidence, or the cumulative incidence by group.

Thanks in advance for your help.

Vittorio
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#2

10 Apr 2020, 20:01

So the graft vs host disease is your outcome of interest. Death is a competing event: if a person dies, you cannot observe GVH disease that would have occurred later had they survived. But "None" is not a competing event: the occurrence of nothing does not in any way modify their chance of getting GVH, nor does it prevent you from observing it. So AGVHD is the primary failure, Death is the competing event, and None is not an event at all. The people with no event are simply censored at the end of their follow-up period. You should not omit anybody from the analysis unless they do not meet the appropriate inclusion criteria in your study protocol, or have an exclusion criterion in your protocol. It would be a particularly bad mistake to exclude people because of the occurrence of death or the non-occurrence of any event!

The code for your -stcompet- command should be

Code:

stcompet CIF1 = ci , compet1(2)
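
Put together with the -stset- step from the original post, the whole sequence might look like the sketch below. It assumes the coding shown in the codebook output (1 = AGVHD, 2 = Death, 3 = None) and that the user-written -stcompet- command is installed; the final -list- is only one possible way to inspect the result.

```stata
* AGVHD (status 1) is the failure of interest; -stset- treats every other status as censored
stset timecompetingAGVHD, failure(statuscompetingAGVHD==1)

* Death (status 2) is the only competing event; status 3 (None) is left as censored
stcompet CIF1 = ci, compet1(2)

* Look at the AGVHD curve: CIF1 at each time an AGVHD event occurred
sort timecompetingAGVHD
list timecompetingAGVHD CIF1 if statuscompetingAGVHD == 1, noobs
```

Nobody is dropped from the data at any step: people with no event contribute follow-up time until they are censored.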
Vittorio Fasulo

Join Date: Jan 2020

Posts: 10
#3

14 Apr 2020, 12:40

Thanks for your answer and your very clear explanation. My problem now is that, having generated the variable CIF1, I cannot interpret the result: the straight percentage should be 22/99 (22.2%), but when I tabulate CIF1 the maximum cumulative incidence becomes 57%. Am I interpreting the maximum cumulative incidence correctly?

Also if I try to do by group it gives me an error:

stcompet CIF3 = ci , compet1(2) by( sex )
command levels7 is unrecognized

Thanks

Vittorio

Last edited by Vittorio Fasulo; 14 Apr 2020, 12:51.
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#4

14 Apr 2020, 17:58

The straight percentage of those who experience the outcome associated with CIF1 is almost unrelated to the maximum cumulative incidence. This discrepancy should not worry you. On the contrary, it would be a sure sign that something is wrong if they were the same.

The CIF is modeling an incidence rate that is not directly observed in the data, precisely because cases of GVH that would otherwise have occurred are never observed: death "gets in the way." So what the CIF is telling you is what the incidence of GVH would be if nobody died. Understandably, it is higher than the actual observed probability of GVH.

I don't know why you are having that problem when you add -by(sex)- to your command. I have never seen that in my use of -stcompet-, and a review of the source code does not have any place where it attempts to use a command -levels7-. If you use the -dataex- command to post an example of your data which reproduces this problem, I'll attempt to troubleshoot it.

Vittorio Fasulo

Join Date: Jan 2020
Posts: 10
#5

14 Apr 2020, 19:47

Hi, thanks for your help.
1) I tried to work around the "by" issue by using "if", so as to get a separate cumulative incidence for each of the two groups. Do you think that is equivalent to using by, or is it a mistake?
2) I used this code to create the curve, but the groups seem to be switched, in terms of cumulative incidence, compared to the values I get when I tabulate them.

In any case, here is the -dataex- output:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(GeneKnown statuscompetingAGVHD) int timecompetingAGVHD
1 3 1445
1 3 2108
0 4   50
1 3 2104
1 4  175
1 1    8
0 3 1850
1 3  500
1 3 1847
1 3 2088
1 3 2117
1 3 1865
1 3 1908
1 3  956
1 3 1637
1 1   63
0 1   22
0 2   82
1 3 1807
1 3 1910
1 1   32
1 3  451
0 3 2099
1 3 1280
1 3 1931
1 3 2125
1 3 2148
1 1 1212
1 4  111
0 3 1934
1 3 2058
1 4  693
0 1 2351
1 1   70
0 3 2010
0 4   57
1 1  150
1 3 1956
1 3 1958
1 1   14
1 3  816
1 2   15
1 3 1963
1 3 1739
1 3  521
1 3 1977
1 1   20
1 1   21
0 4   42
1 3 1448
1 3  994
1 4   78
1 3 2034
1 3 1524
1 3 1585
1 3 2021
1 3 1705
1 3 1877
1 3 2161
1 3 2075
1 2  198
0 3 2149
1 3 1481
1 1   34
1 3 1878
1 1   16
1 3 1996
1 3 2115
0 4  361
1 1   28
1 1   22
1 3  486
1 1  140
1 3 1898
1 3 1864
0 1   19
1 3 1305
1 3 1844
1 1   13
0 3 1987
1 3 2162
1 3  737
1 3 2116
1 3 1997
1 3 1973
1 3 1956
0 3 1580
1 3 1518
1 3 2503
1 3 1826
1 3 1867
1 3 1889
1 1   11
1 1   19
1 3 1537
1 1   56
1 1   11
0 2   65
1 4  139
end
label values GeneKnown GeneKnown
label def GeneKnown 0 "Unknown", modify
label def GeneKnown 1 "Known", modify

Thanks a lot!

Vittorio


Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#6

15 Apr 2020, 15:47

Well, the example data you show doesn't have a sex variable, but if I use GeneKnown for the -by()- option, it runs just fine on my setup.

I can't emphasize enough that comparing your results to tabulations of outcome frequencies is completely inappropriate. There is no reason whatsoever to expect them to be similar, or even to be in the same order. If, for example, one of the outcomes occurs infrequently, but most of those occurrences are early, it will have a high CIF even though it is perhaps much less common than another outcome. Stop doing the tabulations!

I see nothing wrong with the outputs that I get from applying

Code:

stset timecompetingAGVHD, f( statuscompetingAGVHD==1) scale(1)
stcompet CIF1 = ci , compet1(2) by(GeneKnown)

to your example data. They look just fine to me.

By the way, one important thing: you should not include a compet2(3) option in your stcompet command. Status 3 corresponds to no event happening: this is not a competing risk. It is just censoring at the end of observation.
Vittorio Fasulo

Join Date: Jan 2020

Posts: 10
#7

15 Apr 2020, 19:39

Thank you very much.
Sorry, in the end I gave you a different variable instead of sex, because it was more important to me.
I did some research, and the "command levels7 is unrecognized" error apparently occurred because I had downloaded an old version of the "stcompet" extension. Now -by()- works.
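
For anyone who hits the same error, a minimal sketch of checking and refreshing the installed copy. It assumes the package is taken from SSC; the thread does not say where the old version came from.

```stata
* Show which stcompet.ado Stata is picking up, along with its version line
which stcompet

* Reinstall from SSC, overwriting the older copy (SSC as the source is an assumption)
ssc install stcompet, replace
```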

Now my question is how to use the variable CIF1 that I created? Which one is the cumulative incidence that I should refer to?
Second, if I have to graph it what is the code that I have to use?

Thanks a lot, and sorry: competing risks are a difficult topic for me, and this is the first time I'm dealing with them!
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#8

16 Apr 2020, 10:50

Now my question is how to use the variable CIF1 that I created? Which one is the cumulative incidence that I should refer to?
Second, if I have to graph it what is the code that I have to use?

So consider what you have after you run

Code:

stset timecompetingAGVHD, f( statuscompetingAGVHD==1) scale(1)
stcompet CIF1 = ci , compet1(2) by(GeneKnown)

The variable CIF1, perhaps somewhat confusingly, now contains four cumulative incidence functions. There is one cumulative incidence function for each combination of statuscompetingAGVHD and GeneKnown. These functions may be scattered around the data set depending on how your data are sorted. But this is simple to work with anyway. The key is that those observations for which statuscompetingAGVHD = 1 and GeneKnown = Unknown contain the cumulative incidence function for AGVHD among those with the gene unknown. Those observations for which statuscompetingAGVHD = 2 and GeneKnown = Unknown contain the cumulative incidence function for Death among those with the gene unknown. Similar reasoning applies to the combinations of statuscompetingAGVHD and GeneKnown = Known. A cumulative incidence function is a function of time: it is the (predicted) cumulative incidence that one would observe of the given event if there were no competing events at that time. So you could do something like this to see the numbers:

Code:

by statuscompetingAGVHD GeneKnown (timecompetingAGVHD), sort: list timecompetingAGVHD CIF1 if inlist(statuscompetingAGVHD, 1, 2), noobs

or you could graph them as follows:

Code:

graph twoway line CIF1 timecompetingAGVHD if inlist(statuscompetingAGVHD, 1, 2), sort by(statuscompetingAGVHD GeneKnown)

These graphs would be somewhat primitive in appearance, and you might well want to tailor their appearance to your taste by using some of the many options available in -graph twoway-.
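
As one possible way to tailor them, the sketch below reports a single number per group and overlays the two AGVHD curves in one panel. It assumes CIF1 was created with -by(GeneKnown)- as above and that GeneKnown is coded 0 = Unknown, 1 = Known as in the -dataex- listing; the legend text and axis titles are illustrative choices only.

```stata
* Largest estimated CIF value for AGVHD in each group,
* i.e. the estimate at that group's last observed AGVHD time
tabstat CIF1 if statuscompetingAGVHD == 1, by(GeneKnown) statistics(max)

* Both AGVHD curves in one panel, drawn as step functions
twoway (line CIF1 timecompetingAGVHD if statuscompetingAGVHD == 1 & GeneKnown == 0, sort connect(stairstep)) ///
       (line CIF1 timecompetingAGVHD if statuscompetingAGVHD == 1 & GeneKnown == 1, sort connect(stairstep)), ///
       legend(order(1 "Gene unknown" 2 "Gene known")) ///
       ytitle("Cumulative incidence of AGVHD") xtitle("Analysis time")
```

The `///` line continuations work in a do-file; when typing interactively, the -twoway- command would have to go on a single line.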
Vittorio Fasulo

Join Date: Jan 2020

Posts: 10
#9

16 Apr 2020, 14:26

Thanks one more time for your complete clarification.