annual incidence rates using survival data

Jessica Harding

Join Date: Sep 2014

Posts: 22
#1

annual incidence rates using survival data

12 Mar 2018, 09:07

Hi stata forum,

I have a wide dataset. For each individual (all of whom have end-stage renal disease), I have hospitalization data available (event = yes/no, date of event).

I am interested in the rate of hospitalizations for amputations in this population over time (2000-2014), i.e. I want to compare incidence rates across years, adjusted/standardized for age.

I want to set the data as survival data.

stset dox, fail(amputation==1) origin(born) entry(first_se) scale(365.25) id(id)

where ‘dox’ (date of exit) is either end of follow-up time (31dec2014), date of first amputation, or date of death, whichever occurred first.

And ‘entry’ is the date they first entered my registry database (of people with known end-stage renal disease).

I am not sure what is the best approach moving forward to get incidence rates, per year, standardized by age to a reference population (using direct standardization method).

I want my end result to be incidence rates with 95%CI, reported per year. I don’t want to report Hazard ratios, or incidence rate ratios, or coefficients.

I am a bit lost as to which stata command to use as stplit, poisson, sptime all seem to not be able to do exactly what I want.

Any advice would be much appreciated.
Jess
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#2

12 Mar 2018, 09:33

The command you want is -dstdize-, but it does not use data that has been -stset-, and you may have to do some data management to get your data into the form you need for this command. See -help dstdie- and look at the worked examples in the PDF documentation.
Comment
Jessica Harding

Join Date: Sep 2014

Posts: 22
#3

12 Mar 2018, 14:24

Hi clyde,

Thanks for your quick response. Is it possible to do something like this (instead of dstdize)?

stset dox1, fail(amputation==1) origin(born) entry(first_se) scale(365.25) id(usrds_id)
gen _y = _t - _t0
stsplit _year, after(time=d(1/1/2000)) at(0(1)14) trim
replace _year=2000 + _year
poisson _d i._year _t, exp(_y)
margins _year, predict(ir)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#4

12 Mar 2018, 19:34

Yes, that will give you regression-adjusted incidence rates, which I think are probably the most commonly used approach, and the approach I generally prefer and use in my own work.

But you said in #1

I am not sure what is the best approach moving forward to get incidence rates, per year, standardized by age to a reference population (using direct standardization method).

and these results will not be that (unless the reference population age distribution is precisely the age distribution in your sample data).
Comment

Jessica Harding

Join Date: Sep 2014
Posts: 22

13 Mar 2018, 11:41

my apologies clyde. age-adjusted incidence rates is ok (rather than direct standardization).
I am now having some difficulty getting the margins command to work.

I have used the following commands.

stset dox1, fail(lea1==1) origin(born) entry (entry) scale(365.25) id(usrds_id)
stsplit _year, after(time=d(1/1/2000)) at(0(1)14) trim
replace _year=2000 + _year
gen _y=_t-_t0
poisson lea1 i._year _t0, exp(_y)

What I want:
Incidence rates of amputations, per 1,000 person-years, per calendar year, stratified by diabetes status(dm), with 95%CIs
Ideally, I want a table that has the following data in it:

	Diabetes			No diabetes
Year	N (events)	Person-years (PY)	Rate (95%CI) (per 1,000 PY)	N (events)	Person-years (PY)	Rate (95%CI) (per 1,000 PY)
2000
2001
2002

How do i achieve this using the poisson and margin codes?

Two additional questions:
1. Do i estimate _y AFTER i split the data? or before?
2. My dataset had approx. 2 million observations...which then increased to 10 million after i aplit by year. This then made themargin code very slow to run. Any solutions for this?

Many thanks in advance
Jess

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#6

13 Mar 2018, 13:07

There is no single command that will create a table like that in one step. But

Code:

margins year, predict(n)

will get you the N columns

Code:

margins year, predict(ir)

will get you the incidence rates.

Person-years will not come out of margins. You'll just have to add those up. Something like:

Code:

collapse (count) person_id, by(year)

As for disaggregating by diabetes, the simplest way, if you are not going to be testing contrasts between diabetics and non-diabetics, is to just do two separate regressions on different samples. Alternatively, you can do a single regression on the whole sample, with a diabetes # year interaction term:

Code:

poisson lea1 i._year##i.diabetes _t0, exp(_y) margins year#diabetes, predict(n) margins year#diabetes, predict(ir) collapse (count) person_id by(diabetes year)

Calculating _y after splitting the data makes sense.

I don't know any reasonable way to speed up -margins-. It's a computationally intensive process. You can save a little time by, in effect, writing your own -margins- program, but that's a real pain and the time you spend coding it and debugging it will probably outweigh the savings in run time (which will be only modest anyway).
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

15 Mar 2018, 11:16

Withdrawn to check calculation

Last edited by Steve Samuels; 15 Mar 2018, 11:36.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment

Steve Samuels

Join Date: Mar 2014
Posts: 1786

15 Mar 2018, 12:49

To finish this topic, here is a way of calculating person-years, illustrated with the breast cancer data set of 686 women. Only a fraction of a year is contributed if a woman exits during that year. Person-years are totaled for those women who did and did not receive hormone therapy. This version works for single entry data. I think that I've got it right this time.

Code:

sysuse brcancer, clear
gen ty = rectime/365.25 /* convert time to year scale */
rename censrec event

stset ty, failure(event) id(id)

/* stsplit to one year intervals */
stsplit tyear , every(1)
replace tyear=tyear+1      //start numbering tyear from 1
label var tyear "Whole Year of followup after split"

replace event=0 if event==.
sort id ty

/* calculate person years in each year of followup */

gen pyears = 1 if  ty==tyear
replace pyears = ty -(tyear-1) if tyear>ty

table  hormon, c(sum pyears) row

with result

Code:

-----------------------
hormonal  |
therapy   | sum(pyears)
----------+------------
        0 |    1276.608
        1 |     835.3703
          |
    Total |    2111.978
-----------------------

Last edited by Steve Samuels; 15 Mar 2018, 12:54.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2

Comment

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#9

16 Mar 2018, 10:34

The Manual entry for stsplit shows simpler code for getting person years:

Code:

stsplit tyear, every(one) gen pyears= _t-_t0

stptime also calculates person-time and incidence rates, and can estimate SMRs with a reference population and jackknifed standard errors.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment

Announcement