Month-specific incidence rates in STATA?

Gianfranco Di Gennaro

Join Date: Oct 2020

Posts: 140
#1

29 Mar 2024, 08:45

Dear all,
thanks in advance.

I have a dataset of 900 patients followed for one year, organized into two departments (oncology, cardiology).
Among these patients, I have information about their age, sex, admission_date, length_of_stay, discharge_date and mortality outcome at discharge (variable "outcome", which can have a value of 1 for deceased or 0 for alive).

Is there a way in STATA to calculate month-specific incidence rates? For example: the number of deaths in March divided by person-days in March.

I'm unsure if there's a way to account for the fact that some patients have admission days spread across multiple months.

Some time ago, if I remember correctly, I had used stpstime, but I was in a slightly different situation, and this command doesn't seem appropriate now.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(id age sex admission_date length_of_stay discharge_date outcome) str10 department
 1 54.61835 0 23364 17.857428  23381.86 1 "cardiology"
 2 45.71597 0 23331 14.076464 23345.076 0 "oncology"
 3  58.9018 0 23027  7.090492  23034.09 0 "oncology"
 4 58.87616 0 23184  17.30037   23201.3 0 "oncology"
 5 63.64021 0 23240  21.30667 23261.307 1 "cardiology"
 6 57.74791 0 23065 25.892675  23090.89 0 "cardiology"
 7 70.58585 1 23278 13.036489  23291.04 1 "oncology"
 8 53.71004 1 23060 35.847523  23095.85 0 "cardiology"
 9 58.72459 0 23030  22.15348 23052.154 1 "oncology"
10 39.22841 1 23078  19.79576 23097.795 0 "cardiology"
end
Tags: epidemiology, incidence, mortality

Clyde Schechter

Join Date: Apr 2014
Posts: 30097

29 Mar 2024, 09:14

Code:

gen start_month = mofd(admission_date)
gen end_month = mofd(discharge_date)
expand end_month-start_month + 1
by id, sort: gen mdate = dofm(start_month + _n - 1)
format *month %tm
format mdate %td

gen admitted_days = min(floor(discharge_date), lastdayofmonth(mdate)) ///
    - max(admission_date, firstdayofmonth(mdate))
by id (mdate), sort: replace outcome = 0 if _n < _N // CAN ONLY DIE IN LAST MONTH
    
collapse (sum) admitted_days outcome, by(department mdate)
gen admitted_months = admitted_days/daysinmonth(mdate)
gen monthly_mortality_rate = outcome/admitted_months

You don't say as much, but I have assumed you want the rates separately by department. If that's not true, just remove department from the -by()- option of the -collapse- command. Similarly, if you want sex-specific mortality rates, add sex to the -by()- option of the -collapse- command.
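The original question asked for deaths divided by person-days, whereas the code above reports deaths per patient-month. A minimal sketch of the person-days version follows. It assumes a collapsed data set with the same variable names as above; the four rows of toy data are hypothetical and serve only to make the example run, and deaths_per_1000_pd is a name introduced here.

```stata
* Toy collapsed data (hypothetical values, for illustration only)
clear
input str10 department float(mdate admitted_days outcome)
"cardiology" 23011 410 3
"cardiology" 23042 385 1
"oncology"   23011 520 4
"oncology"   23042   0 0
end
format mdate %td

* Deaths per 1,000 patient-days; left missing when a cell has no exposure
gen deaths_per_1000_pd = 1000 * outcome / admitted_days if admitted_days > 0

sort department mdate
list department mdate outcome admitted_days deaths_per_1000_pd, sepby(department)
```

Scaling by 1,000 is only a presentation choice; any convenient multiplier works as long as it is reported with the rate.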


Gianfranco Di Gennaro

Join Date: Oct 2020

Posts: 140
#3

29 Mar 2024, 13:19

Dear Clyde Schechter

Thanks as always for your invaluable help.

Yes, in addition to the raw estimates of the incidence rates, I would be interested in making a comparison to see whether the incidence is higher in either department and whether it is higher in any particular month.
So, I would need to conduct a count regression (Poisson, negative binomial).
Moreover, I'd like to adjust for age or any other variable.
Therefore, I should expand the "collapse" command.

Code:

collapse (sum) admitted_days outcome (mean) age var var var, by(department mdate)
xi: poisson outcome i.mdate i.department age...

I have limited data management skills in STATA and am trying to understand your code conceptually.
I have a question: is this script correct even for patients who have admission days spanning more than 2 months? For example, admission in January, discharge in April?

Finally, I would like to ask for your advice, if I'm not bothering you excessively.
To answer the question "does the incidence change between departments and months?", I would need too many estimates (considering that we would also need the interaction of 12 months*department). Do you think it would make sense to settle for raw estimates, and in a multivariate model consider time as a continuous variable (possibly using splines to capture non-linearity)?

Thank you again for your help.

Gianfranco
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#4

29 Mar 2024, 14:12

"I have a question: is this script correct even for patients who have admission days spanning more than 2 months? For example, admission in January, discharge in April?"

Yes, it will work no matter how long or short the stay, even extending into years.

Code:

collapse (sum) admitted_days outcome (mean) age var var var, by(department mdate)
xi: poisson outcome i.mdate i.department age...

I don't think I would do it this way. Using mean age within a department and month is really not the best approach here. That's for (at least) two reasons: mortality is certainly non-linear in age, and the -poisson- model adds another layer of non-linearity into the mix. I would be inclined not to -collapse- the data and instead use the individual observations in the analysis. I would either use a spline to capture the non-linearity of the relationship to age, or I might make a categorical variable out of age, using a large enough number of narrow bins that mortality rates can be expected to be roughly constant over those age ranges. Whether this last approach is practical depends on whether your data set is large enough to support analysis with a discrete variable having a large number of levels. I would use the admitted_days variable as the -exposure()- option in the poisson regression. And since we have created repeated monthly observations for each person in the study, this becomes a multi-level model. So something like this:

Code:

mepoisson outcome i.mdate i.age_categories i.sex etc, exposure(admitted_days) || id:

OR

mepoisson outcome i.mdate age_spline_variables i.sex etc, exposure(admitted_days) || id:
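One possible way to build the age spline variables mentioned above is -mkspline- with the cubic option, which produces a restricted cubic spline basis. This is only a sketch: the stub name age_s and the choice of 4 knots are assumptions made here, and the ages are simulated so that the block runs on its own.

```stata
* Simulated ages, for illustration only
clear
set seed 12345
set obs 200
gen age = 40 + 40*runiform()

* Restricted cubic spline basis with 4 knots: creates age_s1 age_s2 age_s3
mkspline age_s = age, cubic nknots(4)
describe age_s*

* On the real (expanded) data these could stand in for age_spline_variables:
* mepoisson outcome i.mdate age_s* i.sex, exposure(admitted_days) || id:
```

With k knots, -mkspline, cubic- creates k - 1 variables, so more knots buy flexibility at the cost of more parameters to estimate.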

I also question the idea of doing monthly estimates even in this way. Are you looking for seasonal effects that recur annually? If so, the variable you want is not mdate, but rather a different variable: -gen month_of_year = month(mdate)-. If you are not looking for seasonality, then generating a separate effect for each month is going to produce an enormous amount of output that will be difficult to interpret sensibly: there will be noisy fluctuations unless your data set is gargantuan, and no clear picture of what is going on. You are probably better off treating mdate as a continuous variable and representing its effects with a spline, or some other method of representing non-linearity, as well.

Note, by the way, that -xi:- is pretty much obsolete. Do learn factor-variable notation by reading -help fvvarlist- and use it instead. (There are a few old commands that do not support factor-variable notation and would require the use of -xi:-, but most of those have more modern equivalents that do support factor-variable notation. Really the core of the remaining use for -xi:- is with older user-written commands that have not been updated. So don't banish -xi- from your brain altogether, but consign it to a dusty corner somewhere.)
Gianfranco Di Gennaro

Join Date: Oct 2020

Posts: 140
#5

29 Mar 2024, 15:28

Thanks, Clyde Schechter , for the stimulating discussion.

"Using mean age within a department and month is really not the best approach here. That's for (at least) two reasons: mortality is certainly non-linear in age, and the -poisson- model adds another layer of non-linearity into the mix"

I completely agree.

"Are you looking for seasonal effects that recur annually?"

No, I'm seeking specific differences between THOSE months, or, continuously, throughout THAT year.

"You are probably better off treating mdate as a continuous variable and representing its effects with a spline, or some other method of representing non-linearity, as well."

Thank you. It strikes me as odd to use a spline on mdate, which has 13 categories, but it may just be my limited experience with splines. Nevertheless, I appreciate and will follow your suggestion.

So, in the end, my code will be, without collapsing data:

Code:

gen start_month = mofd(admission_date)
gen end_month = mofd(discharge_date)
expand end_month-start_month + 1
by id, sort: gen mdate = dofm(start_month + _n - 1)
format *month %tm
format mdate %td
gen admitted_days = min(floor(discharge_date), lastdayofmonth(mdate)) ///
    - max(admission_date, firstdayofmonth(mdate))
by id (mdate), sort: replace outcome = 0 if _n < _N
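Before fitting anything, it may be worth checking that the monthly pieces of each stay add back up to the whole stay. The sketch below does that on two hypothetical patients, one staying within a single month and one crossing two month boundaries. Because a difference of two dates counts the days between them, a month spent entirely in hospital contributes lastdayofmonth() minus firstdayofmonth(), which is one less than daysinmonth(), so some shortfall per boundary seems possible. The variables split_total, stay_total, gap, admitted_days_alt and gap_alt are names introduced here, and the "+ 1" variant is only one conceivable adjustment, not something proposed in the thread.

```stata
* Two hypothetical stays, for illustration only
clear
input float(id admission_date discharge_date outcome)
1 23020 23030.5 0
2 23020 23075.2 1
end
format admission_date discharge_date %td

* Same month-splitting steps as in the thread
gen start_month = mofd(admission_date)
gen end_month = mofd(discharge_date)
expand end_month - start_month + 1
by id, sort: gen mdate = dofm(start_month + _n - 1)
format mdate %td
gen admitted_days = min(floor(discharge_date), lastdayofmonth(mdate)) ///
    - max(admission_date, firstdayofmonth(mdate))

* Check: do the monthly pieces sum to the full length of stay?
by id, sort: egen split_total = total(admitted_days)
gen stay_total = floor(discharge_date) - admission_date
gen gap = stay_total - split_total

* One possible adjustment (an assumption): end each month at the first
* day of the next month rather than at its own last day
gen admitted_days_alt = min(floor(discharge_date), lastdayofmonth(mdate) + 1) ///
    - max(admission_date, firstdayofmonth(mdate))
by id, sort: egen split_total_alt = total(admitted_days_alt)
gen gap_alt = stay_total - split_total_alt

list id mdate admitted_days admitted_days_alt gap gap_alt, sepby(id)
```

Any month with zero admitted days is also worth inspecting, since -exposure()- enters the model as a logarithm and such rows cannot contribute to the fit.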

and then the model

Code:

mepoisson outcome i.mdate i.age_categories i.sex, exposure(admitted_days) || id:

I have one last question. Utilizing a Poisson regression on individual patient data will yield the incidence rate ratio (IRR). However, in your opinion, wouldn't it be more appropriate to use melogit, since my outcome, based on individual data, is a binary outcome (0/1)?

Gianfranco

Last edited by Gianfranco Di Gennaro; 29 Mar 2024, 16:24.
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#6

29 Mar 2024, 17:37

I didn't understand, when I wrote #4, that you had only 13 months in your data set and that you are interested in differences among those 13 specific months. In that case, and given that you have 900 patients, you can indeed go ahead with i.mdate instead of treating month as continuous.

Concerning your last question, in #1 you wrote "Is there a way in STATA to calculate month-specific incidence rates?" Logistic regression cannot give you an incidence rate. It can give you an incidence probability, but that will fail to account for the fact that an admission may cover fractions of its first and final months. And -melogit- does not even have an -exposure()- option to put in such a variable. (And there is no sensible way it could have that.) For a proper incidence rate, you want a Poisson model.
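What -exposure()- does in the Poisson model can be written out explicitly. The notation below is introduced here for illustration and does not appear in the thread: y_{ij} is the death indicator for patient i in month j, t_{ij} is admitted_days, x_{ij} holds the covariates, and u_i is the patient-level random intercept implied by "|| id:".

```latex
\log \mathrm{E}\left[\, y_{ij} \mid u_i \,\right]
  = \log(t_{ij}) + \mathbf{x}_{ij}\boldsymbol{\beta} + u_i
\quad\Longleftrightarrow\quad
\frac{\mathrm{E}\left[\, y_{ij} \mid u_i \,\right]}{t_{ij}}
  = \exp\!\left(\mathbf{x}_{ij}\boldsymbol{\beta} + u_i\right)
```

The log of the exposure enters with its coefficient fixed at 1, so the model describes deaths per unit of time at risk and each exponentiated coefficient is an incidence rate ratio. A logistic model has no analogous term, which is why it yields a probability per patient-month rather than a rate.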
Gianfranco Di Gennaro

Join Date: Oct 2020

Posts: 140
#7

04 Apr 2024, 15:38

Dear Clyde Schechter , I ran your code on my data and it worked perfectly.
Thanks again for your support.
Gianfranco
Gianfranco Di Gennaro

Join Date: Oct 2020

Posts: 140
#8

22 Feb 2025, 03:53

Dear Clyde Schechter , hope you're well.
I wrote a post about a situation that is an extension of the problem addressed here.

Can I ask your opinion on it? Obviously only if you can; I would hate to bother you.
I have tried several strategies, but none of them convince me and I would like to do everything possible to avoid gross errors.
The post is at the link:
https://www.statalist.org/forums/forum/general-stata-discussion/general/1773241-estimating-hospital-mortality-with-recurring-patients-and-duplicates
