Calculating mid-year population for incidence/prevalence denominator

Matthew Milano

Join Date: Apr 2021

Posts: 37
#1

Calculating mid-year population for incidence/prevalence denominator

17 Jan 2022, 16:34

I need help to calculate the mid-year population for each year in a longitudinal dataset (2000-2019). This is so I can use it as a denominator for annual incidence and prevalence calculations. I have individual level population data that is very rich with information e.g. death dates, last date of data collection and transfer out date.

For each year I would like to calculate the population in the dataset that is alive and contributing data on the 1st July (midyear).

My data assumes that once an individual has entered the dataset they remain until the 'enddate'. For example an entry date 01jan2015 and enddate 01dec2019 will mean this individual contributes towards the midyear population in every year from 2015 through to 2019.

A random sample of my dataset below:

variable definitions:
entry = registration date within the dataset

year = year of entry

enddate = exit from dataset (death or no longer contributing data)

Code:

* Example generated by -dataex-. For more info, type help dataex clear input float id int(entry year enddate) 1 14878 2000 21795 2 14916 2000 21756 3 14895 2000 21046 4 15141 2001 21007 5 15112 2001 15834 6 15328 2001 16300 7 15502 2002 21816 8 15425 2002 17205 9 15844 2003 15959 10 16229 2004 17834 11 16257 2004 20468 12 16100 2004 17576 13 16670 2005 20032 14 16749 2005 21817 15 17113 2006 21816 16 17146 2006 19967 17 17184 2007 21789 18 17335 2007 21474 19 17343 2007 21816 20 17583 2008 19916 21 17751 2008 19612 22 17563 2008 21784 23 17794 2008 21088 24 18099 2009 21817 25 17988 2009 20216 26 18291 2010 20388 27 18288 2010 21812 28 18266 2010 20716 29 18681 2011 21816 30 18784 2011 19862 31 18707 2011 21816 32 18884 2011 19298 33 19176 2012 21816 34 19004 2012 21816 35 19696 2013 21370 36 19402 2013 19569 37 19893 2014 21817 38 20025 2014 21815 39 20178 2015 21754 40 20219 2015 21817 41 20206 2015 21816 42 20471 2016 21020 43 20585 2016 21816 44 21137 2017 21816 45 20844 2017 21816 46 21383 2018 21755 47 21280 2018 21817 48 21664 2019 21817 49 21584 2019 21817 50 21775 2019 21803 end format %d entry format %td enddate

I would need the mid-year population on 1st July for each year separately.

Thank you.
Tags: cross-sectional study, incidence, mid-year population, prevalence
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#2

17 Jan 2022, 17:09

Code:

assert entry < enddate frame create denominators int year long mid_year_population forvalues y = 2015/2019 { count if inrange(td(1jul`y'), entry, enddate) frame post denominators (`y') (r(N)) } frame change denominators list, noobs clean abbrev(24)

At the end of this code, the results you asked for are displayed in the Results window, and they are also in the active data set in frame denominators, where you can work with them.

Note: Because it uses frames, this requires version 16 or later.
1 like
Comment
Matthew Milano

Join Date: Apr 2021

Posts: 37
#3

19 Jan 2022, 03:11

Thank you Clyde this worked perfectly and I can now use the midyear population for incidence rate and period prevalence.

I plan to use the -ststet- and -stptime- function to calculate incidence rate. Is there a similar function in stata for calculating period prevalence e.g. in 2015?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#4

19 Jan 2022, 11:29

Is there a similar function in stata for calculating period prevalence e.g. in 2015?

Not that I am aware of.
Comment

Matthew Milano

Join Date: Apr 2021
Posts: 37

23 Jan 2022, 15:34

Hi,
I am now trying to re-run my calculation of mid-year population as explained in #1. But stratify by gender (1=male, 0=female). Can the code provided in #2 be adapted to calculate the mid-year population by gender?

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float id int(entry year enddate) float gender
 1 14878 2000 21795 1
 2 14916 2000 21756 1
 3 14895 2000 21046 0
 4 15141 2001 21007 1
 5 15112 2001 15834 0
 6 15328 2001 16300 0
 7 15502 2002 21816 1
 8 15425 2002 17205 1
 9 15844 2003 15959 1
10 16229 2004 17834 0
11 16257 2004 20468 1
12 16100 2004 17576 1
13 16670 2005 20032 0
14 16749 2005 21817 1
15 17113 2006 21816 1
16 17146 2006 19967 1
17 17184 2007 21789 1
18 17335 2007 21474 1
19 17343 2007 21816 0
20 17583 2008 19916 0
21 17751 2008 19612 0
22 17563 2008 21784 1
23 17794 2008 21088 1
24 18099 2009 21817 0
25 17988 2009 20216 1
26 18291 2010 20388 1
27 18288 2010 21812 1
28 18266 2010 20716 1
29 18681 2011 21816 1
30 18784 2011 19862 1
31 18707 2011 21816 1
32 18884 2011 19298 1
33 19176 2012 21816 1
34 19004 2012 21816 1
35 19696 2013 21370 0
36 19402 2013 19569 1
37 19893 2014 21817 1
38 20025 2014 21815 0
39 20178 2015 21754 1
40 20219 2015 21817 0
41 20206 2015 21816 1
42 20471 2016 21020 0
43 20585 2016 21816 1
44 21137 2017 21816 1
45 20844 2017 21816 1
46 21383 2018 21755 0
47 21280 2018 21817 1
48 21664 2019 21817 1
49 21584 2019 21817 0
50 21775 2019 21803 1
end
format %d entry
format %td enddate

Thank you.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#6

23 Jan 2022, 16:29

Your data example doesn't include a variable for gender, and I have to guess whether it is a string variable or numeric. I'm going to guess it's numeric, because that makes it simpler. If it's not, then I urge you to -encode- it to make it numeric--nearly everything you want to do with it afterward will be easier that way.

Code:

assert entry < enddate levelsof gender, local(genders) frame create denominators int gender year long mid_year_population forvalues y = 2015/2019 { foreach g of local genders { count if inrange(td(1jul`y'), entry, enddate) & gender == `g' frame post denominators (`g') (`y') (r(N)) } } frame change denominators list, noobs clean abbrev(24)

Changes shown in bold face.

Last edited by Clyde Schechter; 23 Jan 2022, 16:32.
Comment
Matthew Milano

Join Date: Apr 2021

Posts: 37
#7

24 Jan 2022, 06:17

Thank you Clyde
Comment

Announcement

Calculating mid-year population for incidence/prevalence denominator

Comment

Comment

Comment

Comment

Comment

Comment