I need help to calculate the mid-year population for each year in a longitudinal dataset (2000-2019). This is so I can use it as a denominator for annual incidence and prevalence calculations. I have individual level population data that is very rich with information e.g. death dates, last date of data collection and transfer out date.
For each year I would like to calculate the population in the dataset that is alive and contributing data on the 1st July (midyear).
My data assumes that once an individual has entered the dataset they remain until the 'enddate'. For example an entry date 01jan2015 and enddate 01dec2019 will mean this individual contributes towards the midyear population in every year from 2015 through to 2019.
A random sample of my dataset below:
variable definitions:
I would need the mid-year population on 1st July for each year separately.
Thank you.
For each year I would like to calculate the population in the dataset that is alive and contributing data on the 1st July (midyear).
My data assumes that once an individual has entered the dataset they remain until the 'enddate'. For example an entry date 01jan2015 and enddate 01dec2019 will mean this individual contributes towards the midyear population in every year from 2015 through to 2019.
A random sample of my dataset below:
variable definitions:
- entry = registration date within the dataset
- year = year of entry
- enddate = exit from dataset (death or no longer contributing data)
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float id int(entry year enddate) 1 14878 2000 21795 2 14916 2000 21756 3 14895 2000 21046 4 15141 2001 21007 5 15112 2001 15834 6 15328 2001 16300 7 15502 2002 21816 8 15425 2002 17205 9 15844 2003 15959 10 16229 2004 17834 11 16257 2004 20468 12 16100 2004 17576 13 16670 2005 20032 14 16749 2005 21817 15 17113 2006 21816 16 17146 2006 19967 17 17184 2007 21789 18 17335 2007 21474 19 17343 2007 21816 20 17583 2008 19916 21 17751 2008 19612 22 17563 2008 21784 23 17794 2008 21088 24 18099 2009 21817 25 17988 2009 20216 26 18291 2010 20388 27 18288 2010 21812 28 18266 2010 20716 29 18681 2011 21816 30 18784 2011 19862 31 18707 2011 21816 32 18884 2011 19298 33 19176 2012 21816 34 19004 2012 21816 35 19696 2013 21370 36 19402 2013 19569 37 19893 2014 21817 38 20025 2014 21815 39 20178 2015 21754 40 20219 2015 21817 41 20206 2015 21816 42 20471 2016 21020 43 20585 2016 21816 44 21137 2017 21816 45 20844 2017 21816 46 21383 2018 21755 47 21280 2018 21817 48 21664 2019 21817 49 21584 2019 21817 50 21775 2019 21803 end format %d entry format %td enddate
I would need the mid-year population on 1st July for each year separately.
Thank you.
Comment