  • Calculating mid-year population for incidence/prevalence denominator

    I need help calculating the mid-year population for each year in a longitudinal dataset (2000-2019), so that I can use it as the denominator for annual incidence and prevalence calculations. I have rich individual-level population data, including death dates, last date of data collection, and transfer-out date.

    For each year I would like to calculate the population in the dataset that is alive and contributing data on the 1st July (midyear).

    My data assume that once an individual has entered the dataset, they remain in it until 'enddate'. For example, an individual with entry date 01jan2015 and enddate 01dec2019 contributes to the mid-year population in every year from 2015 through 2019.

    A random sample of my dataset below:

    variable definitions:
    • entry = registration date within the dataset
    • year = year of entry
    • enddate = exit from dataset (death or no longer contributing data)
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id int(entry year enddate)
     1 14878 2000 21795
     2 14916 2000 21756
     3 14895 2000 21046
     4 15141 2001 21007
     5 15112 2001 15834
     6 15328 2001 16300
     7 15502 2002 21816
     8 15425 2002 17205
     9 15844 2003 15959
    10 16229 2004 17834
    11 16257 2004 20468
    12 16100 2004 17576
    13 16670 2005 20032
    14 16749 2005 21817
    15 17113 2006 21816
    16 17146 2006 19967
    17 17184 2007 21789
    18 17335 2007 21474
    19 17343 2007 21816
    20 17583 2008 19916
    21 17751 2008 19612
    22 17563 2008 21784
    23 17794 2008 21088
    24 18099 2009 21817
    25 17988 2009 20216
    26 18291 2010 20388
    27 18288 2010 21812
    28 18266 2010 20716
    29 18681 2011 21816
    30 18784 2011 19862
    31 18707 2011 21816
    32 18884 2011 19298
    33 19176 2012 21816
    34 19004 2012 21816
    35 19696 2013 21370
    36 19402 2013 19569
    37 19893 2014 21817
    38 20025 2014 21815
    39 20178 2015 21754
    40 20219 2015 21817
    41 20206 2015 21816
    42 20471 2016 21020
    43 20585 2016 21816
    44 21137 2017 21816
    45 20844 2017 21816
    46 21383 2018 21755
    47 21280 2018 21817
    48 21664 2019 21817
    49 21584 2019 21817
    50 21775 2019 21803
    end
    format %d entry
    format %td enddate

    I would need the mid-year population on 1st July for each year separately.

    Thank you.

  • #2
    Code:
    assert entry < enddate
    
    frame create denominators int year long mid_year_population
    forvalues y = 2000/2019 {
        count if inrange(td(1jul`y'), entry, enddate)
        frame post denominators (`y') (r(N))
    }
    
    frame change denominators
    list, noobs clean abbrev(24)
    At the end of this code, the results you asked for are displayed in the Results window, and they are also in the active data set in frame denominators, where you can work with them.

    Note: Because it uses frames, this requires Stata version 16 or later.
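
    For readers on a version earlier than 16, the same loop can probably be written with -postfile- instead of frames. This is only a sketch and not part of the original answer: the handle name and the temporary file are illustrative choices, and it assumes entry and enddate are daily dates as in #1. Because it relies on local macros and a tempfile, it needs to be run in one go from a do-file.

    Code:
    * Sketch only: -postfile- alternative for Stata versions without frames
    tempname handle
    tempfile denominators
    postfile `handle' int year long mid_year_population using `denominators'
    forvalues y = 2000/2019 {
        * count people under follow-up on 1 July of the current year
        quietly count if inrange(td(1jul`y'), entry, enddate)
        post `handle' (`y') (r(N))
    }
    postclose `handle'
    
    * Note: this replaces the data in memory; -preserve- or save first if needed
    use `denominators', clear
    list, noobs clean abbrev(24)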


    • #3
      Thank you, Clyde. This worked perfectly, and I can now use the mid-year population for incidence rate and period prevalence.

      I plan to use the -stset- and -stptime- commands to calculate the incidence rate. Is there a similar command in Stata for calculating period prevalence, e.g. in 2015?


      • #4
        "Is there a similar function in Stata for calculating period prevalence e.g. in 2015?"
        Not that I am aware of.
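
        In the absence of a dedicated command, period prevalence can probably be computed by hand with two -count- calls. The following is a rough sketch under stated assumptions, not an established recipe: diagdate is a hypothetical variable (it does not appear in the data in #1) holding the date of first diagnosis, the condition is treated as chronic so that anyone diagnosed earlier still counts as a case, and the denominator is the mid-year population from #2.

        Code:
        * Sketch only: diagdate is a hypothetical daily-date variable holding the
        * first diagnosis date (missing if never diagnosed); it is not in the data in #1
        local y 2015
        
        * numerator: diagnosed by the end of the year and under follow-up at some point during it
        * (a missing diagdate fails the <= test, so never-diagnosed people are excluded)
        count if diagdate <= td(31dec`y') & entry <= td(31dec`y') & enddate >= td(1jan`y')
        local cases = r(N)
        
        * denominator: mid-year population, as in #2
        count if inrange(td(1jul`y'), entry, enddate)
        local denom = r(N)
        
        display "Period prevalence in `y': " %6.4f `cases'/`denom'

        Whether people who left the dataset before 1 July should count in the numerator is a definitional choice; the condition on enddate would need adjusting for a stricter definition.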


        • #5
          Hi,
          I am now trying to re-run my calculation of the mid-year population as explained in #1, but stratified by gender (1=male, 0=female). Can the code provided in #2 be adapted to calculate the mid-year population by gender?

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float id int(entry year enddate) float gender
           1 14878 2000 21795 1
           2 14916 2000 21756 1
           3 14895 2000 21046 0
           4 15141 2001 21007 1
           5 15112 2001 15834 0
           6 15328 2001 16300 0
           7 15502 2002 21816 1
           8 15425 2002 17205 1
           9 15844 2003 15959 1
          10 16229 2004 17834 0
          11 16257 2004 20468 1
          12 16100 2004 17576 1
          13 16670 2005 20032 0
          14 16749 2005 21817 1
          15 17113 2006 21816 1
          16 17146 2006 19967 1
          17 17184 2007 21789 1
          18 17335 2007 21474 1
          19 17343 2007 21816 0
          20 17583 2008 19916 0
          21 17751 2008 19612 0
          22 17563 2008 21784 1
          23 17794 2008 21088 1
          24 18099 2009 21817 0
          25 17988 2009 20216 1
          26 18291 2010 20388 1
          27 18288 2010 21812 1
          28 18266 2010 20716 1
          29 18681 2011 21816 1
          30 18784 2011 19862 1
          31 18707 2011 21816 1
          32 18884 2011 19298 1
          33 19176 2012 21816 1
          34 19004 2012 21816 1
          35 19696 2013 21370 0
          36 19402 2013 19569 1
          37 19893 2014 21817 1
          38 20025 2014 21815 0
          39 20178 2015 21754 1
          40 20219 2015 21817 0
          41 20206 2015 21816 1
          42 20471 2016 21020 0
          43 20585 2016 21816 1
          44 21137 2017 21816 1
          45 20844 2017 21816 1
          46 21383 2018 21755 0
          47 21280 2018 21817 1
          48 21664 2019 21817 1
          49 21584 2019 21817 0
          50 21775 2019 21803 1
          end
          format %d entry
          format %td enddate
          Thank you.


          • #6
            Your data example doesn't include a variable for gender, and I have to guess whether it is a string variable or numeric. I'm going to guess it's numeric, because that makes it simpler. If it's not, then I urge you to -encode- it to make it numeric--nearly everything you want to do with it afterward will be easier that way.


            Code:
            assert entry < enddate
            
            levelsof gender, local(genders)
            
            frame create denominators int gender year long mid_year_population
            forvalues y = 2000/2019 {
                foreach g of local genders {
                    count if inrange(td(1jul`y'), entry, enddate) & gender == `g'
                    frame post denominators (`g') (`y') (r(N))
                }
            }
            
            frame change denominators
            list, noobs clean abbrev(24)

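            The changes from the code in #2 are the -levelsof- line, the added gender variable in -frame create-, the inner -foreach- loop, the gender condition in -count-, and the extra value in -frame post-.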
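
            If one row per year with the two genders side by side is more convenient, the long results could be reshaped afterwards. This is a sketch only: it assumes the active frame is denominators after the loop above, and that gender takes exactly the values 0 (female) and 1 (male) as described in #5. The new variable names are illustrative.

            Code:
            * Sketch only: run in frame denominators after the loop in this post
            reshape wide mid_year_population, i(year) j(gender)
            
            * assumes gender is coded 0 = female, 1 = male, as stated in #5
            rename mid_year_population0 mid_year_pop_female
            rename mid_year_population1 mid_year_pop_male
            list, noobs clean abbrev(24)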
            Last edited by Clyde Schechter; 23 Jan 2022, 17:32.


            • #7
              Thank you Clyde
