Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Identifying age of attritors

    Hello all,

    I'm working with longitudinal panel data -- 14,716 panelists total over 7 waves between 1984 and 2005. Waves are initially every 5 years and then every other year from 1999 forward. My sample is limited to adults age 20-65.

    I am conducting an attrition analysis, and am trying to figure out how many panelists attrited because they simply aged beyond my max number of years. Thanks to previous help from this forum, I have successfully identified which panelists are attritors. Now I would like to figure out which panelists were at my age threshold when they attrited.

    The wrinkle is that my waves occur every five years and then every other year. So, a panelist who attrited in 1989 due to age would have been between 61 and 65 years in this previous wave (1984). But an age-attriting panelist in 2003 would only have been 64-65 in 2001.

    Admittedly, I'm not sure where to start. Any suggestions are helpful.




  • #2
    If I understand the question correctly, the following code illustrates the approach you will need. It is written without benefit of example data, so it will, at a minimum, require adaptation on your part, and if my basic assumptions about how your data are organized are off base, it may not work at all.

    Code:
    by id (year), sort: gen aged_out = age[_N] >= cond(year[_N] <= 1994, 61, 64)
    In the future, when asking for help with code, always show example data. And use the -dataex- command to do that. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Thanks! Apologies for lack of data. My only wrinkle with your code is that 65-year-olds in 2005 are not aged-out. If a panelist is greater than or equal to 64 years of age in 2005, they should not be labled "aged out." How would I account for this?

      Data below:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float id int year float(aged_out att1 agehead)
        4001 1984 1 1 61
        6001 1984 1 1 61
        7002 1989 1 1 63
       13001 1984 1 1 55
       13001 1989 1 1 60
       13001 1994 1 1 65
       14001 1984 1 1 56
       14001 1989 1 1 61
       16001 1984 1 1 61
       16002 1994 1 0 56
       16002 1999 1 0 61
       16002 2001 1 0 63
       16002 2003 1 0 65
       18002 1984 1 1 62
       19001 1984 1 1 56
       19001 1989 1 1 61
       23001 1984 1 1 65
       33002 1984 1 1 64
       36001 1984 1 1 57
       36001 1989 1 1 62
       40001 1984 1 1 51
       40001 1989 1 1 57
       40001 1994 1 1 62
       40178 2001 1 1 64
       41002 1994 1 0 56
       41002 1999 1 0 61
       41002 2001 1 0 63
       41002 2003 1 0 65
       41178 1994 1 1 60
       41178 1999 1 1 65
       46002 1984 1 1 65
       47001 1984 1 1 48
       47001 1989 1 1 53
       47001 1994 1 1 58
       47001 1999 1 1 63
       47001 2001 1 1 65
       50001 1984 1 1 48
       50001 1989 1 1 53
       50001 1994 1 1 58
       50001 1999 1 1 64
       50001 2001 1 1 65
       57001 1984 1 1 51
       57001 1989 1 1 56
       57001 1994 1 1 61
       63002 1984 1 1 59
       63002 1989 1 1 64
       65001 1984 1 1 62
       68002 1984 1 0 46
       68002 1989 1 0 51
       68002 1994 1 0 56
       68002 1999 1 0 61
       68002 2001 1 0 63
       68002 2003 1 0 65
       75001 1984 1 1 53
       75001 1989 1 1 59
       75001 1994 1 1 63
       79001 1984 1 1 61
       84001 1984 1 1 51
       84001 1989 1 1 56
       84001 1994 1 1 61
       84002 1994 1 1 58
       84002 1999 1 1 63
       84002 2001 1 1 65
       85001 1984 1 0 46
       85001 1989 1 0 51
       85001 1994 1 0 56
       85001 1999 1 0 61
       85001 2001 1 0 63
       85001 2003 1 0 65
       87001 1984 1 0 46
       87001 1989 1 0 51
       87001 1994 1 0 56
       87001 1999 1 0 62
       87001 2001 1 0 63
       87001 2003 1 0 65
       88001 1984 1 1 58
       88001 1989 1 1 63
       90001 1984 1 0 45
       90001 1989 1 0 50
       90001 1994 1 0 55
       90001 1999 1 0 60
       90001 2001 1 0 62
       90001 2003 1 0 64
       90170 1984 1 0 45
       90170 1989 1 0 50
       90170 1994 1 0 55
       90170 1999 1 0 60
       90170 2001 1 0 62
       90170 2003 1 0 64
       96001 1984 1 1 55
       96001 1989 1 1 60
       96001 1994 1 1 65
      105002 1984 1 1 63
      106001 1984 1 1 56
      106001 1989 1 1 61
      119001 1984 1 0 44
      119001 1994 1 0 54
      119001 1999 1 0 59
      119001 2001 1 0 61
      119001 2003 1 0 63
      end
      On a slightly related question, I have some panelists who are correctly labeled as "aged_out" but did not fall under my code for attritors ("att1") because they did not miss two consecutive waves. My code for att1 is:

      Code:
      bysort id (year): gen att1 = year[_N]<2003
      I'm not exactly sure how to include the additional condition that a person age 64+ in 2003 is also an attritor. I tried the following to no avail:

      Code:
       bysort id (year): gen att2 = year[_N]<2003 | year[_N]==2003 if agehead >= 64
      Any thoughts are much appreciated.

      Comment


      • #4
        I think the following code accommodates all of these changes:
        Code:
        isid id year, sort
        by id (year): egen age_in_2003 = max(cond(year == 2003, agehead, .))
        by id (year): gen byte att1 = (year[_N] < 2003) | inrange(age_in_2003, 64, .)
        by id (year): gen byte aged_out = att1 & agehead[_N] >= cond(year[_N] <= 1994, 61, 64)
        This includes, without explicit mention, that people who are aged >= 64 in 2005 are not aged out because these people are not classified as attritors, and this code makes being an attritor part of the definition of aging out.

        Comment


        • #5
          Worked perfectly! Ultimately, I wanted to figure out how many panelists attrited without aging out. I subsequently used the att1 & aged_out variables to generate a new variable, att2, using the following code:

          Code:
          by id (year): gen byte att2 = att1 if aged_out == 0
          replace att2=0 if att2==.
          This worked. I plan on then using the att2 variable in an probit model as part of an attrition analysis to see if attriting (without aging out) is related to my variables of interest.

          However, I would also like to present some summary statistics about the number of panelists who attrited (without aging out) in each year (in addition to the number of panelists that aged out in each year). My problem is that the variables I've generated so far (att1, att2, aged_out) label a particular id as an attritor (or aged out) in each year the id was in the study. How can I easily figure out the specific year after which each panelist attrited? I would also like to do the same for the aged_out panelists as well.

          Here is some data for the id, year, and att2 variables:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float id int year byte att2
           4031 1999 1
           4170 1989 1
           4170 1994 1
           4172 1984 1
           4172 1989 1
           4172 1994 1
           4172 1999 1
           4179 1994 1
           4188 1999 1
           4188 2001 1
           5001 1984 1
           5001 1989 1
           5001 1994 1
           5004 1999 1
           5171 1989 1
           5171 1994 1
           5176 2001 1
           6030 1994 1
           7034 1999 1
           7034 2001 1
           7177 1994 1
          10005 1984 1
          10005 1989 1
          10030 1999 1
          13004 1984 1
          14002 1984 1
          14002 1989 1
          14002 1994 1
          14002 1999 1
          14002 2001 1
          14003 1989 1
          14170 1984 1
          16005 1984 1
          18170 1994 1
          19004 1984 1
          20170 1984 1
          22002 1989 1
          22174 1994 1
          22174 1999 1
          22174 2001 1
          23170 1984 1
          39170 1994 1
          39175 2001 1
          40004 1994 1
          41003 1984 1
          41003 1989 1
          41003 1994 1
          41030 1999 1
          41176 1994 1
          41185 1994 1
          41185 1999 1
          45030 1994 1
          45177 1994 1
          45180 1999 1
          46172 1994 1
          46172 1999 1
          47030 1989 1
          47030 1994 1
          47173 1999 1
          51170 1989 1
          53001 1984 1
          53001 1989 1
          53001 1994 1
          53170 1999 1
          53172 1984 1
          53172 1989 1
          53212 2001 1
          57170 1984 1
          57173 1989 1
          57174 1989 1
          57175 1994 1
          57178 2001 1
          60001 1984 1
          60003 1984 1
          60003 1989 1
          60003 1994 1
          60170 1994 1
          60170 1999 1
          60170 2001 1
          61005 1984 1
          67003 1984 1
          67003 1989 1
          67004 1984 1
          69001 1994 1
          75170 1989 1
          79021 1994 1
          84006 1994 1
          84006 1999 1
          84006 2001 1
          84171 1984 1
          84171 1989 1
          84171 1994 1
          84171 1999 1
          84171 2001 1
          84173 1984 1
          84173 1989 1
          84178 1999 1
          85173 2001 1
          87003 1984 1
          87003 1994 1
          end

          Comment


          • #6
            Code:
            by id (year), sort: gen final_year = year[_N]
            by id (year): gen byte flag = (_n == _N)
            The variable final_year will now give you the last year that the person participated in the survey (regardless of whether they attrited, or aged out, or just reached the end.) The variable flag, will tag a single observation (the last one, actually) for each id, so that you can single those out for id-level descriptive statistics without multiple-counting. For example:
            Code:
            summ final_year if att1 & flag
            will give you summary statistics of the final year in the study for those who attrited, and will only count each id once.

            Comment


            • #7
              Wonderful! Thank you so much, Clyde for all your help. Little by little, I am gradually improving my STATA coding ability thanks to folks on the forum like you.

              Comment


              • #8
                How would your code in #6 look to identify the first year a panelist joined the study? I was thinking something along the lines of:

                Code:
                by id (year), sort: gen first_year = min(year)
                Except stata said I had invalid syntax...

                Comment


                • #9
                  min(year) will give you the first year a panelist joined the study, however it is an egen function,
                  Code:
                  by id (year), sort: egen first_year = min(year)

                  Comment


                  • #10
                    A somewhat more efficient (though it won't be noticeable unless your data set is enormous) way to get the first year is:
                    Code:
                    by id (year), sort: gen first_year = year[1]
                    This avoids the overhead associated with calling -egen-.

                    By the way, there is a -min()- function associated with -gen-. But it is actually more analogous to -egen, rowmin()- than it is to -egen, min()-. In fact, the -gen ... min()- function requires at least two arguments. -gen ... min(year)- will throw a syntax error.
                    Last edited by Clyde Schechter; 13 Jan 2022, 10:22.

                    Comment


                    • #11
                      Thank you both so much for your helpful comments! Here is my latest problem. For any given year, the total number of panelists should equal the previous year's total minus attritors plus joiners, or Year2 = Year1 - Att1 + Joiner.

                      Unfortunately, something is off:

                      Code:
                      . xttab year
                      
                                        Overall             Between            Within
                           year |    Freq.  Percent      Freq.  Percent        Percent
                      ----------+-----------------------------------------------------
                           1984 |    5841     13.31      5841     39.69          45.50
                           1989 |    6089     13.88      6089     41.38          35.79
                           1994 |    7119     16.23      7119     48.38          37.82
                           1999 |    5694     12.98      5694     38.69          25.70
                           2001 |    6097     13.90      6097     41.43          25.67
                           2003 |    6462     14.73      6462     43.91          28.35
                           2005 |    6566     14.97      6566     44.62          35.42
                      ----------+-----------------------------------------------------
                          Total |   43868    100.00     43868    298.10          33.55
                                                   (n = 14716)
                      
                      . tab final_year if att1 & flag
                      
                       final_year |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                             1984 |      1,315       18.55       18.55
                             1989 |      1,538       21.70       40.25
                             1994 |      2,985       42.11       82.36
                             1999 |        504        7.11       89.48
                             2001 |        639        9.02       98.49
                             2003 |        107        1.51      100.00
                      ------------+-----------------------------------
                            Total |      7,088      100.00
                      
                      . tab first_year if joiner & flag
                      
                       first_year |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                             1989 |      1,830       20.62       20.62
                             1994 |      2,531       28.52       49.14
                             1999 |      1,787       20.14       69.27
                             2001 |        953       10.74       80.01
                             2003 |        895       10.08       90.10
                             2005 |        879        9.90      100.00
                      ------------+-----------------------------------
                            Total |      8,875      100.00
                      Here, I have 5841 panelists in 1984. After 1984, 1,315 attrit and 1,830 join. However, 5841-1315+1830=6356, but 1989 has only 6089 members.

                      The code for first_year, final_year, and flag come from Clyde's posts #6 & #10. The code for att1 & joiner is below.

                      Code:
                       
                       bysort id (year): gen att1 = year[_N]<2003 bysort id (year): gen joiner = year[1]>1984
                      Finally, here is some sample data:
                      Code:
                       * Example generated by -dataex-. To install: ssc install dataex clear input float id int year byte att1 float(joiner first_year final_year) byte flag 4001 1984 1 0 1984 1984 1 4003 1984 0 0 1984 2005 0 4003 1989 0 0 1984 2005 0 4003 1994 0 0 1984 2005 0 4003 1999 0 0 1984 2005 0 4003 2001 0 0 1984 2005 0 4003 2003 0 0 1984 2005 0 4003 2005 0 0 1984 2005 1 4004 1999 0 1 1999 2005 0 4004 2001 0 1 1999 2005 0 4004 2003 0 1 1999 2005 0 4004 2005 0 1 1999 2005 1 4006 1984 0 0 1984 2005 0 4006 1994 0 0 1984 2005 0 4006 2003 0 0 1984 2005 0 4006 2005 0 0 1984 2005 1 4007 1999 0 1 1999 2005 0 4007 2001 0 1 1999 2005 0 4007 2005 0 1 1999 2005 1 4008 1989 0 1 1989 2005 0 4008 1999 0 1 1989 2005 0 4008 2001 0 1 1989 2005 0 4008 2003 0 1 1989 2005 0 4008 2005 0 1 1989 2005 1 4031 1999 1 1 1999 1999 1 4033 2001 0 1 2001 2005 0 4033 2003 0 1 2001 2005 0 4033 2005 0 1 2001 2005 1 4034 1999 0 1 1999 2005 0 4034 2001 0 1 1999 2005 0 4034 2003 0 1 1999 2005 0 4034 2005 0 1 1999 2005 1 4035 1999 0 1 1999 2005 0 4035 2001 0 1 1999 2005 0 4035 2005 0 1 1999 2005 1 4036 1999 0 1 1999 2005 0 4036 2005 0 1 1999 2005 1 4170 1989 1 1 1989 1994 0 4170 1994 1 1 1989 1994 1 4172 1984 1 0 1984 1999 0 4172 1989 1 0 1984 1999 0 4172 1994 1 0 1984 1999 0 4172 1999 1 0 1984 1999 1 4175 1989 0 1 1989 2003 0 4175 1994 0 1 1989 2003 0 4175 1999 0 1 1989 2003 0 4175 2001 0 1 1989 2003 0 4175 2003 0 1 1989 2003 1 4179 1994 1 1 1994 1994 1 4186 1994 0 1 1994 2003 0 4186 1999 0 1 1994 2003 0 4186 2001 0 1 1994 2003 0 4186 2003 0 1 1994 2003 1 4188 1999 1 1 1999 2001 0 4188 2001 1 1 1999 2001 1 4190 2001 0 1 2001 2003 0 4190 2003 0 1 2001 2003 1 4193 2003 0 1 2003 2003 1 4195 2001 0 1 2001 2005 0 4195 2003 0 1 2001 2005 0 4195 2005 0 1 2001 2005 1 4196 2003 0 1 2003 2005 0 4196 2005 0 1 2003 2005 1 5001 1984 1 0 1984 1994 0 5001 1989 1 0 1984 1994 0 5001 1994 1 0 1984 1994 1 5002 1999 0 1 1999 2005 0 5002 2001 0 1 1999 2005 0 5002 2003 0 1 1999 2005 0 5002 2005 0 1 1999 2005 1 5003 1984 0 0 1984 2005 0 5003 1989 0 0 1984 2005 0 5003 1994 0 0 1984 2005 0 5003 1999 0 0 1984 2005 0 5003 2001 0 0 1984 2005 0 5003 2003 0 0 1984 2005 0 5003 2005 0 0 1984 2005 1 5004 1999 1 1 1999 1999 1 5005 1994 0 1 1994 2005 0 5005 1999 0 1 1994 2005 0 5005 2003 0 1 1994 2005 0 5005 2005 0 1 1994 2005 1 5031 2005 0 1 2005 2005 1 5171 1989 1 1 1989 1994 0 5171 1994 1 1 1989 1994 1 5172 2003 0 1 2003 2003 1 5175 2001 0 1 2001 2005 0 5175 2003 0 1 2001 2005 0 5175 2005 0 1 2001 2005 1 5176 2001 1 1 2001 2001 1 6001 1984 1 0 1984 1984 1 6006 1984 0 0 1984 2005 0 6006 1989 0 0 1984 2005 0 6006 1994 0 0 1984 2005 0 6006 1999 0 0 1984 2005 0 6006 2001 0 0 1984 2005 0 6006 2003 0 0 1984 2005 0 6006 2005 0 0 1984 2005 1 6030 1994 1 1 1994 1994 1 6032 2005 0 1 2005 2005 1 end

                      Comment


                      • #12
                        For any given year, the total number of panelists should equal the previous year's total minus attritors plus joiners, or Year2 = Year1 - Att1 + Joiner. [emphasis added]
                        But you have not defined attritor or joiner in that way. Your code defines attritors and joiners as people who have ever attritted or ever joined the program, not just in that year. And for some of the things you have done up to now, you need those to be defined exactly as you have already done it.

                        You need to create new variables:
                        Code:
                        by id (year), sort: gen byte joined_this_year = (_n == 1) & year > 1984
                        by id (year), sort: gen byte attrited_this_year = (_n == _N) & year < 2003
                        And even with that, you will have to modify your equation to: Participants in current year = Number in Previous Year - Attrited in Previous Year + Joiners in current year.

                        Comment


                        • #13
                          Apologies that the quoted prose was unclear. Sleep deprivation is real. If you look closely at the code I posted above you will see that what I actually did (and intended in my prose) was "Participants in current year = Number in Previous Year - Attrited in Previous Year + Joiners in current year."

                          Your new code still leaves me with the same problem. Let me try to illustrate more clearly.

                          Here is exactly what I did before:

                          For "participants in current year" and "number in previous year" I looked at the current year results from

                          [CODE]
                          . xttab year Overall Between Within year | Freq. Percent Freq. Percent Percent ----------+----------------------------------------------------- 1984 | 5841 13.31 5841 39.69 45.50 1989 | 6089 13.88 6089 41.38 35.79 1994 | 7119 16.23 7119 48.38 37.82 1999 | 5694 12.98 5694 38.69 25.70 2001 | 6097 13.90 6097 41.43 25.67 2003 | 6462 14.73 6462 43.91 28.35 2005 | 6566 14.97 6566 44.62 35.42 ----------+----------------------------------------------------- Total | 43868 100.00 43868 298.10 33.55 (n = 14716)

                          For "attrited in previous year" I used:

                          Code:
                           
                           . tab final_year if att1 & flag   final_year |      Freq.     Percent        Cum. ------------+-----------------------------------        1984 |      1,315       18.55       18.55        1989 |      1,538       21.70       40.25        1994 |      2,985       42.11       82.36        1999 |        504        7.11       89.48        2001 |        639        9.02       98.49        2003 |        107        1.51      100.00 ------------+-----------------------------------       Total |      7,088      100.00
                          Where "final_year" & "flag" are defined by your reply in post #6, and "att1" is defined in my previous post #11. For "joiners in current year" I used:
                          Code:
                             
                           . tab first_year if joiner & flag   first_year |      Freq.     Percent        Cum. ------------+-----------------------------------        1989 |      1,830       20.62       20.62        1994 |      2,531       28.52       49.14        1999 |      1,787       20.14       69.27        2001 |        953       10.74       80.01        2003 |        895       10.08       90.10        2005 |        879        9.90      100.00 ------------+-----------------------------------       Total |      8,875      100.00
                          Where "first_year" & "flag" are defined by your reply in post #6, and "joiner" is defined in my previous post #11. Based on these data, when I look at 1989, I get 5841-1315+1830=6356, but 1989 has only 6089 members. When I use your newly defined variables, I get the same numbers for 1989 (1315 attritors and 1830 joiners).
                          Code:
                           . by id (year), sort: gen byte joined_this_year = (_n == 1) & year > 1984  .  . by id (year), sort: gen byte attrited_this_year = (_n == _N) & year < 2003  . tab joined_this_year if year==1989  joined_this |       _year |      Freq.     Percent        Cum. ------------+-----------------------------------           0 |      4,259       69.95       69.95           1 |      1,830       30.05      100.00 ------------+-----------------------------------       Total |      6,089      100.00  . tab attrited_this_year if year==1984  attrited_th |     is_year |      Freq.     Percent        Cum. ------------+-----------------------------------           0 |      4,526       77.49       77.49           1 |      1,315       22.51      100.00 ------------+-----------------------------------       Total |      5,841      100.00
                          Thoughts?

                          Comment


                          • #14
                            Apologies for the messy code posting. Anyway, all the code that I used to calculate
                            "Participants in current year = Number in Previous Year - Attrited in Previous Year + Joiners in current year" and get 5841-1315+1830=6356, when 1989 has only 6089 members is all in my post #11. The results from your code, which gives me identical results (1315 attritors and 1830 joiners) is:

                            Code:
                            . by id (year), sort: gen byte joined_this_year = (_n == 1) & year > 1984
                             
                            . by id (year), sort: gen byte attrited_this_year = (_n == _N) & year < 2003
                            
                            . tab joined_this_year if year==1989
                            
                            joined_this |
                                  _year |      Freq.     Percent        Cum.
                            ------------+-----------------------------------
                                      0 |      4,259       69.95       69.95
                                      1 |      1,830       30.05      100.00
                            ------------+-----------------------------------
                                  Total |      6,089      100.00
                            
                            . tab attrited_this_year if year==1984
                            
                            attrited_th |
                                is_year |      Freq.     Percent        Cum.
                            ------------+-----------------------------------
                                      0 |      4,526       77.49       77.49
                                      1 |      1,315       22.51      100.00
                            ------------+-----------------------------------
                                  Total |      5,841      100.00

                            Comment


                            • #15
                              I think the problem is in your data, not your code. Look at id 119001. That person is in the study in 1984, 1994, 1999, 2001, and 2003, but 1989 is skipped! So when you count up the numbers for 1994, he fails to be subtracted because he is not an attritor as we have defined it, but he is not there. And when he rejoins in 1999, he is not counted as a joiner either, for the same reason. I'm guessing he's not the only instance of this. The presence of gaps like this invalidates the equation you are trying to verify. That equation is only valid if people are continuously present from their first entry until their last.

                              Comment

                              Working...
                              X