  • Identifying attrition in longitudinal panel data sample

    Hello all,

    I am working with longitudinal panel data (PSID). I would like to identify panelists who (1) attrited from the study and (2) joined the study during my sample period. I would also like to compute the mean/median number of waves in which panelists participated during the sample period. After running these stats, I would like to figure out whether attrition is related to any of my independent variables of interest.

    I am struggling to figure out how to identify the attritors, the joiners, and the mean/median number of waves for the panelists. I was thinking of perhaps reshaping my data from long form to wide form and trying to create dummy variables for attritors and joiners, but I am generally unsure. After I identify the attritors and the joiners, my plan would be to run a correlation matrix and perhaps a tobit or probit model to identify any association between attriting and my independent variables of interest.

    Any help is much appreciated.

    Thanks!

  • #2
    perhaps you'll find the following example useful,
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id wave)
    1 1
    1 2
    1 3
    1 4
    1 5
    2 1
    2 2
    2 3
    2 4
    2 5
    3 1
    3 2
    3 3
    3 4
    4 2
    4 3
    4 4
    4 5
    end
    Code:
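    sort id wave
    
    sum wave    // r(max) is the highest wave number (5 here), i.e. the number of waves
    by id: gen attritor = _N < `r(max)' & wave[1]==1    // fewer than all waves, present in wave 1
    by id: gen joiner = _N < `r(max)' & wave[1]!=1      // fewer than all waves, first seen after wave 1
    
    by id: gen numwave = _N if _n==1    // waves per panelist, stored once per id
    egen meanwave = mean(numwave)       // mean number of waves across panelists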
    Last edited by Øyvind Snilsberg; 03 Jan 2022, 04:08.

    • #3
      Thanks! But I'm still having some difficulty. First, a panelist must miss two consecutive years to formally attrit from the study. For example, if over five waves a panelist participates in waves 1, 2, 3, and 5 (or even only in 1, 3, and 5), that panelist doesn't attrit. However, a panelist present for waves 1 and 2 but absent from 3 and 4 does attrit from the study. How would this change your code?

      Second, I want to make sure I understand the code you did send. Am I correct that in "year[1]==1", the second "1" should be replaced by the year of the first wave i.e. "year[1]==1984"? When I make that tweak, I get the following:

      [CODE]

      . sort id year

      . sum year

          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
              year |     43,868    1996.569    7.091292       1984       2005

      . by id: gen attritor = _N < `r(max)' & year[1]==1984

      . tab attritor

         attritor |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |     23,189       52.86       52.86
                1 |     20,679       47.14      100.00
      ------------+-----------------------------------
            Total |     43,868      100.00

      . sort id year

      . sum year

          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
              year |     43,868    1996.569    7.091292       1984       2005

      . by id: gen joiner = _N < `r(max)' & year[1]!=1984

      . tab joiner

           joiner |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |     20,679       47.14       47.14
                1 |     23,189       52.86      100.00
      ------------+-----------------------------------
            Total |     43,868      100.00

      [/CODE]

      Am I correct that the data show 47.14% attrition over the sample period, and that 52.86% joined at some later date? Also, it looks like under this code one is either a joiner or an attritor, and that doesn't seem right. Panelists should have either (1) been present for all waves, (2) attrited after some wave, or (3) joined after the first wave in the sample period.
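
      A note on the output above: after "sum year", r(max) is 2005 (the largest calendar year), not the number of waves, so the condition _N < `r(max)' is true for every panelist. The two indicators therefore reduce to year[1]==1984 and its complement, which is why the two tables mirror each other. The percentages are also shares of observations, not of panelists. Below is a minimal sketch of one possible adjustment, assuming one record per id-year; the names tagyear, nwaves, att_alt, join_alt and one_per_id are hypothetical and chosen so as not to clash with the variables already created.

      ```stata
      * Count the distinct survey years instead of using the largest year
      egen byte tagyear = tag(year)        // 1 on exactly one row per distinct year
      quietly count if tagyear
      local nwaves = r(N)                  // number of waves in the data

      * Same logic as in #2, with the wave count in place of r(max)
      bysort id (year): gen byte att_alt  = _N < `nwaves' & year[1] == 1984
      bysort id (year): gen byte join_alt = _N < `nwaves' & year[1] != 1984

      * Tabulate panelists rather than observations
      bysort id (year): gen byte one_per_id = _n == 1
      tab att_alt join_alt if one_per_id
      ```

      With this version a panelist observed in every wave is neither an attritor nor a joiner. It still does not distinguish a gap in the middle from a true exit, so it is only a starting point.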

      • #4
        Sorry here is the code:

        Code:
        . sort id year
        
        . sum year
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
                year |     43,868    1996.569    7.091292       1984       2005
        
        . by id: gen attritor = _N < `r(max)' & year[1]==1984
        
        . tab attritor
        
           attritor |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |     23,189       52.86       52.86
                  1 |     20,679       47.14      100.00
        ------------+-----------------------------------
              Total |     43,868      100.00
        
        . sort id year
        
        . sum year
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
                year |     43,868    1996.569    7.091292       1984       2005
        
        . by id: gen joiner = _N < `r(max)' & year[1]!=1984
        
        . tab joiner
        
             joiner |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |     20,679       47.14       47.14
                  1 |     23,189       52.86      100.00
        ------------+-----------------------------------
              Total |     43,868      100.00

        • #5
          I see, could you post some example data using dataex?

          • #6
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float id int year
            4001 1984
            4003 1984
            4003 1989
            4003 1994
            4003 1999
            4003 2001
            4003 2003
            4003 2005
            4004 1999
            4004 2001
            4004 2003
            4004 2005
            4006 1984
            4006 1994
            4006 2003
            4006 2005
            4007 1999
            4007 2001
            4007 2005
            4008 1989
            4008 1999
            4008 2001
            4008 2003
            4008 2005
            4031 1999
            4033 2001
            4033 2003
            4033 2005
            4034 1999
            4034 2001
            4034 2003
            4034 2005
            4035 1999
            4035 2001
            4035 2005
            4036 1999
            4036 2005
            4170 1989
            4170 1994
            4172 1984
            4172 1989
            4172 1994
            4172 1999
            4175 1989
            4175 1994
            4175 1999
            4175 2001
            4175 2003
            4179 1994
            4186 1994
            4186 1999
            4186 2001
            4186 2003
            4188 1999
            4188 2001
            4190 2001
            4190 2003
            4193 2003
            4195 2001
            4195 2003
            4195 2005
            4196 2003
            4196 2005
            5001 1984
            5001 1989
            5001 1994
            5002 1999
            5002 2001
            5002 2003
            5002 2005
            5003 1984
            5003 1989
            5003 1994
            5003 1999
            5003 2001
            5003 2003
            5003 2005
            5004 1999
            5005 1994
            5005 1999
            5005 2003
            5005 2005
            5031 2005
            5171 1989
            5171 1994
            5172 2003
            5175 2001
            5175 2003
            5175 2005
            5176 2001
            6001 1984
            6006 1984
            6006 1989
            6006 1994
            6006 1999
            6006 2001
            6006 2003
            6006 2005
            6030 1994
            6032 2005
            end
            The first three ids show some good examples. 4001 attrited after 1984. 4003 remained in the survey for all seven of the waves I sampled (1984, 1989, 1994, 1999, 2001, 2003, 2005). 4004 joined in 1999 and was present for all subsequent waves in my sample.

            • #7
              Code:
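              recode year (1984=1) (1989=2) (1994=3) (1999=4) (2001=5) (2003=6) (2005=7), gen(wave)    // consecutive wave numbers 1-7
              
              sort id wave
              
              bys id: gen joiner = wave[1]>1    // first observed after wave 1
              
              * x = 1 if two or more waves are skipped between appearances, or the last observed wave is 4 or earlier
              by id: gen x = wave-wave[_n-1]>2 & _n>1 | 7-wave[_N]>2
              by id: egen attritor = max(x)    // 1 on every row of a panelist with any flagged row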

              • #8
                This is helpful and making progress. But how do I ensure that the attritor missed 2 consecutive waves (and not non-consecutive waves, or joined after missing first two or more waves)?

                I ask because a panelist only attrits if they miss 2 consecutive waves. Missing multiple nonconsecutive waves is perfectly ok, and such panelists remain in the study. Hypothetically, since I have 7 waves, a panelist could have missed three nonconsecutive waves and remain in the study (i.e. miss every other wave).

                In addition, it seems that some panelists absent in initial waves were labeled both attritors and joiners (2,676 out of 14,716 panelists total). Here is code for the attritor joiners

                Code:
                . gen attritjoin = attritor + joiner
                
                . xttab attritjoin
                
                                  Overall             Between            Within
                attritj~n |    Freq.  Percent      Freq.  Percent        Percent
                ----------+-----------------------------------------------------
                        0 |   12245     27.91      1870     12.71         100.00
                        1 |   28037     63.91     10170     69.11         100.00
                        2 |    3586      8.17      2676     18.18         100.00
                ----------+-----------------------------------------------------
                    Total |   43868    100.00     14716    100.00         100.00
                                             (n = 14716)
                Any further suggestions?
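
                One way to see where the overlap comes from is to flag each reason separately. Note that the joiner and attritor indicators from #7 are not mutually exclusive by construction: a panelist first seen after wave 1 can also have a later gap or an early last wave. The sketch below assumes the wave variable (1 to 7) and the joiner variable from #7 and one record per id-wave; gap2, anygap2, tail2, late2 and firstrow are hypothetical names. The tail2 line treats a panelist last seen in wave 5 or earlier as having missed the final two waves, which is an assumption about the rule and differs from the condition 7-wave[_N]>2 in #7 (last seen in wave 4 or earlier).

                ```stata
                * Two or more consecutive waves skipped between two appearances
                bysort id (wave): gen byte gap2 = _n > 1 & (wave - wave[_n-1] >= 3)
                bysort id: egen byte anygap2 = max(gap2)

                * Absent from both of the final two waves (6 and 7)
                bysort id (wave): gen byte tail2 = wave[_N] <= 5

                * First seen in wave 3 or later, i.e. missed the first two waves
                bysort id (wave): gen byte late2 = wave[1] >= 3

                * One row per panelist, then cross-tabulate the reasons
                bysort id (wave): gen byte firstrow = _n == 1
                tab anygap2 tail2 if firstrow
                tab late2 joiner if firstrow
                ```

                The _n > 1 guard matters because wave[_n-1] is missing on a panelist's first row, and Stata treats a missing value as larger than any number in comparisons.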

                • #9
                  Thank you for your previous help. I believe I understand your code now. However, as I have been looking at the data, it appears that some panelists find their way back into the study even after missing more than 2 waves.

                  I believe the best way to identify attritors is simply to define them as any panelist who did not appear in the last two waves (2003 & 2005). Thus I would like to code as an attritor any panelist who has no data for both 2003 and 2005 (waves 6 & 7).

                  I tried playing around with the code a little, to no success. For example, I tried

                  Code:
                  
                  . by id: gen z = wave[_n] if wave[_n] != 6 & 7
                  . by id: egen att1 = max(z)
                  I realized ex ante that this code isn't going to work, but I'm stuck about where to go from this idea...

                  • #10
                    Sorry for the lack of explanation. Based on #9:
                    Code:
                    bysort id (year): gen joiner = year[1]>1984 //first record after 1984
                    bysort id (year): gen attritor = year[_N]<2003 //last record before 2003
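
                    For the remaining parts of the original question (waves per panelist and the relation between attrition and covariates), a possible continuation is sketched below. It assumes the joiner and attritor variables created just above and one record per id-year; status, nwaves and first are hypothetical names, and x1 and x2 are placeholders for the independent variables of interest, which the thread does not name.

                    ```stata
                    * Combine the two indicators: 0 neither, 1 attritor only, 2 joiner only, 3 both
                    gen byte status = attritor + 2*joiner
                    label define status 0 "neither" 1 "attritor" 2 "joiner" 3 "joiner and attritor"
                    label values status status

                    * Number of waves observed per panelist
                    bysort id (year): gen nwaves = _N

                    * Keep one row per panelist (the first observed wave) for panelist-level statistics
                    bysort id (year): gen byte first = _n == 1
                    tab status if first
                    summarize nwaves if first, detail    // mean, and median as the 50% percentile

                    * x1 and x2 are placeholders; values are taken from each panelist's first observed wave
                    probit attritor x1 x2 if first
                    ```

                    Restricting to the first row means the probit uses covariate values from each panelist's first observed wave; whether that is the right reference point depends on the research question.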

                    • #11
                      Thanks much!
