  • Calculating attrition rate on panel data

    Hello,

    I'm looking to calculate the attrition rate of my unbalanced panel dataset. It spans from 2005 to 2015, and I'm using a monthly time unit so this is 120 periods. Here are some additional details on my data:

    Code:
    . xtset id mdate
           panel variable:  id (unbalanced)
            time variable:  mdate, 2005m2 to 2015m1, but with gaps
                    delta:  1 month
    
    xtdes
          id:  4006, 5003, ..., 6872003                          n =       4248
       mdate:  2005m2, 2005m3, ..., 2015m1                       T =        120
               Delta(mdate) = 1 month
               Span(mdate)  = 120 periods
               (id*mdate uniquely identifies each observation)
    
    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                            93     120     120       120       120     120     120
    Based on a previous post where a similar question was asked, I have tried using the following command to look at whether individuals remain or drop out of the dataset over time:

    Code:
    local i = 1
    while `i' < 121 {
        bys id: egen test`i' = max(timeid == `i')
        egen flag`i' = tag(id)
        local i = `i' + 1
    }
    However, I am admittedly very unfamiliar with looping commands and I'm not sure whether this has achieved what I need it to do. What I essentially want to do is to compare between the months in my dataset to identify which individuals have dropped out over the timespan of my dataset (2005m2 - 2015m1), and subsequently calculate the attrition rate in my dataset.

    If anyone could offer any advice or guidance, it would be very much appreciated!

  • #2
    I'm not sure I understand "attrition rate" in your survey. Is it the case that once a panel leaves the survey, it is never found and readmitted in a later wave? If so, the distribution of T_i suggests that over 95% of your panels have data for wave 120, which would be an extraordinarily low attrition rate as I understand the term and in my experience with longitudinal surveys.

    With that said, if it would suffice for you to have a dataset with one observation for each panel containing the largest value of mdate for that panel, this would do it.
    Code:
    bysort id (mdate): keep if _n==_N



    • #3
      Thanks for your input, William. To clarify, there is a possibility that an individual could leave the survey but be recontacted and readmitted in a later wave.

      Excuse my ignorance, but how are you able to tell that over 95% of my panels have data for wave 120?



      • #4
        My assertion in post #2 was probably based on a bad guess about how your data are structured. The distribution of T_i tells us that fewer than 5% of your panels have fewer than 120 observations. But perhaps your data are such that each panel is included in each wave, with a separate indicator telling you whether that panel was actually interviewed in that wave. It's hard to tell, because you show us no sample data, nor have you explained how attrition rate is defined.
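
        One way to check how the data are structured is to reduce them to one row per panel and look at how many months each panel contributes and when it was last seen. This is only a sketch: it assumes the xtset variables id and mdate from post #1, and the names nobs and lastmonth are hypothetical.

        Code:
        * work on a temporary copy so the original data are restored afterwards
        preserve
        * per panel: number of non-missing months and the last month observed
        collapse (count) nobs = mdate (max) lastmonth = mdate, by(id)
        format lastmonth %tm
        * distribution of observations per panel (compare with T_i from xtdes)
        tabulate nobs
        * panels whose last observed month is before the final wave
        count if lastmonth < tm(2015m1)
        restore

        If nearly every panel has nobs equal to 120, rows probably exist for every wave regardless of response, and attrition would have to be read from a separate indicator rather than from missing rows.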

        Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. It would be particularly helpful to post a small hand-made example, perhaps with just, say, 5 panel members and 5 waves, and explain how you would expect the attrition rate to be calculated were you doing it by hand. In particular, please read FAQ #12 and use dataex when posting sample data to Statalist.



        • #5
          Thanks for the pointers, William -- I will keep those in mind for any future postings. Here is a sample of my data:

          Code:
          input float(id time) byte wtrue
          6002 201405 0
          6002 201406 0
          6002 201407 0
          6002 201408 0
          6002 201409 0
          6002 201410 0
          6002 201411 0
          6002 201412 0
          6003      . .
          6004      . .
          6005      . .
          6006 200501 0
          6006 200502 0
          6006 200503 0
          6006 200504 0
          6006 200505 0
          6006 200506 0
          6006 200507 0
          6006 200508 0
          6006 200509 0
          6006 200510 0
          6006 200511 0
          6006 200512 0
          6006 200601 0
          end
          I included "wtrue" (whether unemployed in month t) only as an indicator of whether the individual was still part of the active sample. Also, the time variable is shown just for ease of reading; I do have an alternative "mdate" variable in %tm format, but it does not read well in dataex.
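
          For anyone loading the sample above, the numeric YYYYMM "time" variable could be converted into a %tm monthly date roughly as follows. This is a minimal sketch; the name mdate2 is hypothetical, chosen so that it does not overwrite the existing mdate.

          Code:
          * split YYYYMM into year and month, then build a Stata monthly date
          generate mdate2 = ym(floor(time/100), mod(time, 100))
          * display as 2014m5, 2014m6, ...
          format mdate2 %tm

          Rows with a missing time (ids 6003 to 6005 in the sample) simply get a missing mdate2.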

          My attrition rate would be looking at how many of the individuals who were in the dataset in 2005m2 consistently remained in the dataset from 2005m2-2015m1. So, this would be the percentage of the individuals in 2005m2 who remained in the dataset until 2015m1. The purpose of calculating the attrition rate is to determine whether it is high enough for it to be a significant cause for concern (as far as I have researched, an attrition rate of >20% is an issue). Following this, my plan of action if the attrition rate is high enough to be problematic is to delete the attrited individuals from my dataset.

          Attrited individuals would be those who dropped out of the dataset at any point and never re-entered -- I want to identify these individuals and delete them from my dataset to eliminate attrition bias. This means that I essentially want to "keep" the individuals who were in my dataset and actively responding for all 120 periods in my dataset.
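
          The definitions above could be sketched in Stata roughly as follows. This assumes that a row with a non-missing mdate exists only for months in which the individual was actually in the sample, which may not hold for these data; the variable names nobs, in_base, lastmonth, onerow and complete are hypothetical.

          Code:
          * months observed per individual (rows with missing mdate are not counted)
          egen nobs = count(mdate), by(id)
          * 1 if the individual was observed in the first wave, 2005m2
          egen in_base = max(mdate == tm(2005m2)), by(id)
          * last month in which the individual was observed
          egen lastmonth = max(mdate), by(id)
          * tag one row per individual so that each person is counted once
          egen onerow = tag(id)
          * baseline: individuals present in 2005m2
          count if onerow & in_base
          local base = r(N)
          * attrited: baseline individuals last seen before 2015m1
          count if onerow & in_base & lastmonth < tm(2015m1)
          local dropped = r(N)
          display "Attrition rate (%): " %5.1f 100 * `dropped' / `base'
          * flag individuals observed in all 120 months
          generate byte complete = nobs == 120

          Flagging complete cases rather than deleting the others keeps the decision reversible; keep if complete would then restrict the data to individuals present in every period.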

          I hope that this is sufficient information, please let me know if anything needs to be clarified.
          Last edited by Claire James; 28 Feb 2018, 08:23.

