Calculating attrition rate on panel data

Claire James

Join Date: Jan 2018

Posts: 31
#1

27 Feb 2018, 17:57

Hello,

I'm looking to calculate the attrition rate of my unbalanced panel dataset. It spans from 2005 to 2015, and I'm using a monthly time unit so this is 120 periods. Here are some additional details on my data:

Code:

. xtset id mdate
       panel variable:  id (unbalanced)
        time variable:  mdate, 2005m2 to 2015m1, but with gaps
                delta:  1 month

. xtdes

      id:  4006, 5003, ..., 6872003                          n =       4248
   mdate:  2005m2, 2005m3, ..., 2015m1                       T =        120
           Delta(mdate) = 1 month
           Span(mdate)  = 120 periods
           (id*mdate uniquely identifies each observation)

Distribution of T_i:   min      5%     25%     50%     75%     95%     max
                        93     120     120     120     120     120     120

Based on a previous post where a similar question was asked, I have tried using the following command to look at whether individuals remain or drop out of the dataset over time:

Code:

local i = 1
while `i' < 121 {
    bys id: egen test`i' = max(timeid == `i')
    egen flag`i' = tag(id)
    local i = `i' + 1
}

However, I am admittedly very unfamiliar with looping commands and I'm not sure whether this has achieved what I need it to do. What I essentially want to do is to compare between the months in my dataset to identify which individuals have dropped out over the timespan of my dataset (2005m2 - 2015m1), and subsequently calculate the attrition rate in my dataset.
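
For readers following along, here is a minimal sketch of what the loop above appears to compute, rewritten with forvalues on a made-up toy dataset. The toy ids, the three periods, and the names flag and present1-present3 are illustrative assumptions, not the poster's data. Only timeid is taken from the original loop.

```stata
* Toy data (hypothetical): 3 ids, periods numbered 1 to 3 in timeid.
clear
input long id byte timeid
1 1
1 2
1 3
2 1
2 2
3 1
3 3
end

* One tag per id is enough, because the tag does not depend on the period.
egen byte flag = tag(id)

* present`t' is 1 on every row of an id observed in period t, else 0.
forvalues t = 1/3 {
    bysort id: egen byte present`t' = max(timeid == `t')
}

* Number of distinct ids observed in each period.
forvalues t = 1/3 {
    count if flag & present`t'
}
```

With 120 periods the range would become 1/120. Note that the original loop creates 120 identical flag variables, since tag(id) gives the same result on every pass.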

If anyone could offer any advice or guidance, it would be very much appreciated!
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

27 Feb 2018, 18:16

I'm not sure I understand what "attrition rate" means in your survey. Is it the case that once a panel leaves the survey, it is not found and readmitted in a later wave? If so, we see that over 95% of your panels have data for wave 120, an extraordinarily low attrition rate as I understand the term and in my experience with longitudinal surveys.

With that said, if it would suffice for you to have a dataset with one observation for each panel containing the largest value of mdate for that panel, this would do it.

Code:

bysort id (mdate): keep if _n==_N
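
As a hedged illustration of how that one-liner might be used, the sketch below applies it to a made-up toy panel and flags ids whose last observed month falls before the final wave. The toy data, the year and month inputs, and the name left_early are assumptions for illustration. Only id and mdate come from the thread.

```stata
* Toy data (hypothetical): 3 ids over 2005m2 to 2005m4.
clear
input long id int year byte month
1 2005 2
1 2005 3
1 2005 4
2 2005 2
2 2005 3
3 2005 2
3 2005 4
end
generate mdate = ym(year, month)
format mdate %tm

* Keep only each id's final observation, as in the command above.
bysort id (mdate): keep if _n == _N

* Flag ids whose last month is earlier than the final toy wave (2005m4).
generate byte left_early = mdate < ym(2005, 4)
tabulate left_early
```

On the real data the cutoff would presumably be tm(2015m1), and wrapping the keep in preserve and restore would leave the full panel intact.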
Claire James

Join Date: Jan 2018

Posts: 31
#3

28 Feb 2018, 04:41

Thanks for your input, William. To clarify, there is a possibility that an individual could leave the survey but be recontacted and readmitted in a later wave.

Excuse my ignorance, but how are you able to tell that over 95% of my panels have data for wave 120?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

28 Feb 2018, 06:16

My assertion in post #2 was probably based on a bad guess about how your data are structured. The distribution of T_i tells us that fewer than 5% of your panels have fewer than 120 observations. But perhaps your data are such that each panel is included in each wave, with a separate indicator telling you whether that panel was actually interviewed in that wave. It's hard to tell, because you have shown us no sample data, nor have you explained how attrition rate is defined.
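
To make the T_i reasoning concrete, here is a small sketch on made-up data that counts observations per panel and then counts the panels falling short of the full span. The toy data and the names nobs and tagid are illustrative assumptions.

```stata
* Toy data (hypothetical): 3 ids, full span of 3 months.
clear
input long id int year byte month
1 2005 2
1 2005 3
1 2005 4
2 2005 2
2 2005 3
3 2005 2
3 2005 4
end
generate mdate = ym(year, month)
format mdate %tm

* T_i: number of observations (months present) for each panel.
bysort id: generate nobs = _N
egen byte tagid = tag(id)

* Distribution of T_i across panels, using one row per id.
summarize nobs if tagid, detail

* Panels with fewer than the full 3 toy periods (120 in the thread).
count if tagid & nobs < 3
```

If every id has a row in every month whether or not an interview took place, nobs would equal the full span for everyone, and a response indicator would have to be counted instead.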

Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. It would be particularly helpful to post a small hand-made example, perhaps with just, say, 5 panel members and 5 waves, and explain how you would expect the attrition rate to be calculated were you doing it by hand. In particular, please read FAQ #12 and use dataex when posting sample data to Statalist.
Claire James

Join Date: Jan 2018

Posts: 31
#5

28 Feb 2018, 08:04

Thanks for the pointers, William -- I will keep those in mind for any future postings. Here is a sample of my data:

Code:

input float(id time) byte wtrue
6002 201405 0
6002 201406 0
6002 201407 0
6002 201408 0
6002 201409 0
6002 201410 0
6002 201411 0
6002 201412 0
6003      . .
6004      . .
6005      . .
6006 200501 0
6006 200502 0
6006 200503 0
6006 200504 0
6006 200505 0
6006 200506 0
6006 200507 0
6006 200508 0
6006 200509 0
6006 200510 0
6006 200511 0
6006 200512 0
6006 200601 0
end

I included "wtrue" (whether unemployed in month t) just as an indicator of whether the individual was still part of the active sample. Also, the time variable is just for ease of reading; I do have an alternative "mdate" variable in %tm format, but it does not read well in dataex.

My attrition rate would be looking at how many of the individuals who were in the dataset in 2005m2 consistently remained in the dataset from 2005m2-2015m1. So, this would be the percentage of the individuals in 2005m2 who remained in the dataset until 2015m1. The purpose of calculating the attrition rate is to determine whether it is high enough for it to be a significant cause for concern (as far as I have researched, an attrition rate of >20% is an issue). Following this, my plan of action if the attrition rate is high enough to be problematic is to delete the attrited individuals from my dataset.

Attrited individuals would be those who dropped out of the dataset at any point and never re-entered -- I want to identify these individuals and delete them from my dataset to eliminate attrition bias. This means that I essentially want to "keep" the individuals who were in my dataset and actively responding for all 120 periods in my dataset.
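
The definitions above could be sketched roughly as follows on a made-up toy panel. This assumes a row exists only for months in which the individual is in the active sample, which may not match the real data. The toy ids, the three-month span, and the names in_first, complete, attrited, and tagid are all illustrative assumptions.

```stata
* Toy data (hypothetical): 4 ids, span 2005m2 to 2005m4.
clear
input long id int year byte month
1 2005 2
1 2005 3
1 2005 4
2 2005 2
2 2005 3
3 2005 2
3 2005 4
4 2005 3
4 2005 4
end
generate mdate = ym(year, month)
format mdate %tm

local first = ym(2005, 2)        // first wave (2005m2 in the thread)
local last  = ym(2005, 4)        // final toy wave (2015m1 in the thread)
local T     = `last' - `first' + 1

* Present in the first wave.
bysort id (mdate): generate byte in_first = mdate[1] == `first'
* Present in every wave (id and mdate must uniquely identify rows).
bysort id (mdate): generate byte complete = in_first & _N == `T'
* Dropped out and never re-entered: last month is before the final wave.
bysort id (mdate): generate byte attrited = in_first & mdate[_N] < `last'
egen byte tagid = tag(id)

count if tagid & in_first
local base = r(N)
count if tagid & attrited
display "attrition rate (%): " %5.1f (100 * r(N) / `base')

* Keeping only individuals observed in all periods would then be:
* keep if complete
```

Note that the two definitions differ: an id with a gap who later returns (id 3 in the toy data) is not attrited under the second definition but is also not complete under the first.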

I hope that this is sufficient information, please let me know if anything needs to be clarified.

Last edited by Claire James; 28 Feb 2018, 08:23.