Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating attrition rate

    Hi,

    I have an unbalanced panel dataset (N=2976, T=13), using survey responses.
    My dependent variable is the household's ability to save (saving=1 if able to save, 0 otherwise).
    hhid is the Household's unique identifier, and the data is yearly.

    Code:
    . xtset hhid year
           panel variable:  hhid (unbalanced)
            time variable:  year, 2004 to 2016, but with gaps
                    delta:  1 unit
    
    .
    . xtdes
    
        hhid:  6, 21, ..., 89972                                 n =       2976
        year:  2004, 2005, ..., 2016                             T =         13
               Delta(year) = 1 unit
               Span(year)  = 13 periods
               (hhid*year uniquely identifies each observation)
    
    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                             1       1       1         3         7      13      13
    
         Freq.  Percent    Cum. |  Pattern
     ---------------------------+---------------
          224      7.53    7.53 |  ...........11
          206      6.92   14.45 |  1111111111111
          166      5.58   20.03 |  ............1
          145      4.87   24.90 |  1............
           94      3.16   28.06 |  ..........111
           81      2.72   30.78 |  ..........1..
           77      2.59   33.37 |  .1...........
           72      2.42   35.79 |  11...........
           62      2.08   37.87 |  ......1......
         1849     62.13  100.00 | (other patterns)
     ---------------------------+---------------
         2976    100.00         |  XXXXXXXXXXXXX
    I would like to calculate the attrition rate.
    Please could you help me to calculate the Frequencies in the table below?
    With Frequency, I wanted to show how many of the respondents who were first interviewed in 2004 were then interviewed in 2016.
    The same for 2005, 2006,...,2015.

    I wanted to create a table as such:

    Code:
    Year of 1st Interview | Frequency | Percent | Cumulative Percent
                2004      | ---       | ---     | ---  
                2005      | ---       | ---     | ---  
                2006      | ---       | ---     | ---  
                .......   | ---       | ---     | ---  
                2015      | ---       | ---     | 100.0
              Total         ---                   100.0

    Please let me know if further clarification is required.

    Thanks in advance
    Last edited by Rose Simmons; 06 Apr 2017, 14:53.

  • #2
    Hi,

    I tried:
    Code:
    . gen yr2004=0
    
    . replace yr2004=1 if year==2004
    (952 real changes made)
    
    . gen yr2016=0
    
    .
    . replace yr2016=1 if year==2016
    (1,245 real changes made)
    
    . tab hhid if yr2004==1 & yr2016==1
    no observations
    I was hoping to use this to see which individuals that were surveyed in 2004 (i.e. yr2004=1) were also surveyed in 2016.

    Looking at a part of the dataset, I see why the above didn't return what I had hoped:
    Code:
    hhid year  yr2004 yr2016
    6    2004    1    0
    6    2005    0    0
    6    2006    0    0
    6    2007    0    0
    6    2008    0    0
    6    2009    0    0
    6    2010    0    0
    6    2011    0    0
    6    2012    0    0
    6    2013    0    0
    6    2014    0    0
    6    2015    0    0
    6    2016    0    1
    Could someone please help me to calculate the frequency?
    I would like to create the frequency table in #1 first to explain my data, and then I will calculate the attrition rate afterwards - apologies if this was unclear
    Last edited by Rose Simmons; 10 Apr 2017, 04:35.

    Comment


    • #3
      I gather the goal is to calcualte the proportion of hhid's surveyed in 2004 that were also surveyed in 2016.

      Code:
      by hhid, sort: egen in_2004 = max(year == 2004)
      by hhid: egen in_2016 = max(year == 2016)
      egen flag = tag(hhid) // SELECT ONE OBSERVATION PER hhid
      tab in_2016 if in_2004 & flag
      In general, when working with panel data, when you need to create a variable that represents whether a panel unit has any observation that meets a certain condition, the code is -by panel, sort: egen ever_meets_condition = max(condition)-.

      The additional complication here is that every observation of in_2004 and in_2016 will be set to the same value. When you ultimately want to tabulate, you do not want to count each hhid in proportion to the number of times they participated in the survey--you want to count each only once. That-s what the -egen, tag()- command is for: it identifies a single observation from each hhid. The -tab- command then counts only the one identified observation in each hhid.
      Last edited by Clyde Schechter; 10 Apr 2017, 08:31. Reason: Correct error in code.

      Comment


      • #4
        Thank you Clyde Schechter for your help, particularly the -egen, tag()- command, as I was unsure of how to separate the observations by hhid.

        In general, when working with panel data, when you need to create a variable that represents whether a panel unit has any observation that meets a certain condition, the code is -by panel, sort: egen ever_meets_condition = max(condition)-.
        My understanding of the following code is that it creates a dummy variable in_2004 (which is 1 if there is an observation in 2004, and 0 otherwise).
        Q1: What does the "max" do?
        Code:
        by hhid, sort: egen in_2004 = max(year == 2004)

        Q2: Does the following suggest that, of the 1,245 hhid's surveyed in 2016, 977 were surveyed in 2006 (i.e. proportion 78.47%)?
        Code:
         by hhid, sort: egen in_2004 = max(year == 2004)
        
        . 
        . by hhid, sort: egen in_2005 = max(year == 2005)
        
        . 
        . by hhid, sort: egen in_2006 = max(year == 2006)
        
        . 
        . by hhid: egen in_2016 = max(year == 2016)
        
        . 
        . egen flag = tag(hhid) 
        
        . 
        . tab in_2016 if in_2004 & flag
        
            in_2016 |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |        622       65.34       65.34
                  1 |        330       34.66      100.00
        ------------+-----------------------------------
              Total |        952      100.00
        
        . 
        . tab in_2016 if in_2005 & flag
        
            in_2016 |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |        669       64.39       64.39
                  1 |        370       35.61      100.00
        ------------+-----------------------------------
              Total |      1,039      100.00
        
        . 
        . tab in_2016 if in_2006 & flag
        
            in_2016 |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |        589       60.29       60.29
                  1 |        388       39.71      100.00
        ------------+-----------------------------------
              Total |        977      100.00
        
        . tab year if year==2016
        
            year of |
             survey |      Freq.     Percent        Cum.
        ------------+-----------------------------------
               2016 |      1,245      100.00      100.00
        ------------+-----------------------------------
              Total |      1,245      100.00
        
        . di 977/1245
        .78473896
        Many thanks

        Comment


        • #5
          To confirm, yes, the goal is to calculate the proportion of hhid's surveyed in 2004, 2005, 2006 etc that were also surveyed in 2016.

          To follow up on Q2, the following suggests there were 977 hhid's surveyed in 2006:

          Code:
          . tab year if year==2006
          
              year of |
               survey |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                 2006 |        977      100.00      100.00
          ------------+-----------------------------------
                Total |        977      100.00
          This 977 figure is that same as:
          Code:
          . tab in_2016 if in_2006 & flag
          
              in_2016 |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |        589       60.29       60.29
                    1 |        388       39.71      100.00
          ------------+-----------------------------------
                Total |        977      100.00
          This suggests that all those who were surveyed in 2006 were still surveyed in 2016 - it is surprising that none seem to have dropped out

          Thanks
          Last edited by Rose Simmons; 10 Apr 2017, 10:05.

          Comment


          • #6
            Q1: What does the "max" do?
            Boolean expressions in Stata are evaluated as 0 if false and 1 if true. So for each observation, the expression condition will be variously zero or one accordingly. If any of the observations satisfies the condition, there will be a 1 in that observation. Any observations for which it is false will have zero. The maximum value of the expression will therefore be 1. On the other hand, if the condition is never satisfied, the expression will always evaluate to zero, and the maximum value is therefore also 0.

            Q2: Does the following suggest that, of the 1,245 hhid's surveyed in 2016, 977 were surveyed in 2006 (i.e. proportion 78.47%)?
            No. If you want the number of hhid's surveyed in 2016 that were also surveyed in 2006 the code would be
            Code:
            tab in_2006 if in_2016 & flag
            and examine the row of the output table where in_2006 is 1. There you will find the N and percent.

            Comment


            • #7
              Q1: I understand the use of -max- now
              Q2: I have corrected my mistake now

              Thank you very much

              Comment

              Working...
              X