Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How can I count the occurence of an event per year with panel data?

    I have panel data with individual identifier id and time variable year. The year of an event, such as getting married, is recorded for each individual, and for some individuals the event happens more than once in the sample period.

    Code:
    xttab eventyear
    does the job when a need a table depicting for how many people the event happened in each year.

    But I also need line plots showing the number of events per year (along with other characteristics of the individuals with an event). The natural way for me to go was collapsing the data, like this:

    Code:
    bys id (year): keep if _seq==1  // only keep the first obs for each individual (individuals with more than one event will also have _seq==1 more than once)
    collapse  (count) weddings=id  (mean) meaninc_grooms=income,  by(eventyear)
    However, as the screenshot below shows, the numbers differ and I do not see why this is so. What am I missing? Which is the correct way to get what I need, namely, weddings per year, in a way that I can draw a line graph?




  • #2
    Can we see the structure of your data: for example, the first five observations

    Code:
    list in 1/5
    1. What does "the event" mean? Is it getting married?
    2. You have not shown how you generate the _seq indicator variable.
    Last edited by Andrew Musau; 08 May 2015, 19:43.

    Comment


    • #3
      My data looks like this:

      id year sex _spell _seq _end eventyear gross_inc
      3 2001 0 1 1 0 1998 5000
      3 2002 0 1 2 0 1998 5000
      3 2003 0 1 3 0 1998 5000
      3 2004 0 1 4 0 1998 5000
      3 2005 0 1 5 0 1998 5000


      1. The event is the date when the person got married (I have the day and month but only use the year)
      2. The _seq indicator comes from
      Code:
      tsspell id

      Comment


      • #4
        Isabel: I slightly modify your data so that I have at least two events for an individual in the sample and replicate your steps. My counts match. A possibility is that when you generate the mean income of grooms, you delete observations of brides, and this causes a mismatch in the counts (e.g., gen meaninc_grooms= gross_inc if sex==0, for example). Try implementing my code on your data and see if you still have a mis-match

        Code:
        input id year sex _spell _seq _end eventyear gross_inc
        3 2001 0 1 1 0 1998 5000
        3 2002 0 1 2 0 1998 5000
        3 2003 0 1 3 0 1998 5000
        3 2004 0 1 4 0 1998 5000
        3 2005 0 1 5 0 1998 5000
        4 2001 0 1 1 1 1996 3000
        4 2002 0 1 2 1 1996 3000
        4 2003 0 1 3 1 2003 3000
        4 2004 0 1 4 1 2003 3000
        4 2005 0 1 5 1 2003 3000
        5 2001 0 1 1 0 1998 4000
        5 2002 0 1 2 0 1998 4000
        5 2003 0 1 3 0 1998 4000
        5 2004 0 1 4 0 1998 4000
        5 2005 0 1 5 0 1998 4000
        end
        
        xtset  id year
        sort id year
        egen freq= tag( id eventyear)
        
        
        . xttab  eventyear
        
                          Overall             Between            Within
        eventyear |    Freq.  Percent      Freq.  Percent        Percent
        ----------+-----------------------------------------------------
             1996 |       2     13.33         1     33.33          40.00
             1998 |      10     66.67         2     66.67         100.00
             2003 |       3     20.00         1     33.33          60.00
        ----------+-----------------------------------------------------
            Total |      15    100.00         4    133.33          75.00
                                         (n = 3)
        
        bys id (year): keep if  freq==1
        collapse  (count) weddings=id  (mean) meaninc_grooms= gross_inc,  by(eventyear)
        
        
        . list
        
             +--------------------------------+
             | eventy~r   weddings   meanin~s |
             |--------------------------------|
          1. |     1996          1       3000 |
          2. |     1998          2       4500 |
          3. |     2003          1       3000 |
             +--------------------------------+
        If there is still a mis-match, post again and we'll further try to diagnose the problem.
        Last edited by Andrew Musau; 11 May 2015, 13:38.

        Comment


        • #5
          Thank you for your reply Andrew, this was helpful. I found two problems with my prior code:

          1. Indeed there seemed to be a problem caused by selecting incomes. I am interested in the income in the year of the wedding, and this is not necessarily the first year of a spell ( I used year==eventyear). The workaround was to generate a new variable containing income in the year of the event.

          2. Using _seq instead of freq as qualifier to keep a single observation for each individual-spell lead to diverging results. The reason for the latter must have been some data cleaning AFTER having generated _seq, so generating freq instead did the trick.

          The sample code below solves the problem also with my original data:

          Code:
          . input id year sex _spell _seq _end eventyear gross_inc
          
                      id       year        sex     _spell       _seq       _end  eventyear  gross_inc
            1. 3 2001 0 1 1 0 1998 5000
            2. 3 2002 0 1 2 0 1998 3000
            3. 3 2003 0 1 3 0 1998 3500
            4. 3 2004 0 1 4 0 1998  600
            5. 3 2005 0 1 5 0 1998 2000
            6.
          . 4 2001 0 1 1 0 1996 6000
            7. 4 2002 0 1 2 1 1996 6700
            8. 4 2006 0 2 1 0 2006 6900
            9. 4 2007 0 2 2 0 2006 7000
           10. 4 2008 0 2 3 1 2006 7000
           11.
          . 5 2001 0 1 1 0 1998 4000
           12. 5 2002 0 1 2 0 1998 4000
           13. 5 2003 0 1 3 0 1998 5000
           14. 5 2004 0 1 4 0 1998 5000
           15. 5 2005 0 1 5 1 1998 5500
           16.
          . 6 2001 1 1 1 0 2002 .
           17. 6 2002 1 1 2 0 2002 300
           18. 6 2003 1 1 3 0 2002 600
           19. 6 2004 1 1 4 0 2002 800
           20. 6 2005 1 1 5 0 2002 300
           21. 6 2006 1 1 6 1 2002 1500
           22.
          . 7 2001 1 1 1 0 2002 600
           23. 7 2002 1 1 2 0 2002 1500
           24. 7 2003 1 1 3 0 2002 1600
           25. 7 2004 1 1 4 0 2002 1100
           26. 7 2005 1 1 5 0 2002 1200
           27. 7 2006 1 1 6 1 2002 1000
           28.
          . 8 2003 1 1 1 0 2002 1600
           29. 8 2004 1 1 2 0 2002 1100
           30. 8 2005 1 1 3 0 2002 1200
           31. 8 2006 1 1 4 0 2002 1000
           32. 8 2007 1 1 5 0 2002 1300
           33. 8 2008 1 1 6 1 2002 1400
           34. end
          
          .
          . xtset  id year
                 panel variable:  id (unbalanced)
                  time variable:  year, 2001 to 2008, but with a gap
                          delta:  1 unit
          
          .
          . sort id year
          
          . egen freq= tag( id eventyear)
          
          . xttab  eventyear
          
                            Overall             Between            Within
          eventyear |    Freq.  Percent      Freq.  Percent        Percent
          ----------+-----------------------------------------------------
               1996 |       2      6.06         1     16.67          40.00
               1998 |      10     30.30         2     33.33         100.00
               2002 |      18     54.55         3     50.00         100.00
               2006 |       3      9.09         1     16.67          60.00
          ----------+-----------------------------------------------------
              Total |      33    100.00         7    116.67          85.71
                                           (n = 6)
          
          .
          . gen yr_event=(year==eventyear)
          
          . replace yr_event=yr_event*(-1)
          (3 real changes made)
          
          . gen gross_inc_eventyear=.
          (33 missing values generated)
          
          . bys id (year): replace gross_inc_eventyear=gross_inc if year==eventyear
          (3 real changes made)
          
          . bys id _spell (yr_event): replace yr_event=yr_event[1]
          (12 real changes made)
          
          . bys id (year): replace gross_inc_eventyear=gross_inc_eventyear[_n-1] if yr_event==-1 & gross_inc_eventye
          > ar==.
          (10 real changes made)
          
          . bys id (year): replace gross_inc_eventyear=gross_inc_eventyear[_n+1] if yr_event==-1 & gross_inc_eventye
          > ar==. // note: may need to repeat this step several times until 0 changes made
          (2 real changes made)
          
          .
          . xttab  eventyear
          
                            Overall             Between            Within
          eventyear |    Freq.  Percent      Freq.  Percent        Percent
          ----------+-----------------------------------------------------
               1996 |       2      6.06         1     16.67          40.00
               1998 |      10     30.30         2     33.33         100.00
               2002 |      18     54.55         3     50.00         100.00
               2006 |       3      9.09         1     16.67          60.00
          ----------+-----------------------------------------------------
              Total |      33    100.00         7    116.67          85.71
                                           (n = 6)
          
          .
          . bys id (year): keep if  freq==1
          (26 observations deleted)
          
          . collapse  (count) weddings=id  (mean) meaninc_grooms= gross_inc_eventyear,  by(eventyear)
          
          .
          . list
          
               +--------------------------------+
               | eventy~r   weddings   meanin~s |
               |--------------------------------|
            1. |     1996          1          . |
            2. |     1998          2          . |
            3. |     2002          3        900 |
            4. |     2006          1       6900 |
               +--------------------------------+

          Comment

          Working...
          X