
  • Counting Repeated Observations for Each ID

    Dear All,

    I have a panel dataset for individual persons that looks like the following:



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(ID Year Income)
    1 2001 10000
    1 2003 20000
    1 2005 15000
    2 2001  1500
    2 2003   100
    2 2005   200
    2 2007   300
    2 2009   400
    3 2001  5000
    3 2003   600
    4 2001   700
    4 2003   120
    4 2005  1500
    4 2007   100
    4 2009   200
    4 2011   300
    5 2001   400
    5 2003 20000
    5 2005 15000
    end

    I would like to determine how many times each unique ID is repeated (i.e., how many years of data I have for each ID). The output I am looking for should include an additional column that shows the count of years for each ID, like so:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(ID Year Income Repeat)
    1 2001 10000 3
    1 2003 20000 3
    1 2005 15000 3
    2 2001  1500 5
    2 2003   100 5
    2 2005   200 5
    2 2007   300 5
    2 2009   400 5
    3 2001  5000 2
    3 2003   600 2
    4 2001   700 6
    4 2003   120 6
    4 2005  1500 6
    4 2007   100 6
    4 2009   200 6
    4 2011   300 6
    5 2001   400 3
    5 2003 20000 3
    5 2005 15000 3
    end

    The Repeat column above shows the result I am after: the number of years of data I have for each unique ID. If you know an efficient way to generate it, please let me know.

    Thanks in advance for your help!

  • #2
    Code:
    * Sort by ID (and by Year and Income within ID), then store each ID's observation count
    by ID (Year Income), sort: gen repeat = _N



    • #3
      Thanks a lot! It worked!
      However, what if each observation has many other variables besides Income? Would that change the code?



      • #4
        Well, it depends. You could do this instead:
        Code:
        sort ID, stable
        by ID: gen repeat = _N
        The variables other than ID don't really matter in terms of getting the result. The reason I listed them in the code in #2 was to leave the data sorted in a reasonable way. The code here will leave the data sorted by ID and exactly as they already are within IDs, which is also reasonable. You could do just plain old -bysort ID: gen repeat = _N-, but this could leave your data sorted in some bizarre, and irreproducible, order within IDs. Since this looks like panel data, the sort order might ultimately matter, so I'm trying to keep things in order.
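
        To illustrate the point in #4, here is a minimal sketch using IDs 1 and 3 from the example data in #1. The variable names orig_order and repeat2 are hypothetical and only serve the check; they do not appear elsewhere in this thread. The sketch assumes the data start out in the order shown, and it checks that a stable sort keeps that order within each ID and that the count is the same whichever of the two approaches is used.

        ```stata
        * Subset of the example data from #1 (IDs 1 and 3 only)
        clear
        input float(ID Year Income)
        1 2001 10000
        1 2003 20000
        1 2005 15000
        3 2001  5000
        3 2003   600
        end

        * Record the current observation order so it can be checked later
        * (orig_order is a hypothetical helper variable)
        gen long orig_order = _n

        * Stable sort: observations tied on ID keep their existing relative order
        sort ID, stable
        by ID: gen repeat = _N

        * Within each ID, the original order should still be increasing
        by ID: assert orig_order > orig_order[_n-1] if _n > 1

        * The approach from #2 gives the same count; the variables in
        * parentheses only control the sort order within ID
        bysort ID (Year Income): gen repeat2 = _N
        assert repeat == repeat2

        list, sepby(ID)
        ```

        If either assert fails, Stata stops with an error, so a clean run means both checks passed on these data.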



        • #5
          Thank you very much!
