Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Number of unique/distinct values over previous 5 years

    I am trying to calculate the number of unique values over the previous 5 years but having trouble.

    I think ssc rangestat would be the simplest solution, but I don't think it has the capability to count unique values.

    This topic is closely related but not exactly what I am trying to do.

    For example, if I wanted to calculate the number of unique suppliers for firms over the previous 5 years, is there a simple way to do that?

    Code:
    clear
    input float(firm supplier date)
    1 11 1999
    1 12 2001
    1 13 2002
    1 14 2000
    1 15 2004
    2 16 1980
    2 17 1981
    2 18 1982
    2 19 1985
    2 20 1987
    3 21 1980
    3 22 1981
    3 23 1982
    3 24 1985
    3 25 1987
    end

    If rangestat had the capability, the code would look something like:

    Code:
    rangestat distinct (supplier), interval(date -5 -1) by(firm)

    It seems I probably have to create groups based on years and firms and then do something like:

    Code:
    egen ndistinct = total(supplier), by(firm yeargroup)
    where yeargroup contains the previous 5 years.

    Is there a simple way to create that yeargroup variable?
    Last edited by Kyle Smith; 23 May 2018, 10:43.

  • #2
    Actually, I think the simplest way to do this is with -rangejoin-, by Robert Picard, and available from SSC.

    Code:
    tempfile copy
    save `copy'
    
    isid firm date    // VERIFY ONLY ONE OBS PER YEAR PER FIRM
    
    rangejoin date -5 -1 using `copy', by(firm)
    
    by firm date (supplier_U), sort: gen supplier_count = ///
        sum(supplier_U != supplier_U[_n-1] & !missing(supplier_U))
    by firm date: keep if _n == _N

    Comment


    • #3
      Clyde,

      Thank you very much. I had never heard of rangejoin before (although I know of Robert's other very useful commands; rangestat, etc).

      Your solution is a wonderful one.

      Thank you very much.

      Comment


      • #4
        This question has come up quite often, e.g.

        https://www.statalist.org/forums/for...nge-by-inrange

        https://www.statalist.org/forums/for...ns-across-time

        https://www.statalist.org/forums/for...n-past-xx-days

        and the short answer from those is to use rangerun (SSC).

        I was thinking only yesterday of adding count of distinct values as an extra statistic for rangestat. Robert Picard is first author, so I will see what he thinks.

        Authorship here is complicated, but documented.

        rangestat Picard, Cox, Ferrer

        rangerun Picard, Cox

        rangejoin Picard

        Comment


        • #5
          Nick,

          Thanks for the info. I need to search better.

          The second link you gave is a very nice solution using rangerun. (I had never heard of rangerun before. It is not always clear to know what exactly to search for.)

          Thank you very much for all your commands (rangestat, rangerun, etc). You are very much appreciated.

          Comment


          • #6
            Agreed: command names are often not predictable in advance.

            Comment

            Working...
            X