Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to count distinct observations by group?

    Dear experts,

    I have a dataset like below. I'd like to count distinct firm numbers by province. For example, although I have six observations in province A in total, I just want to count the distinct firm number by the province which means there should be only three distinct firms (11, 12, 13) in province A. Thank you very much for helping!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year byte firm str1 province
    2001 11 "A"
    2002 11 "A"
    2003 11 "A"
    2001 12 "A"
    2002 13 "A"
    2002 13 "A"
    end

  • #2
    Maybe
    Code:
    bysort province (firm): egen long tot = max(sum(firm != firm[_n-1]))

    Comment


    • #3
      Nah, that's too fragile. Try this, instead, with the new use case. You could use -frames- to avoid overwriting your original dataset if desired.

      .ÿ
      .ÿversionÿ16.0

      .ÿ
      .ÿclearÿ*

      .ÿ
      .ÿinputÿintÿyearÿbyteÿfirmÿstr1ÿprovince

      ÿÿÿÿÿÿÿÿÿyearÿÿÿÿÿÿfirmÿÿÿprovince
      ÿÿ1.ÿ2001ÿ11ÿ"A"
      ÿÿ2.ÿ2002ÿ11ÿ"A"
      ÿÿ3.ÿ2003ÿ11ÿ"A"
      ÿÿ4.ÿ2001ÿ12ÿ"A"
      ÿÿ5.ÿ2002ÿ13ÿ"A"
      ÿÿ6.ÿ2002ÿ13ÿ"A"
      ÿÿ7.ÿ2001ÿ13ÿ"B"
      ÿÿ8.ÿend

      .ÿ
      .ÿcontractÿprovinceÿfirm,ÿfreq(tot)

      .ÿquietlyÿbyÿprovince:ÿreplaceÿtotÿ=ÿ_N

      .ÿ
      .ÿlist,ÿnoobsÿsepby(province)ÿabbreviate(20)

      ÿÿ+-----------------------+
      ÿÿ|ÿfirmÿÿÿprovinceÿÿÿtotÿ|
      ÿÿ|-----------------------|
      ÿÿ|ÿÿÿ11ÿÿÿÿÿÿÿÿÿÿAÿÿÿÿÿ3ÿ|
      ÿÿ|ÿÿÿ12ÿÿÿÿÿÿÿÿÿÿAÿÿÿÿÿ3ÿ|
      ÿÿ|ÿÿÿ13ÿÿÿÿÿÿÿÿÿÿAÿÿÿÿÿ3ÿ|
      ÿÿ|-----------------------|
      ÿÿ|ÿÿÿ13ÿÿÿÿÿÿÿÿÿÿBÿÿÿÿÿ1ÿ|
      ÿÿ+-----------------------+

      .ÿ
      .ÿexit

      endÿofÿdo-file


      .

      Comment


      • #4
        Here's a way of doing it in place:


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int year byte firm str1 province
        2001 11 "A"
        2002 11 "A"
        2003 11 "A"
        2001 12 "A"
        2002 13 "A"
        2002 13 "A"
        end
        
        egen tag = tag(firm province)
        egen ndistinct = total(tag), by(province)
        
        tabdisp province, c(ndistinct)
        
        ----------------------
         province |  ndistinct
        ----------+-----------
                A |          3
        ----------------------
        This comes up frequently. See e.g. https://www.statalist.org/forums/for...an-id-by-group and the links it gives.

        Comment


        • #5
          Nick Cox Great! Works pretty well. Thank you so much, Nick!

          Comment


          • #6
            Joseph Coveney Thank you so much, Joseph!

            Comment

            Working...
            X