Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting unique number of observations in a panel dataset over firms AND years

    Dear reader,

    Currently, I have a dataset with the variables execid (executive id), cusip (firm id) and years (fyear)
    I would like to count the amount of executives of a firm in a given year.

    PHP Code:
    Cusip    execid    fyear
    000360    36419    2006
    000360    36416    2006
    000360    36417    2006
    000360    36418    2006
    000360    36415    2006
    000360    36418    2007
    000360    36416    2007
    000360    36419    2007
    000360    36415    2007
    000360    36417    2007
    000360    36418    2008
    000360    36417    2008
    000360    36419    2008
    000360    36415    2008
    000360    36416    2008
    000360    36419    2009
    000360    36416    2009
    000360    36417    2009
    000360    36418    2009 
    I want to create a new variable that counts the differcent execid. Current code:

    Code:
    by cusip execid (fyear), sort: gen byte exec_count = 1 if _n == 1 & inrange(fyear, 2002, 2009)
    by cusip (execid fyear): replace exec_count = sum(exec_count)
    by cusip (exec_count), sort: replace exec_count = exec_count[_N]
    This code is however wrong because its counts the distinct execid over ALL years from 2002 to 2009.
    I need stata to calculate per year which, from the example, would result in:
    For cusip 000360 in 2006: 5
    For cusip 000360 in 2007: 5
    For cusip 000360 in 2008: 5
    For cusip 000360 in 2009: 4

    I really hope someone knows the answer.

    Regards,
    Nicole

  • #2
    With no missing values, you can try the following

    Code:
    sort Cusip fyear execid
    by Cusip fyear: gen execid2= execid[_n-1]
    by Cusip fyear: egen exec_count= total(cond( execid== execid2, 0, 1 ))

    The rationale is that after sorting, the preceding observation (first lag) should differ from the current observation if the latter is distinct. With missing values

    Code:
    sort Cusip fyear execid
    by Cusip fyear: gen execid2= execid[_n-1]
    by Cusip fyear: gen count= cond( execid== execid2, 0, 1 )
    by Cusip fyear: replace count=0 if execid==.
    by Cusip fyear: egen exec_count= total(count)
    Last edited by Andrew Musau; 28 Jul 2016, 09:49.

    Comment


    • #3
      Well, this looks like simply:
      Code:
      contract Cusip fyear
      which results in
      Code:
           +-----------------------+
           | Cusip   fyear   _freq |
           |-----------------------|
        1. |   360    2006       5 |
        2. |   360    2007       5 |
        3. |   360    2008       5 |
        4. |   360    2009       4 |
           +-----------------------+
      exactly as required.

      Comment


      • #4
        contract Cusip fyear
        will give you a summary of the number of observations but not the distinct number of observations in a given year. To see this, expand Nicole's example


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(Cusip execid fyear)
        360 36419 2006
        360 36416 2006
        360 36416 2006
        360 36417 2006
        360 36418 2006
        360 36415 2006
        360 36418 2007
        360 36416 2007
        360 36419 2007
        360 36415 2007
        360 36417 2007
        360 36418 2008
        360     . 2008
        360 36417 2008
        360 36419 2008
        360 36415 2008
        360 36416 2008
        360 36419 2009
        360 36416 2009
        360 36417 2009
        360 36418 2009
        end
        Here, I add a duplicate observation in 2006 with execid==36416 and an observation with a missing value in 2008. Essentially, these two additional observations should not change the overall count but they do.

        Code:
        . contract Cusip fyear
        
        . l
        
             +-----------------------+
             | Cusip   fyear   _freq |
             |-----------------------|
          1. |   360    2006       6 |
          2. |   360    2007       5 |
          3. |   360    2008       6 |
          4. |   360    2009       4 |
             +-----------------------+

        Comment


        • #5
          Another way to count the number of distinct observations is to form groups (with duplicates) and tag the first observation. You can add additional conditions (e.g. to not count missing values for one or more variables).

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str6 Cusip float(execid fyear)
          "000360" 36419 2006
          "000360" 36416 2006
          "000360" 36416 2006
          "000360" 36417 2006
          "000360" 36418 2006
          "000360" 36415 2006
          "000360" 36418 2007
          "000360" 36416 2007
          "000360" 36419 2007
          "000360" 36415 2007
          "000360" 36417 2007
          "000360" 36418 2008
          "000360"     . 2008
          "000360" 36417 2008
          "000360" 36419 2008
          "000360" 36415 2008
          "000360" 36416 2008
          "000360" 36419 2009
          "000360" 36416 2009
          "000360" 36417 2009
          "000360" 36418 2009
          end
          
          bysort Cusip fyear execid: gen one = _n == 1 & !mi(execid)
          by Cusip fyear: egen xcount = total(one)
          list, sepby(Cusip fyear)

          Comment

          Working...
          X