Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate a relative variable

    Hello Statalist,

    I am totally new to Stata, and I hope someone can help me on how to handle this task. My data looks like this:
    CFOANN CEOANN COMP ID YEAR
    CEO 2476.82 1072 2010
    915.192 1072 2010
    CFO 781.615 1072 2010
    698.464 1072 2010
    686.657 1072 2010
    CEO 1.691.361 1072 2011
    CFO 649.659 1072 2011
    619.669 1072 2011
    549.004 1072 2011
    548.804 1072 2011
    CFEOANN indicates whether the person is a CFO, and CEO indicates whether the person is CFO. COMP is the pay and ID is the company identifier.

    I want to generate a variable that contains for each separate year the CEO's portion of the aggregate pay to the top 5 executives, but not including the CFO in the denominator.
    When I have compensation data for more than five executives, use only the five highest paid executives. When I have compensation data for fewer than five executives, use the pay from the lowest paid executive in place of that of the missing executives.

    So in the example above the lowest compensations should be taken into consideration twice because the compensation of the CFO should be excluded.

    Any ideas on how to handle this? Thanks in advance!


  • #2
    Welcome to Statalist!

    There are a few issues that, if you can address, may make answering easier.

    1) Please take a moment to read and follow the advice in the FAQ (http://www.statalist.org/forums/help), particularly on how to provide a data example that can be directly input into Stata so that users here can make that data set and try out solutions before answering. See step 12 for details on how to use -dataex- for that. Ideally, example data are expressed like this:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str3(CFOANN CEOANN) float(COMP ID YEAR)
    ""    "CEO"  247682 1072 2010
    ""    ""     915192 1072 2010
    "CFO" ""     781615 1072 2010
    ""    ""     698464 1072 2010
    ""    ""     686657 1072 2010
    ""    "CEO" 1691361 1072 2011
    "CFO" ""     649659 1072 2011
    ""    ""     619669 1072 2011
    ""    ""     549004 1072 2011
    ""    ""     548804 1072 2011
    end
    2) While providing the data, make sure there are enough cases that would allow the code's accuracy to be checked. In other words your current example is under-representing because in 2010 and 2011 there are only 1 CEO and 1 CFO, while you're asking for top 5 in each year. There will be no way to validate the code using this data.

    3) On the same vein, the missing in CFOANN and CEOANN should better be explained as well. Base on your logic, the lowest number should not just be taken twice, but four times because there was only one CEO in each year? I am not sure. I made some assumptions, hope this rather rudimentary solution would work:

    Code:
    clear
    input str3 CFOANN str3     CEOANN     COMP     ID     YEAR
    ""    "CEO"     247682     1072     2010
    ""    "CEO"    915192     1072     2010
    "CFO" ""        781615     1072     2010
    ""    "CEO"    698464     1072     2010
    ""    "CEO"    686657     1072     2010
    ""    "CEO"     1691361     1072     2011
    "CFO" ""        649659     1072     2011
    ""    "CEO"    1619669     1072     2011
    "CFO"    ""    549004     1072     2011
    ""    "CEO"    548804     1072     2011
    ""    "CEO"     169131     1072     2011
    "CFO" ""        644659     1072     2011
    ""    "CEO"    69669     1072     2011
    "CFO"    ""    54904     1072     2011
    ""    "CEO"    154804     1072     2011
    end
    
    dataex CFOANN CEOANN COMP ID YEAR
    
    * Generate an original sequence for later sorting
    gen seq = _n
    order seq
    
    * Compute the rank of compensation
    gen nCOMP = -COMP
    bysort YEAR CEOANN (nCOMP): gen comprank = _n * (CEOANN == "CEO") if _n <= 5
    * Compute how many legitimate top positions are in each year
    bysort YEAR CEOANN: egen allpos = max(comprank) if comprank > 0 & comprank < .
    
    * Compute the adjusted sums
    gen compsum = .
    replace compsum = COMP   if allpos == 5
    replace compsum = COMP   if allpos == 4 & comprank < 4
    replace compsum = COMP*2 if allpos == 4 & comprank == 4
    replace compsum = COMP   if allpos == 3 & comprank < 3
    replace compsum = COMP*3 if allpos == 3 & comprank == 3
    replace compsum = COMP   if allpos == 2 & comprank < 2
    replace compsum = COMP*4 if allpos == 2 & comprank == 2
    replace compsum = COMP*5 if allpos == 1 & comprank == 1
    
    * Compute the final sum
    bysort YEAR: egen wanted = total(compsum) if compsum < .
    sort seq
    drop nCOMP
    list, sep(0)
    Results:
    Code:
         +---------------------------------------------------------------------------------------+
         | seq   CFOANN   CEOANN      COMP     ID   YEAR   comprank   allpos   compsum    wanted |
         |---------------------------------------------------------------------------------------|
      1. |   1               CEO    247682   1072   2010          4        4    495364   2795677 |
      2. |   2               CEO    915192   1072   2010          1        4    915192   2795677 |
      3. |   3      CFO             781615   1072   2010          0        .         .         . |
      4. |   4               CEO    698464   1072   2010          2        4    698464   2795677 |
      5. |   5               CEO    686657   1072   2010          3        4    686657   2795677 |
      6. |   6               CEO   1691361   1072   2011          1        5   1691361   4183769 |
      7. |   7      CFO             649659   1072   2011          0        .         .         . |
      8. |   8               CEO   1619669   1072   2011          2        5   1619669   4183769 |
      9. |   9      CFO             549004   1072   2011          0        .         .         . |
     10. |  10               CEO    548804   1072   2011          3        5    548804   4183769 |
     11. |  11               CEO    169131   1072   2011          4        5    169131   4183769 |
     12. |  12      CFO             644659   1072   2011          0        .         .         . |
     13. |  13               CEO     69669   1072   2011          .        .         .         . |
     14. |  14      CFO              54904   1072   2011          0        .         .         . |
     15. |  15               CEO    154804   1072   2011          5        5    154804   4183769 |
         +---------------------------------------------------------------------------------------+
    Last edited by Ken Chui; 27 May 2021, 06:01.

    Comment


    • #3
      Ken Chui , thank you for taking the time to answer.

      I actually meant the portion of the CEO relative to the top 5 individuals excluding the CFO. So there's one CEO and one CFO each year and several individuals who are not CEO and not CFO.

      So ultimately for the year 2010, this would have to be the result of 2476.82 / (2476.82 + 915.192 + 698.464 + 686.657 + 686.657).

      The COMP of the CFO should be excluded from the computation and the observation 686.657 should be taken twice.

      Comment


      • #4
        Thanks. See the revision below. Notice that I added one more line of data to test if there are 5 instead of 4. Notice that this code is likely not going to work for your full data because the example only contains one company, but you can adopt the ideas if they'd be helpful.

        Code:
        clear
        input str3(CFOANN CEOANN) float(COMP ID YEAR)
        ""    "CEO"  247682 1072 2010
        ""    ""     915192 1072 2010
        "CFO" ""     781615 1072 2010
        ""    ""     698464 1072 2010
        ""    ""     686657 1072 2010
        ""    "CEO" 1691361 1072 2011
        "CFO" ""     649659 1072 2011
        ""    ""     619669 1072 2011
        ""    ""     549004 1072 2011
        ""    ""     548804 1072 2011
        ""    ""     348804 1072 2011
        end
        
        * Generate an original sequence for later sorting
        gen seq = _n
        order seq
        
        * Compute the rank of compensation
        gen nCOMP = -COMP
        bysort YEAR CEOANN (nCOMP): gen comprank = _n * (CFOANN == "") if _n <= 5
        * Compute how many legitimate top positions are in each year
        bysort YEAR: egen allpos = max(comprank) if comprank > 0 & comprank < .
        
        * Compute the adjusted sums
        gen compsum = .
        replace compsum = COMP   if allpos == 5
        replace compsum = COMP   if allpos == 4 & comprank < 4
        replace compsum = COMP*2 if allpos == 4 & comprank == 4
        replace compsum = COMP   if allpos == 3 & comprank < 3
        replace compsum = COMP*3 if allpos == 3 & comprank == 3
        replace compsum = COMP   if allpos == 2 & comprank < 2
        replace compsum = COMP*4 if allpos == 2 & comprank == 2
        replace compsum = COMP*5 if allpos == 1 & comprank == 1
        
        * Compute the final sum
        bysort YEAR: egen yearlycomp = total(compsum) if compsum < .
        
        * Compute the annual CEO salary, you may need to incorporate ID,
        * the data example is too short for me to test that
        by YEAR: egen ceocomp = max(COMP * (CEOANN == "CEO"))
        
        * Compute the final sum
        gen wanted = ceocomp / yearlycomp
        sort seq
        drop nCOMP
        list CEOANN-wanted, sep(0)
        Results:
        Code:
             +----------------------------------------------------------------------------------------------+
             | CEOANN      COMP     ID   YEAR   comprank   allpos   compsum   yearly~p   ceocomp     wanted |
             |----------------------------------------------------------------------------------------------|
          1. |    CEO    247682   1072   2010          1        4    247682    3234652    247682   .0765714 |
          2. |           915192   1072   2010          1        4    915192    3234652    247682   .0765714 |
          3. |           781615   1072   2010          0        .         .          .    247682          . |
          4. |           698464   1072   2010          3        4    698464    3234652    247682   .0765714 |
          5. |           686657   1072   2010          4        4   1373314    3234652    247682   .0765714 |
          6. |    CEO   1691361   1072   2011          1        5   1691361    3757642   1691361   .4501123 |
          7. |           649659   1072   2011          0        .         .          .   1691361          . |
          8. |           619669   1072   2011          2        5    619669    3757642   1691361   .4501123 |
          9. |           549004   1072   2011          3        5    549004    3757642   1691361   .4501123 |
         10. |           548804   1072   2011          4        5    548804    3757642   1691361   .4501123 |
         11. |           348804   1072   2011          5        5    348804    3757642   1691361   .4501123 |
             +----------------------------------------------------------------------------------------------+

        Comment

        Working...
        X