Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating the mean of a variable for the last two years

    Hi everyone
    I have a panel dataset containing the information of firms across 20 years. I have information on the CEOs of each firm such as the year they became CEO(becameceo_year), their unique identifier for each firm(co_per_rol), etc. Also, each firm is uniquely identified by a variable named gvkey.

    I need to calculate the mean Return on Assets (ROA) for the two years preceding the start of each CEO's tenure for every firm in my dataset.


    Given that each firm may have had multiple CEOs over the years, the challenge is to compute this mean ROA for the two years before each CEO's start year across all firms.

    This is the code I've been trying which I know does not work. I probably need to collapse the dataset and merge it back into the main dataset, but I do not know how to do it.




    * Step 1: Generate Flags for the 1-Year and 2-Year Lags
    gen pre_ceo1 = becameceo_year - 1
    gen pre_ceo2 = becameceo_year - 2

    * Step 2: Combine Flags for Both Years Before the CEO Became CEO
    gen is_pre_ceo = (year == pre_ceo1) | (year == pre_ceo2)

    * Step 3: Create a Separate Dataset for Calculations
    * Keep only observations from the 1-year and 2-year lags
    keep if is_pre_ceo == 1

    * Step 4: Calculate the Mean ROA for the Flagged Years by Firm
    bysort gvkey: egen mean_roa_pre_ceo = mean(roa1)

    * Step 5: Save This Dataset Temporarily
    tempfile pre_ceo_data
    * Drop observations from the year two years before the CEO started
    drop if year == pre_ceo2

    replace year = year + 1
    save pre_ceo_data.dta, replace

    * Reload your main dataset
    use aaaaa.dta, clear

    * Step 6: Merge the Mean ROA Back into the Main Dataset
    * Merge the mean ROA values for the pre-CEO years back into the main dataset
    merge 1:1 gvkey year using pre_ceo_data.dta, keep(master match update) keepusing(mean_roa_pre_ceo)


  • #2
    Hey Saba,

    please provide a data example using dataex from ssc. That will help us in providing a solution.

    Best,
    Sebastian

    Comment


    • #3
      actually, unless the OP is using an old version of Stata, -dataex- is part of official Stata and that version should be used; also, please read the FAQ on how to post so your results/output is easily readable as well as on using -dataex-

      Comment


      • #4
        Code:
        clear
        input co_per_rol year str5 ceo becameceo_year roa
        1 2010 a 2007 10
        1 2011 a 2007 20
        1 2012 a 2007 30
        1 2013 b 2013 40
        1 2014 b 2013 50
        1 2015 b 2013 60
        1 2016 b 2013 70
        1 2017 c 2017 80
        2 2009 d 2009 10
        2 2010 d 2009 20
        2 2011 d 2009 30
        2 2012 d 2009 40
        2 2013 e 2013 50
        2 2014 e 2013 60
        2 2015 e 2013 70
        3 2011 f 2010 10
        3 2012 f 2010 20
        3 2013 g 2013 30
        3 2014 g 2013 40
        3 2015 g 2013 50
        3 2016 f 2016 60
        3 2017 g 2017 70
        3 2018 g 2017 80
        end
        
        
        bysort co_per_rol (year): gen ceo_change = 1 if ceo != ceo[_n-1]
        
        bysort co_per_rol (year): gen mean_roa = (roa[_n-1] + roa[_n-2])/2 if ceo_change == 1
        
        list, sepby(co_per_rol)
        And this is the outcome:

        Code:
             +--------------------------------------------------------------+
             | co_per~l   year   ceo   became~r   roa   ceo_ch~e   mean_roa |
             |--------------------------------------------------------------|
          1. |        1   2010     a       2007    10          1          . |
          2. |        1   2011     a       2007    20          .          . |
          3. |        1   2012     a       2007    30          .          . |
          4. |        1   2013     b       2013    40          1         25 |
          5. |        1   2014     b       2013    50          .          . |
          6. |        1   2015     b       2013    60          .          . |
          7. |        1   2016     b       2013    70          .          . |
          8. |        1   2017     c       2017    80          1         65 |
             |--------------------------------------------------------------|
          9. |        2   2009     d       2009    10          1          . |
         10. |        2   2010     d       2009    20          .          . |
         11. |        2   2011     d       2009    30          .          . |
         12. |        2   2012     d       2009    40          .          . |
         13. |        2   2013     e       2013    50          1         35 |
         14. |        2   2014     e       2013    60          .          . |
         15. |        2   2015     e       2013    70          .          . |
             |--------------------------------------------------------------|
         16. |        3   2011     f       2010    10          1          . |
         17. |        3   2012     f       2010    20          .          . |
         18. |        3   2013     g       2013    30          1         15 |
         19. |        3   2014     g       2013    40          .          . |
         20. |        3   2015     g       2013    50          .          . |
         21. |        3   2016     f       2016    60          1         45 |
         22. |        3   2017     g       2017    70          1         55 |
         23. |        3   2018     g       2017    80          .          . |
             +--------------------------------------------------------------+
        Last edited by Ken Chui; 21 Feb 2024, 10:33.

        Comment


        • #5
          Thank you Ken,
          There are 2 issues with this code: 1. It generates missing for the entire dataset while I don't have missing in roa or other variables. I do not know how to fix it.
          2. it only calculates the mean of roa when there is a CEO change. However, I want to calculate it for all of my firms ( the unique identifier for each firm is gvkey).

          Comment


          • #6
            Thanks for your response, Sebastian!
            This is my date:

            input long gvkey double(co_per_rol year becameceo_year roa1)
            1045 1 1992 1985 -.04998396236501657
            1045 1 1993 1985 -.005691814136396564
            1045 1 1994 1985 .011700708200759519
            1045 1 1995 1985 .008539578645939865
            1045 1 1996 1985 .04956822949699956
            1045 1 1997 1985 .047095386086540764
            1045 3 1998 1998 .058915840918262116
            1045 3 1999 1998 .04041191433494708
            1045 3 2000 1998 .031015145156983177
            1045 3 2001 1998 -.05365244663682592
            1045 3 2002 1998 -.11600092509994384
            1078 6 1992 1989 .17850668094270242
            1078 6 1993 1989 .18197482522430375
            1078 6 1994 1989 .1779366624259537
            1078 6 1995 1989 .1794088337097799
            1078 6 1996 1989 .16916238225354135
            1078 6 1997 1989 .1736547708710373
            1078 6 1998 1989 .1765430838622229
            1161 11 1992 1969 .1691953911863517
            1161 11 1993 1969 .11858662855821828
            1161 11 1994 1969 .1248173326104325
            1161 11 1995 1969 .09914035974384318
            1161 11 1996 1969 -.021921715788372622
            1161 11 1997 1969 -.005999537446757305
            1161 11 1998 1969 -.024444105857368312
            1161 11 1999 1969 -.020315700169358415
            1161 11 2000 1969 .1704353615414023
            1161 11 2001 1969 -.010727537442170887
            1177 16 1992 1992 .0006227190967085871
            1177 16 1993 1992 -.003657636674430461
            1177 16 1994 1992 .004964294247259019
            1177 16 1995 1992 .0029849259460863315
            1177 16 1996 1992 .007006562059735516
            1177 16 1997 1992 .009386399668335406

            Comment


            • #7
              Originally posted by Saba Ahmadi View Post
              Thank you Ken,
              There are 2 issues with this code: 1. It generates missing for the entire dataset while I don't have missing in roa or other variables. I do not know how to fix it.
              2. it only calculates the mean of roa when there is a CEO change. However, I want to calculate it for all of my firms ( the unique identifier for each firm is gvkey).
              Thanks for the feedback. This comment, to me, does not go with what is asked in #1:

              I need to calculate the mean Return on Assets (ROA) for the two years preceding the start of each CEO's tenure for every firm in my dataset.
              For you to have 2 years preceding the start, that must mean a new is selected, correct? I'm confused why you thought you'd have data for the whole column. For similar reason I don't under the other point either.

              I think to make it easier, maybe you can modify your sample data in #6, add a column of data called "wanted" and show us you actually wanted. You don't need to complete the whole column, just a couple companies as an example should be enough.

              Comment

              Working...
              X