Calculating the mean of a variable for the last two years

Saba Ahmadi

Join Date: Feb 2024

Posts: 3
#1

Calculating the mean of a variable for the last two years

20 Feb 2024, 19:23

Hi everyone
I have a panel dataset containing the information of firms across 20 years. I have information on the CEOs of each firm such as the year they became CEO(becameceo_year), their unique identifier for each firm(co_per_rol), etc. Also, each firm is uniquely identified by a variable named gvkey.

I need to calculate the mean Return on Assets (ROA) for the two years preceding the start of each CEO's tenure for every firm in my dataset.

Given that each firm may have had multiple CEOs over the years, the challenge is to compute this mean ROA for the two years before each CEO's start year across all firms.

This is the code I've been trying which I know does not work. I probably need to collapse the dataset and merge it back into the main dataset, but I do not know how to do it.

* Step 1: Generate Flags for the 1-Year and 2-Year Lags
gen pre_ceo1 = becameceo_year - 1
gen pre_ceo2 = becameceo_year - 2

* Step 2: Combine Flags for Both Years Before the CEO Became CEO
gen is_pre_ceo = (year == pre_ceo1) | (year == pre_ceo2)

* Step 3: Create a Separate Dataset for Calculations
* Keep only observations from the 1-year and 2-year lags
keep if is_pre_ceo == 1

* Step 4: Calculate the Mean ROA for the Flagged Years by Firm
bysort gvkey: egen mean_roa_pre_ceo = mean(roa1)

* Step 5: Save This Dataset Temporarily
tempfile pre_ceo_data
* Drop observations from the year two years before the CEO started
drop if year == pre_ceo2

replace year = year + 1
save pre_ceo_data.dta, replace

* Reload your main dataset
use aaaaa.dta, clear

* Step 6: Merge the Mean ROA Back into the Main Dataset
* Merge the mean ROA values for the pre-CEO years back into the main dataset
merge 1:1 gvkey year using pre_ceo_data.dta, keep(master match update) keepusing(mean_roa_pre_ceo)
Tags: None
Sebastian Schirner

Join Date: Jan 2023

Posts: 53
#2

21 Feb 2024, 09:10

Hey Saba,

please provide a data example using dataex from ssc. That will help us in providing a solution.

Best,
Sebastian
1 like
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4409
#3

21 Feb 2024, 09:48

actually, unless the OP is using an old version of Stata, -dataex- is part of official Stata and that version should be used; also, please read the FAQ on how to post so your results/output is easily readable as well as on using -dataex-
Comment

Ken Chui

Join Date: Aug 2014
Posts: 1054

21 Feb 2024, 10:30

Code:

clear
input co_per_rol year str5 ceo becameceo_year roa
1 2010 a 2007 10
1 2011 a 2007 20
1 2012 a 2007 30
1 2013 b 2013 40
1 2014 b 2013 50
1 2015 b 2013 60
1 2016 b 2013 70
1 2017 c 2017 80
2 2009 d 2009 10
2 2010 d 2009 20
2 2011 d 2009 30
2 2012 d 2009 40
2 2013 e 2013 50
2 2014 e 2013 60
2 2015 e 2013 70
3 2011 f 2010 10
3 2012 f 2010 20
3 2013 g 2013 30
3 2014 g 2013 40
3 2015 g 2013 50
3 2016 f 2016 60
3 2017 g 2017 70
3 2018 g 2017 80
end


bysort co_per_rol (year): gen ceo_change = 1 if ceo != ceo[_n-1]

bysort co_per_rol (year): gen mean_roa = (roa[_n-1] + roa[_n-2])/2 if ceo_change == 1

list, sepby(co_per_rol)

And this is the outcome:

Code:

     +--------------------------------------------------------------+
     | co_per~l   year   ceo   became~r   roa   ceo_ch~e   mean_roa |
     |--------------------------------------------------------------|
  1. |        1   2010     a       2007    10          1          . |
  2. |        1   2011     a       2007    20          .          . |
  3. |        1   2012     a       2007    30          .          . |
  4. |        1   2013     b       2013    40          1         25 |
  5. |        1   2014     b       2013    50          .          . |
  6. |        1   2015     b       2013    60          .          . |
  7. |        1   2016     b       2013    70          .          . |
  8. |        1   2017     c       2017    80          1         65 |
     |--------------------------------------------------------------|
  9. |        2   2009     d       2009    10          1          . |
 10. |        2   2010     d       2009    20          .          . |
 11. |        2   2011     d       2009    30          .          . |
 12. |        2   2012     d       2009    40          .          . |
 13. |        2   2013     e       2013    50          1         35 |
 14. |        2   2014     e       2013    60          .          . |
 15. |        2   2015     e       2013    70          .          . |
     |--------------------------------------------------------------|
 16. |        3   2011     f       2010    10          1          . |
 17. |        3   2012     f       2010    20          .          . |
 18. |        3   2013     g       2013    30          1         15 |
 19. |        3   2014     g       2013    40          .          . |
 20. |        3   2015     g       2013    50          .          . |
 21. |        3   2016     f       2016    60          1         45 |
 22. |        3   2017     g       2017    70          1         55 |
 23. |        3   2018     g       2017    80          .          . |
     +--------------------------------------------------------------+

Last edited by Ken Chui; 21 Feb 2024, 10:33.

Comment

Saba Ahmadi

Join Date: Feb 2024

Posts: 3
#5

22 Feb 2024, 09:58

Thank you Ken,
There are 2 issues with this code: 1. It generates missing for the entire dataset while I don't have missing in roa or other variables. I do not know how to fix it.
2. it only calculates the mean of roa when there is a CEO change. However, I want to calculate it for all of my firms ( the unique identifier for each firm is gvkey).
Comment
Saba Ahmadi

Join Date: Feb 2024

Posts: 3
#6

22 Feb 2024, 10:07

Thanks for your response, Sebastian!
This is my date:

input long gvkey double(co_per_rol year becameceo_year roa1)
1045 1 1992 1985 -.04998396236501657
1045 1 1993 1985 -.005691814136396564
1045 1 1994 1985 .011700708200759519
1045 1 1995 1985 .008539578645939865
1045 1 1996 1985 .04956822949699956
1045 1 1997 1985 .047095386086540764
1045 3 1998 1998 .058915840918262116
1045 3 1999 1998 .04041191433494708
1045 3 2000 1998 .031015145156983177
1045 3 2001 1998 -.05365244663682592
1045 3 2002 1998 -.11600092509994384
1078 6 1992 1989 .17850668094270242
1078 6 1993 1989 .18197482522430375
1078 6 1994 1989 .1779366624259537
1078 6 1995 1989 .1794088337097799
1078 6 1996 1989 .16916238225354135
1078 6 1997 1989 .1736547708710373
1078 6 1998 1989 .1765430838622229
1161 11 1992 1969 .1691953911863517
1161 11 1993 1969 .11858662855821828
1161 11 1994 1969 .1248173326104325
1161 11 1995 1969 .09914035974384318
1161 11 1996 1969 -.021921715788372622
1161 11 1997 1969 -.005999537446757305
1161 11 1998 1969 -.024444105857368312
1161 11 1999 1969 -.020315700169358415
1161 11 2000 1969 .1704353615414023
1161 11 2001 1969 -.010727537442170887
1177 16 1992 1992 .0006227190967085871
1177 16 1993 1992 -.003657636674430461
1177 16 1994 1992 .004964294247259019
1177 16 1995 1992 .0029849259460863315
1177 16 1996 1992 .007006562059735516
1177 16 1997 1992 .009386399668335406
Comment
Ken Chui

Join Date: Aug 2014

Posts: 1054
#7

22 Feb 2024, 13:19

Originally posted by Saba Ahmadi View Post

Thank you Ken,
There are 2 issues with this code: 1. It generates missing for the entire dataset while I don't have missing in roa or other variables. I do not know how to fix it.
2. it only calculates the mean of roa when there is a CEO change. However, I want to calculate it for all of my firms ( the unique identifier for each firm is gvkey).

Thanks for the feedback. This comment, to me, does not go with what is asked in #1:

I need to calculate the mean Return on Assets (ROA) for the two years preceding the start of each CEO's tenure for every firm in my dataset.

For you to have 2 years preceding the start, that must mean a new is selected, correct? I'm confused why you thought you'd have data for the whole column. For similar reason I don't under the other point either.

I think to make it easier, maybe you can modify your sample data in #6, add a column of data called "wanted" and show us you actually wanted. You don't need to complete the whole column, just a couple companies as an example should be enough.
Comment

Announcement

Calculating the mean of a variable for the last two years

Comment

Comment

Comment

Comment

Comment

Comment