Hi everyone
I have a panel dataset containing the information of firms across 20 years. I have information on the CEOs of each firm such as the year they became CEO(becameceo_year), their unique identifier for each firm(co_per_rol), etc. Also, each firm is uniquely identified by a variable named gvkey.
I need to calculate the mean Return on Assets (ROA) for the two years preceding the start of each CEO's tenure for every firm in my dataset.
Given that each firm may have had multiple CEOs over the years, the challenge is to compute this mean ROA for the two years before each CEO's start year across all firms.
This is the code I've been trying which I know does not work. I probably need to collapse the dataset and merge it back into the main dataset, but I do not know how to do it.
* Step 1: Generate Flags for the 1-Year and 2-Year Lags
gen pre_ceo1 = becameceo_year - 1
gen pre_ceo2 = becameceo_year - 2
* Step 2: Combine Flags for Both Years Before the CEO Became CEO
gen is_pre_ceo = (year == pre_ceo1) | (year == pre_ceo2)
* Step 3: Create a Separate Dataset for Calculations
* Keep only observations from the 1-year and 2-year lags
keep if is_pre_ceo == 1
* Step 4: Calculate the Mean ROA for the Flagged Years by Firm
bysort gvkey: egen mean_roa_pre_ceo = mean(roa1)
* Step 5: Save This Dataset Temporarily
tempfile pre_ceo_data
* Drop observations from the year two years before the CEO started
drop if year == pre_ceo2
replace year = year + 1
save pre_ceo_data.dta, replace
* Reload your main dataset
use aaaaa.dta, clear
* Step 6: Merge the Mean ROA Back into the Main Dataset
* Merge the mean ROA values for the pre-CEO years back into the main dataset
merge 1:1 gvkey year using pre_ceo_data.dta, keep(master match update) keepusing(mean_roa_pre_ceo)
I have a panel dataset containing the information of firms across 20 years. I have information on the CEOs of each firm such as the year they became CEO(becameceo_year), their unique identifier for each firm(co_per_rol), etc. Also, each firm is uniquely identified by a variable named gvkey.
I need to calculate the mean Return on Assets (ROA) for the two years preceding the start of each CEO's tenure for every firm in my dataset.
Given that each firm may have had multiple CEOs over the years, the challenge is to compute this mean ROA for the two years before each CEO's start year across all firms.
This is the code I've been trying which I know does not work. I probably need to collapse the dataset and merge it back into the main dataset, but I do not know how to do it.
* Step 1: Generate Flags for the 1-Year and 2-Year Lags
gen pre_ceo1 = becameceo_year - 1
gen pre_ceo2 = becameceo_year - 2
* Step 2: Combine Flags for Both Years Before the CEO Became CEO
gen is_pre_ceo = (year == pre_ceo1) | (year == pre_ceo2)
* Step 3: Create a Separate Dataset for Calculations
* Keep only observations from the 1-year and 2-year lags
keep if is_pre_ceo == 1
* Step 4: Calculate the Mean ROA for the Flagged Years by Firm
bysort gvkey: egen mean_roa_pre_ceo = mean(roa1)
* Step 5: Save This Dataset Temporarily
tempfile pre_ceo_data
* Drop observations from the year two years before the CEO started
drop if year == pre_ceo2
replace year = year + 1
save pre_ceo_data.dta, replace
* Reload your main dataset
use aaaaa.dta, clear
* Step 6: Merge the Mean ROA Back into the Main Dataset
* Merge the mean ROA values for the pre-CEO years back into the main dataset
merge 1:1 gvkey year using pre_ceo_data.dta, keep(master match update) keepusing(mean_roa_pre_ceo)
Comment