Dear Statalist Members,
I am currently trying to compute a conditional median of bond ratings in a panel dataset of announcements and transactions.
For each announcement (identified by non-missing anndats_act), I want to compute the median rating_filled taking into account each bond (identified by cusip_id) of the announcing firm (by oftic) traded within the last three months. The rating_filled might be updated during the three month period, so only the latest rating_filled should be considered in the median.
I found a solution that is much too slow for my dataset with over 13 million observations (of which over 50 thousand announcements).
My data set (with only the variables relevant to this problem) looks like this:
Does anyone know of a solution to increase the efficiency of my code, or even to avoid the generally slow combination of looping and preserve-restore?
Thank you very much,
Laura
I am currently trying to compute a conditional median of bond ratings in a panel dataset of announcements and transactions.
For each announcement (identified by non-missing anndats_act), I want to compute the median rating_filled taking into account each bond (identified by cusip_id) of the announcing firm (by oftic) traded within the last three months. The rating_filled might be updated during the three month period, so only the latest rating_filled should be considered in the median.
I found a solution that is much too slow for my dataset with over 13 million observations (of which over 50 thousand announcements).
Code:
* Create a variable to store the median firm rating gen median_rating_filled = . * Loop through each announcement forvalues i = 1/`=_N' { if !missing(anndats_act[`i']) { local oftic = oftic[`i'] local anndate = anndats_act[`i'] * Keep only the latest observation per cusip_id within 3 months prior to the announcement. Exclude announcement day because of possible induced rating changes. preserve keep if oftic == "`oftic'" & inrange(trd_exctn_dt, `anndate'-90, `anndate'-1) by cusip_id (trd_exctn_dt), sort: keep if _n == _N * Calculate the median rating_filled egen median_rating_temp = median(rating_filled) local median_rating = median_rating_temp[1] * Store the median rating in the original dataset restore replace median_rating_filled = `median_rating' in `i' } }
Code:
clear input str10 cusip_id float trd_exctn_dt str6 oftic long anndats_act float rating_filled "69349LAS7" 22386 "PNC" . 7 "693475AX3" 22386 "PNC" . 7 "693475AQ8" 22386 "PNC" . 9 "693475AK1" 22386 "PNC" . 9 "693476BN2" 22386 "PNC" . 7 "69349LAG3" 22386 "PNC" . 7 "69349LAK4" 22386 "PNC" . 7 "693475AW5" 22386 "PNC" . 7 "693475AM7" 22386 "PNC" . 9 "" 22386 "PNC" 22386 . "693475AL9" 22386 "PNC" . 7 "69349LAM0" 22386 "PNC" . 7 end format %td trd_exctn_dt format %td anndats_act
Thank you very much,
Laura
Comment