Hi,
I am converting a monthly panel dataset on government bonds with approximately 300, 000 observations to cross-sectional.
For this purpose I would like to capture the fluctuation of the interest rate across time for each bond in format [rate1];[rate2];[rate3] in one variable.
So I would like to combine all the interest rates from the panel to one variable by ID (concat vertically in a way).
I*ve modelled the code partly based on this: https://www.statalist.org/forums/for...vations-per-id and https://www.statalist.org/forums/for...g-observations
I'm having problems writing the proper code to get the results I want. With the current code I'm not able to capture the fluctuation fo the coupon rate - so let's say for one bond the rate would start from 1%, increase to 3% and decrease again to 1%.
What I get:,
What I would like to have:
A sample of the data:
-------
In this sample, the ID 10797 is problematic because of the increase from 5 to 6 and decrease back to 5.
My code:
Hope the problem is explained ok, would really appreciate any advice!
I am converting a monthly panel dataset on government bonds with approximately 300, 000 observations to cross-sectional.
For this purpose I would like to capture the fluctuation of the interest rate across time for each bond in format [rate1];[rate2];[rate3] in one variable.
So I would like to combine all the interest rates from the panel to one variable by ID (concat vertically in a way).
I*ve modelled the code partly based on this: https://www.statalist.org/forums/for...vations-per-id and https://www.statalist.org/forums/for...g-observations
I'm having problems writing the proper code to get the results I want. With the current code I'm not able to capture the fluctuation fo the coupon rate - so let's say for one bond the rate would start from 1%, increase to 3% and decrease again to 1%.
What I get:,
Code:
1;3
Code:
1;3;1
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int id str7 tm int y str3 coupon str11 intfxall 10797 "1845m10" 1845 "5" "5;6" 10797 "1845m11" 1845 "5" "5;6" 10797 "1845m12" 1845 "5" "5;6" 10797 "1846m1" 1846 "5" "5;6" 10797 "1846m2" 1846 "5" "5;6" 10797 "1846m3" 1846 "5" "5;6" 10797 "1846m4" 1846 "5" "5;6" 10797 "1846m5" 1846 "5" "5;6" 10797 "1846m6" 1846 "5" "5;6" 10797 "1846m7" 1846 "6" "5;6" 10797 "1846m8" 1846 "6" "5;6" 10797 "1846m9" 1846 "6" "5;6" 10797 "1846m10" 1846 "6" "5;6" 10797 "1846m11" 1846 "6" "5;6" 10797 "1846m12" 1846 "6" "5;6" 10797 "1847m1" 1847 "6" "5;6" 10797 "1847m2" 1847 "6" "5;6" 10797 "1847m3" 1847 "6" "5;6" 10797 "1847m4" 1847 "5" "5;6" 10797 "1847m5" 1847 "5" "5;6" 10797 "1847m6" 1847 "5" "5;6" 11530 "1899m10" 1899 "1,5" "1.5;2;2.5;3" 11530 "1899m11" 1899 "1,5" "1.5;2;2.5;3" 11530 "1899m12" 1899 "2" "1.5;2;2.5;3" 11530 "1900m1" 1900 "2" "1.5;2;2.5;3" 11530 "1900m2" 1900 "2" "1.5;2;2.5;3" 11530 "1900m3" 1900 "2" "1.5;2;2.5;3" 11530 "1900m4" 1900 "2" "1.5;2;2.5;3" 11530 "1900m5" 1900 "2,5" "1.5;2;2.5;3" 11530 "1900m6" 1900 "2,5" "1.5;2;2.5;3" 11530 "1900m7" 1900 "3" "1.5;2;2.5;3" 11530 "1900m8" 1900 "3" "1.5;2;2.5;3" 11530 "1900m9" 1900 "3" "1.5;2;2.5;3" end
In this sample, the ID 10797 is problematic because of the increase from 5 to 6 and decrease back to 5.
My code:
Code:
** Creating a dummy variable coupon_2_diff for Identifying bonds with variance in coupon rate (1 if variance, 0 if no variance) ds coupon_2 foreach v of varlist `r(varlist)' { by ID (`v'), sort: gen byte `v'_diff = `v'[1] != `v'[_N] } tsset ID tm ** 1. Concatenate the year of change in coupon rate with the coupon rate of that year (* I actually don't want to have the year of change in the final variable but without this step, the order of the interest rates does no respect the timing of their occurrence) foreach v of varlist coupon { by ID `v' (tm), sort : gen `v'_vals = _n == 1 gen `v'_temp = `v' if `v'_vals ==1 & `v'_2_diff ==1 egen `v'_temp2 = concat(y `v'_temp), punct(:) replace `v'_temp2 = "" if missing(`v'_temp) } **2. Create variable couponChange which records the change in coupon rate in format YYYY: [rate1]; YYYY: [rate2] etc. foreach v of varlist coupon { bysort ID (`v'_temp2) : gen `v'Change = `v'_temp2 if _n == 1 by ID : replace `v'Change = `v'Change[_n-1] + ";" + cond(`v'_temp2 != `v'_temp2[_n-1], `v'_temp2, "") if _n > 1 & missing(`v'_temp2)==0 by ID: replace `v'Change = `v'Change[_N] replace `v'Change = subinstr(`v'Change, ";", "", 1) if regexm(`v'Change, "^;") drop `v'_temp2 `v'_vals `v'_temp } ** 3. Split the resulting variable in order to get rid of the years split couponChange, parse(":" ";") gen(temp) drop couponChange ** 4. Drop the temp vars containing years, preserve only the temp vars with the coupon rate foreach v of var temp* { if mod(real(substr("`v'",-1,1)), 2) == 1 drop `v' } ** 5. rename tempvars from even numbered to running from 1-19 foreach v of var temp* { forval x = 1/19 { capture rename `v' jaa`x' } } **6. combine interest rates into one variable gen intfxAll = jaa1 forval j = 2/19 { replace intfxAll = intfxAll + ";" + jaa`j' if jaa`j' !="" } label var intfxAll "NEW: couponrates ALL, separated by ;"
Hope the problem is explained ok, would really appreciate any advice!
Comment