Summarize multiple obs by ID to one variable, correct time order

Ilona Nietosvaara

Join Date: Apr 2019
Posts: 4

Summarize multiple obs by ID to one variable, correct time order

13 Nov 2019, 08:27

Hi,

I am converting a monthly panel dataset on government bonds with approximately 300, 000 observations to cross-sectional.

For this purpose I would like to capture the fluctuation of the interest rate across time for each bond in format [rate1];[rate2];[rate3] in one variable.
So I would like to combine all the interest rates from the panel to one variable by ID (concat vertically in a way).
I*ve modelled the code partly based on this: https://www.statalist.org/forums/for...vations-per-id and https://www.statalist.org/forums/for...g-observations

I'm having problems writing the proper code to get the results I want. With the current code I'm not able to capture the fluctuation fo the coupon rate - so let's say for one bond the rate would start from 1%, increase to 3% and decrease again to 1%.

What I get:,

Code:

1;3

What I would like to have:

Code:

1;3;1

A sample of the data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id str7 tm int y str3 coupon str11 intfxall
10797 "1845m10" 1845 "5"   "5;6"        
10797 "1845m11" 1845 "5"   "5;6"        
10797 "1845m12" 1845 "5"   "5;6"        
10797 "1846m1"  1846 "5"   "5;6"        
10797 "1846m2"  1846 "5"   "5;6"        
10797 "1846m3"  1846 "5"   "5;6"        
10797 "1846m4"  1846 "5"   "5;6"        
10797 "1846m5"  1846 "5"   "5;6"        
10797 "1846m6"  1846 "5"   "5;6"        
10797 "1846m7"  1846 "6"   "5;6"        
10797 "1846m8"  1846 "6"   "5;6"        
10797 "1846m9"  1846 "6"   "5;6"        
10797 "1846m10" 1846 "6"   "5;6"        
10797 "1846m11" 1846 "6"   "5;6"        
10797 "1846m12" 1846 "6"   "5;6"        
10797 "1847m1"  1847 "6"   "5;6"        
10797 "1847m2"  1847 "6"   "5;6"        
10797 "1847m3"  1847 "6"   "5;6"        
10797 "1847m4"  1847 "5"   "5;6"        
10797 "1847m5"  1847 "5"   "5;6"        
10797 "1847m6"  1847 "5"   "5;6"        
11530 "1899m10" 1899 "1,5" "1.5;2;2.5;3"
11530 "1899m11" 1899 "1,5" "1.5;2;2.5;3"
11530 "1899m12" 1899 "2"   "1.5;2;2.5;3"
11530 "1900m1"  1900 "2"   "1.5;2;2.5;3"
11530 "1900m2"  1900 "2"   "1.5;2;2.5;3"
11530 "1900m3"  1900 "2"   "1.5;2;2.5;3"
11530 "1900m4"  1900 "2"   "1.5;2;2.5;3"
11530 "1900m5"  1900 "2,5" "1.5;2;2.5;3"
11530 "1900m6"  1900 "2,5" "1.5;2;2.5;3"
11530 "1900m7"  1900 "3"   "1.5;2;2.5;3"
11530 "1900m8"  1900 "3"   "1.5;2;2.5;3"
11530 "1900m9"  1900 "3"   "1.5;2;2.5;3"
end

-------

In this sample, the ID 10797 is problematic because of the increase from 5 to 6 and decrease back to 5.

My code:

Code:

** Creating a dummy variable coupon_2_diff for Identifying bonds with variance in coupon rate (1 if variance, 0 if no variance)

ds coupon_2
foreach v of varlist `r(varlist)' {
    by ID (`v'), sort: gen byte `v'_diff = `v'[1] != `v'[_N]
    }        

tsset ID tm

** 1. Concatenate the year of change in coupon rate with the coupon rate of that year (* I actually don't want to have the year of change in the final variable but without this step, the order of the interest rates does no respect the timing of their occurrence)

foreach v of varlist coupon {
    by ID `v' (tm), sort : gen `v'_vals = _n == 1
    gen `v'_temp = `v' if `v'_vals ==1 & `v'_2_diff ==1 
    egen `v'_temp2 = concat(y `v'_temp), punct(:)
    replace `v'_temp2 = "" if missing(`v'_temp)
    }

**2. Create variable couponChange which records the change in coupon rate in format YYYY: [rate1]; YYYY: [rate2] etc. 

foreach v of varlist coupon {
    bysort ID (`v'_temp2) : gen `v'Change = `v'_temp2 if _n == 1
    by ID : replace `v'Change = `v'Change[_n-1] + ";" + cond(`v'_temp2 != `v'_temp2[_n-1], `v'_temp2, "") if _n > 1 & missing(`v'_temp2)==0
    by ID: replace `v'Change = `v'Change[_N]
    replace `v'Change = subinstr(`v'Change, ";", "", 1) if regexm(`v'Change, "^;")
    drop `v'_temp2 `v'_vals `v'_temp 
}

** 3. Split the resulting variable in order to get rid of the years
split couponChange, parse(":" ";") gen(temp)
drop couponChange

** 4. Drop the temp vars containing years, preserve only the temp vars with the coupon rate

foreach v of var temp* {
    if mod(real(substr("`v'",-1,1)), 2) == 1 drop `v'
 }

** 5. rename tempvars from even numbered to running from 1-19

foreach v of var temp* {
forval x = 1/19 {
    capture rename `v' jaa`x'
 }
 }

**6.   combine interest rates into one variable
gen intfxAll = jaa1

forval j = 2/19 { 
    replace intfxAll = intfxAll + ";" + jaa`j' if jaa`j' !="" 
}

label var intfxAll "NEW: couponrates ALL, separated by ;"

Hope the problem is explained ok, would really appreciate any advice!

Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

13 Nov 2019, 09:27

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id str7 tm int y str3 coupon str11 intfxall
10797 "1845m10" 1845 "5"   "5;6"       
10797 "1845m11" 1845 "5"   "5;6"       
10797 "1845m12" 1845 "5"   "5;6"       
10797 "1846m1"  1846 "5"   "5;6"       
10797 "1846m2"  1846 "5"   "5;6"       
10797 "1846m3"  1846 "5"   "5;6"       
10797 "1846m4"  1846 "5"   "5;6"       
10797 "1846m5"  1846 "5"   "5;6"       
10797 "1846m6"  1846 "5"   "5;6"       
10797 "1846m7"  1846 "6"   "5;6"       
10797 "1846m8"  1846 "6"   "5;6"       
10797 "1846m9"  1846 "6"   "5;6"       
10797 "1846m10" 1846 "6"   "5;6"       
10797 "1846m11" 1846 "6"   "5;6"       
10797 "1846m12" 1846 "6"   "5;6"       
10797 "1847m1"  1847 "6"   "5;6"       
10797 "1847m2"  1847 "6"   "5;6"       
10797 "1847m3"  1847 "6"   "5;6"       
10797 "1847m4"  1847 "5"   "5;6"       
10797 "1847m5"  1847 "5"   "5;6"       
10797 "1847m6"  1847 "5"   "5;6"       
11530 "1899m10" 1899 "1,5" "1.5;2;2.5;3"
11530 "1899m11" 1899 "1,5" "1.5;2;2.5;3"
11530 "1899m12" 1899 "2"   "1.5;2;2.5;3"
11530 "1900m1"  1900 "2"   "1.5;2;2.5;3"
11530 "1900m2"  1900 "2"   "1.5;2;2.5;3"
11530 "1900m3"  1900 "2"   "1.5;2;2.5;3"
11530 "1900m4"  1900 "2"   "1.5;2;2.5;3"
11530 "1900m5"  1900 "2,5" "1.5;2;2.5;3"
11530 "1900m6"  1900 "2,5" "1.5;2;2.5;3"
11530 "1900m7"  1900 "3"   "1.5;2;2.5;3"
11530 "1900m8"  1900 "3"   "1.5;2;2.5;3"
11530 "1900m9"  1900 "3"   "1.5;2;2.5;3"
end

//  FIRST CREATE A USABLE STATA INTERNAL FORMAT MONTHLY DATE VARIABLE
gen mdate = monthly(tm, "YM")
format mdate %tm
assert missing(mdate) == missing(tm)

//  IDENTIFY SPELLS OF CONSTANT COUPON
by id (mdate), sort: gen spell_num  = sum(coupon != coupon[_n-1])

//  NOW CREATE THE DESIRED VARIABLE
by id (mdate): gen wanted = coupon if _n == 1

by id (mdate): replace wanted = wanted[_n-1] + ///
    cond(spell_num != spell_num[_n-1], ";" + coupon, "") if _n > 1
by id (mdate): replace wanted = wanted[_N]

Comment

Ilona Nietosvaara

Join Date: Apr 2019

Posts: 4
#3

14 Nov 2019, 05:19

Thank you, seems I took a much too long detour!
Comment

Announcement

Summarize multiple obs by ID to one variable, correct time order

Comment

Comment