Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Summarize multiple obs by ID to one variable, correct time order

    Hi,

    I am converting a monthly panel dataset on government bonds with approximately 300, 000 observations to cross-sectional.

    For this purpose I would like to capture the fluctuation of the interest rate across time for each bond in format [rate1];[rate2];[rate3] in one variable.
    So I would like to combine all the interest rates from the panel to one variable by ID (concat vertically in a way).
    I*ve modelled the code partly based on this: https://www.statalist.org/forums/for...vations-per-id and https://www.statalist.org/forums/for...g-observations

    I'm having problems writing the proper code to get the results I want. With the current code I'm not able to capture the fluctuation fo the coupon rate - so let's say for one bond the rate would start from 1%, increase to 3% and decrease again to 1%.

    What I get:,
    Code:
    1;3
    What I would like to have:
    Code:
    1;3;1
    A sample of the data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int id str7 tm int y str3 coupon str11 intfxall
    10797 "1845m10" 1845 "5"   "5;6"        
    10797 "1845m11" 1845 "5"   "5;6"        
    10797 "1845m12" 1845 "5"   "5;6"        
    10797 "1846m1"  1846 "5"   "5;6"        
    10797 "1846m2"  1846 "5"   "5;6"        
    10797 "1846m3"  1846 "5"   "5;6"        
    10797 "1846m4"  1846 "5"   "5;6"        
    10797 "1846m5"  1846 "5"   "5;6"        
    10797 "1846m6"  1846 "5"   "5;6"        
    10797 "1846m7"  1846 "6"   "5;6"        
    10797 "1846m8"  1846 "6"   "5;6"        
    10797 "1846m9"  1846 "6"   "5;6"        
    10797 "1846m10" 1846 "6"   "5;6"        
    10797 "1846m11" 1846 "6"   "5;6"        
    10797 "1846m12" 1846 "6"   "5;6"        
    10797 "1847m1"  1847 "6"   "5;6"        
    10797 "1847m2"  1847 "6"   "5;6"        
    10797 "1847m3"  1847 "6"   "5;6"        
    10797 "1847m4"  1847 "5"   "5;6"        
    10797 "1847m5"  1847 "5"   "5;6"        
    10797 "1847m6"  1847 "5"   "5;6"        
    11530 "1899m10" 1899 "1,5" "1.5;2;2.5;3"
    11530 "1899m11" 1899 "1,5" "1.5;2;2.5;3"
    11530 "1899m12" 1899 "2"   "1.5;2;2.5;3"
    11530 "1900m1"  1900 "2"   "1.5;2;2.5;3"
    11530 "1900m2"  1900 "2"   "1.5;2;2.5;3"
    11530 "1900m3"  1900 "2"   "1.5;2;2.5;3"
    11530 "1900m4"  1900 "2"   "1.5;2;2.5;3"
    11530 "1900m5"  1900 "2,5" "1.5;2;2.5;3"
    11530 "1900m6"  1900 "2,5" "1.5;2;2.5;3"
    11530 "1900m7"  1900 "3"   "1.5;2;2.5;3"
    11530 "1900m8"  1900 "3"   "1.5;2;2.5;3"
    11530 "1900m9"  1900 "3"   "1.5;2;2.5;3"
    end
    -------

    In this sample, the ID 10797 is problematic because of the increase from 5 to 6 and decrease back to 5.

    My code:

    Code:
    ** Creating a dummy variable coupon_2_diff for Identifying bonds with variance in coupon rate (1 if variance, 0 if no variance)
    
    ds coupon_2
    foreach v of varlist `r(varlist)' {
        by ID (`v'), sort: gen byte `v'_diff = `v'[1] != `v'[_N]
        }        
    
    tsset ID tm
    
    ** 1. Concatenate the year of change in coupon rate with the coupon rate of that year (* I actually don't want to have the year of change in the final variable but without this step, the order of the interest rates does no respect the timing of their occurrence)
    
    foreach v of varlist coupon {
        by ID `v' (tm), sort : gen `v'_vals = _n == 1
        gen `v'_temp = `v' if `v'_vals ==1 & `v'_2_diff ==1 
        egen `v'_temp2 = concat(y `v'_temp), punct(:)
        replace `v'_temp2 = "" if missing(`v'_temp)
        }
    
    **2. Create variable couponChange which records the change in coupon rate in format YYYY: [rate1]; YYYY: [rate2] etc. 
    
    foreach v of varlist coupon {
        bysort ID (`v'_temp2) : gen `v'Change = `v'_temp2 if _n == 1
        by ID : replace `v'Change = `v'Change[_n-1] + ";" + cond(`v'_temp2 != `v'_temp2[_n-1], `v'_temp2, "") if _n > 1 & missing(`v'_temp2)==0
        by ID: replace `v'Change = `v'Change[_N]
        replace `v'Change = subinstr(`v'Change, ";", "", 1) if regexm(`v'Change, "^;")
        drop `v'_temp2 `v'_vals `v'_temp 
    }
    
    ** 3. Split the resulting variable in order to get rid of the years
    split couponChange, parse(":" ";") gen(temp)
    drop couponChange
    
    ** 4. Drop the temp vars containing years, preserve only the temp vars with the coupon rate
    
    foreach v of var temp* {
        if mod(real(substr("`v'",-1,1)), 2) == 1 drop `v'
     }
    
    ** 5. rename tempvars from even numbered to running from 1-19
    
    foreach v of var temp* {
    forval x = 1/19 {
        capture rename `v' jaa`x'
     }
     }
    
    **6.   combine interest rates into one variable
    gen intfxAll = jaa1
    
    forval j = 2/19 { 
        replace intfxAll = intfxAll + ";" + jaa`j' if jaa`j' !="" 
    }
    
    label var intfxAll "NEW: couponrates ALL, separated by ;"

    Hope the problem is explained ok, would really appreciate any advice!

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int id str7 tm int y str3 coupon str11 intfxall
    10797 "1845m10" 1845 "5"   "5;6"       
    10797 "1845m11" 1845 "5"   "5;6"       
    10797 "1845m12" 1845 "5"   "5;6"       
    10797 "1846m1"  1846 "5"   "5;6"       
    10797 "1846m2"  1846 "5"   "5;6"       
    10797 "1846m3"  1846 "5"   "5;6"       
    10797 "1846m4"  1846 "5"   "5;6"       
    10797 "1846m5"  1846 "5"   "5;6"       
    10797 "1846m6"  1846 "5"   "5;6"       
    10797 "1846m7"  1846 "6"   "5;6"       
    10797 "1846m8"  1846 "6"   "5;6"       
    10797 "1846m9"  1846 "6"   "5;6"       
    10797 "1846m10" 1846 "6"   "5;6"       
    10797 "1846m11" 1846 "6"   "5;6"       
    10797 "1846m12" 1846 "6"   "5;6"       
    10797 "1847m1"  1847 "6"   "5;6"       
    10797 "1847m2"  1847 "6"   "5;6"       
    10797 "1847m3"  1847 "6"   "5;6"       
    10797 "1847m4"  1847 "5"   "5;6"       
    10797 "1847m5"  1847 "5"   "5;6"       
    10797 "1847m6"  1847 "5"   "5;6"       
    11530 "1899m10" 1899 "1,5" "1.5;2;2.5;3"
    11530 "1899m11" 1899 "1,5" "1.5;2;2.5;3"
    11530 "1899m12" 1899 "2"   "1.5;2;2.5;3"
    11530 "1900m1"  1900 "2"   "1.5;2;2.5;3"
    11530 "1900m2"  1900 "2"   "1.5;2;2.5;3"
    11530 "1900m3"  1900 "2"   "1.5;2;2.5;3"
    11530 "1900m4"  1900 "2"   "1.5;2;2.5;3"
    11530 "1900m5"  1900 "2,5" "1.5;2;2.5;3"
    11530 "1900m6"  1900 "2,5" "1.5;2;2.5;3"
    11530 "1900m7"  1900 "3"   "1.5;2;2.5;3"
    11530 "1900m8"  1900 "3"   "1.5;2;2.5;3"
    11530 "1900m9"  1900 "3"   "1.5;2;2.5;3"
    end
    
    //  FIRST CREATE A USABLE STATA INTERNAL FORMAT MONTHLY DATE VARIABLE
    gen mdate = monthly(tm, "YM")
    format mdate %tm
    assert missing(mdate) == missing(tm)
    
    //  IDENTIFY SPELLS OF CONSTANT COUPON
    by id (mdate), sort: gen spell_num  = sum(coupon != coupon[_n-1])
    
    //  NOW CREATE THE DESIRED VARIABLE
    by id (mdate): gen wanted = coupon if _n == 1
    
    by id (mdate): replace wanted = wanted[_n-1] + ///
        cond(spell_num != spell_num[_n-1], ";" + coupon, "") if _n > 1
    by id (mdate): replace wanted = wanted[_N]

    Comment


    • #3
      Thank you, seems I took a much too long detour!

      Comment

      Working...
      X