  • Dropping observations based on certain criteria

    Hello all,

    Here is a snippet of the data I am working with:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double secid long(datadate exdate) str2 cp_flag double strike_price
    101062 13535 13595 "C"  40000
    101535 13535 13595 "P"  30000
    101121 13535 13595 "P"  35000
    101508 13535 13595 "P"  55000
    101375 13535 13595 "P"  60000
    102113 13535 13595 "C"  10000
    102349 13535 13595 "P" 120000
    102349 13535 13595 "C" 125000
    103879 13535 13595 "P"  70000
    104229 13535 13595 "P"  35000
    104059 13535 13595 "P"  25000
    108065 13535 13595 "P"  20000
    105700 13535 13595 "P"  55000
    105971 13535 13595 "P"  35000
    106638 13535 13595 "P"  70000
    106595 13535 13595 "C"  45000
    110357 13535 13595 "P"  22500
    108965 13535 13595 "P" 120000
    108965 13535 13595 "C" 120000
    107747 13535 13595 "C"  70000
    end
    format %td datadate
    format %td exdate
    label var secid "Security ID" 
    label var datadate "The Date of this Price" 
    label var exdate "Expiration Date of the Option" 
    label var cp_flag "C=Call, P=Put" 
    label var strike_price "Strike Price of the Option Times 1000"

    I would like to create pairs of observations that have the same "datadate", the same "exdate", and the same "strike_price", but a different "cp_flag" (one observation in the pair has cp_flag equal to "C" and the other has it equal to "P"). If an observation cannot be placed in such a pair, I would like to drop it. Does anyone know how I can do that?

    Thanks

  • #2
    Code:
    local match_vars datadate exdate strike_price
    set seed 1234 // OR WHATEVER RANDOM NUMBER GENERATOR SEED YOU LIKE
    
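    //  SEPARATE PUTS AND CALLS
    preserve
    keep if cp_flag == "P"
    // ADD THE SUFFIX _put TO EVERY VARIABLE THAT IS NOT A MATCHING VARIABLE
    ds `match_vars', not
    rename (`r(varlist)') =_put
    // NUMBER THE PUTS IN RANDOM ORDER WITHIN EACH MATCHING GROUP
    gen double shuffle = runiform()
    by `match_vars' (shuffle), sort: gen priority = _n
    drop shuffle
    tempfile puts
    save `puts'
    
    restore
    keep if cp_flag == "C"
    // SAME TREATMENT FOR THE CALLS, WITH THE SUFFIX _call
    ds `match_vars', not
    rename (`r(varlist)') =_call
    gen double shuffle = runiform()
    by `match_vars' (shuffle), sort: gen priority = _n
    drop shuffle
    
    // PAIR THEM UP
    // THE k-TH CALL IN A GROUP IS PAIRED WITH THE k-TH PUT; UNPAIRED OBSERVATIONS ARE DROPPED
    merge 1:1 `match_vars' priority using `puts', keep(match) nogenerate
    
    // GO TO LONG LAYOUT: ONE OBSERVATION PER OPTION, TWO PER PAIR
    gen long pair_num = _n
    unab stubs: *_put
    local stubs: subinstr local stubs "_put" "", all
    reshape long `stubs', i(pair_num) j(_j) string
    drop _j priority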
    Notes: You did not specify whether you wanted the pairs laid out "side by side" in a single observation or "vertically" stacked above each other with a variable (pair_num) identifying the pair they belong to. I gave you the latter. If you prefer the former, just stop after the -merge- command.
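
    If it helps, the stacked result can be checked with a few lines like the following. This is only a sketch, and it assumes the code above was run all the way through the -reshape- and that the variables are named as in the example data.

    Code:
    // every pair should hold exactly two observations: one call and one put
    sort pair_num cp_flag
    by pair_num: assert _N == 2
    by pair_num: assert cp_flag[1] == "C" & cp_flag[2] == "P"
    
    // display the pairs, with a separator line between them
    list pair_num secid cp_flag strike_price, sepby(pair_num)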

    You also did not say whether a given put could be matched to more than one call (or vice versa). I did the pairing in a way that does not allow this. (My experience on Statalist is that this is more commonly what is desired, though there is no statistical reason to prefer it.)

    Finally, if this is at all representative of your data, you are only going to be able to match a pretty small fraction of your observations. If you decide that's unsatisfactory, the easiest way to get more matches is to allow an approximate match on strike price (within, say, some specified amount, or some specified ratio). Alternatively, you can allow individual puts and calls to have "multiple partners," though that won't help so much with this particular kind of data.
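
    As a rough sketch of the approximate-match idea (not tested on your full data): the code below uses -joinby- to form every call-put combination with the same datadate and exdate, and then keeps those whose strike prices differ by no more than a tolerance. The tolerance of 5000 and the names tol, strike_put, and strike_call are made up for illustration, and the code assumes the data set contains only the five variables shown in the example. Note that it also allows "multiple partners": a given put can appear alongside several calls, and vice versa.

    Code:
    local tol 5000 // HYPOTHETICAL TOLERANCE, IN THE SAME UNITS AS strike_price
    
    // PUTS GO INTO A TEMPORARY FILE
    preserve
    keep if cp_flag == "P"
    rename (secid strike_price) (secid_put strike_put)
    drop cp_flag
    tempfile puts
    save `puts'
    restore
    
    // CALLS STAY IN MEMORY
    keep if cp_flag == "C"
    rename (secid strike_price) (secid_call strike_call)
    drop cp_flag
    
    // ALL CALL-PUT COMBINATIONS WITH THE SAME DATES, THEN FILTER ON STRIKE PRICE
    joinby datadate exdate using `puts'
    keep if abs(strike_call - strike_put) <= `tol'

    Setting the tolerance to 0 gives exact matches on strike price with multiple partners allowed. Be aware that -joinby- can create a very large data set before the -keep- if many options share the same dates.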


    • #3
      Not sure if this can help:

      Code:
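      tempvar counter
      // one group per combination of the matching variables
      egen g=group(datadate exdate strike_price)
      // indicator: 1 for a call, 0 for a put
      gen `counter'=(cp_flag=="C")
      // group mean is 0 (only puts), 1 (only calls), or in between (both)
      bys g: egen revision=mean(`counter')
      replace revision=2 if revision!=0 & revision!=1
      la def revision_lbl 0 "Only P" 1 "Only C" 2 "P and C"
      la val revision revision_lbl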
      It's just a way to easily check which groups (same datadate, exdate, and strike_price) contain only puts, only calls, or both a P and a C in cp_flag.
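
      If the goal is only to drop the observations that cannot be paired, a possible follow-up, building on the revision variable created above, might look like this. Keep in mind that it retains every observation in a group that has at least one call and one put, so a group with two puts and one call keeps all three; it does not form one-to-one pairs.

      Code:
      // keep only the groups that contain both a put and a call
      keep if revision==2
      drop g revision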


      • #4
        Thank you, Clyde. Your code really helps.
