Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge/joinby problem

    Hello,

    I am trying to merge/combine two datasets

    The first dataset "stockdata" , contains daily returns for all stocks that have been listed on this particular stock exchange and the daily index return

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float date long company_id float(ret market_return)
    13381 1  .007722046 .0006220813
    13496 1           0 -.008546757
    13446 1  .016129382  .007810902
    13485 1  .015037877  .013100781
    13536 1   .02702867  .006271818
    13452 1  -.00809721 -.000337475
    13503 1   .04016604  .011819593
    13535 1           0 -.006412229
    13493 1 -.015504187  .008738445
    13389 1  -.02316706  .005394191
    end
    format %td date
    label values company_id company_id
    label def company_id 1 "ANN7425Q1095", modify
    I need to combine this dataset with a dataset that contains information on M&A announcements (not all companies included in stockdata are necessarily found in this dataset depends on wether the cmpany announced an m&a):

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int event_date str30 AcquirorName long company_id
    13529 "Rieber & Son ASA"       88
    13530 "Hafslund ASA"           82
    13545 "Blom ASA"               77
    13549 "Smedvig ASA"            73
    13562 "Merkantildata ASA"      84
    13594 "Den Norske Banken ASA" 108
    13605 "RingCom ASA"            14
    13623 "Wenaas ASA"             29
    13640 "Merkantildata ASA"      84
    13642 "Tomra Systems ASA"      96
    end
    format %td event_date
    label values company_id company_id
    label def company_id 14 "NO0003042608", modify
    label def company_id 29 "NO0003060303", modify
    label def company_id 73 "NO0003390205", modify
    label def company_id 77 "NO0003679102", modify
    label def company_id 82 "NO0004306408", modify
    label def company_id 84 "NO0004822503", modify
    label def company_id 88 "NO0004951104", modify
    label def company_id 96 "NO0005668905", modify
    label def company_id 108 "NO0010031479", modify
    I need to do an event study and I have tried to follow the method described by Robert Picard here:

    HTML Code:
    https://www.statalist.org/forums/forum/general-stata-discussion/general/1407083-event-study-with-r2000-no-observations-coding-issue
    However his method for combining the stockdata with eventdata does not work for me

    Code:
     
     use "eventdates.dta", clear bysort company_id: gen event_id = _n joinby company_id using "stockdata.dta"
    egen group_id = group(company_id event_id)
    isid group_id date, sort
    (data now sorted by group_id date)
    results in:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int event_date str30 AcquirorName long company_id float(event_id date ret market_return group_id)
    16805 "SeaDrill Ltd" 1 1 13345           . -.0029552735 1
    16805 "SeaDrill Ltd" 1 1 13346 -.036105003   -.02137943 1
    16805 "SeaDrill Ltd" 1 1 13347 -.007380107   -.00062269 1
    16805 "SeaDrill Ltd" 1 1 13348 -.007434979  -.005353319 1
    16805 "SeaDrill Ltd" 1 1 13349 -.007490672   .008196751 1
    16805 "SeaDrill Ltd" 1 1 13352 -.007547206  -.006498405 1
    16805 "SeaDrill Ltd" 1 1 13353           0   .005344259 1
    16805 "SeaDrill Ltd" 1 1 13354 -.015267472   -.01801691 1
    16805 "SeaDrill Ltd" 1 1 13355  .007662873   .006850596 1
    16805 "SeaDrill Ltd" 1 1 13356    .0076046 -.0012584217 1
    end
    format %td event_date
    format %td date
    label values company_id company_id
    label def company_id 1 "BMG7945E1057", modify
    What has happened here is that the first 10 observations of stockdata belong to the company_id: ANN7425Q1095 which is not "Seadrill ltd" are assigned to company_id
    BMG7945E1057 which is seadrill


    The code is not matching the company_id in eventdata with the company_id in stockdata and returning me all return data for the company specified by company_id. Instead, it seems that stata changes company_id and just feeds me back the stockdata. What I want to do is

    Use company_id in eventdata, match it with company_id in stock data and give me all daily returns for this company.

    Thanks in advance


  • #2
    I cannot replicate your problem with the example data because the examples shown provide no matches at all between the two data sets.

    But I'm pretty sure I know what's going on. In both data sets, I see that company_id is a value-labeled numeric variable. That means that although you see company names when you -list- the variable or use other commands that show values for it, the actual variables are just simple non-negative integers, and each data set has a value label that maps those integers to the names. I think your problem arises because the value labels in the two data sets are different and when you apply -joinby- the situation becomes confused.

    The solution is to -decode company_id, gen(str_company_id)- in both data sets and -drop company_id-. Then when you run -joinby-, use str_company_id rather than company_id as the join key. Then , if you wish, you can -encode str_company_id, gen(company_id)-, which will create a new value-labeled numeric company_id variable. The advantage of doing that is that the company_id variable will take up less space and can be used with commands like -xtset- or with factor variable notation in regression commands.

    But do bear in mind that the resulting company_id variable's mapping from non-negative integers to company names will be different from both of the mappings in the original data sets. So if you later try to -merge- or -joinby- with this variable and the old data sets you will get false matches again.

    Comment


    • #3
      Thank you so much Mr. Schechter!

      Using the string version of company_id seems to have solved my problem.

      Comment

      Working...
      X