Help with merging dataset

EIRINI FANOURIA

Join Date: Feb 2016

Posts: 12
#1

Help with merging dataset

19 Feb 2016, 12:51

We have tried to merge the datasets we've obtained from CDS, Compustat and CRSP and we've run into some trouble.

First, I would like to ask if we did the right thing by
1: obtaining the 6-digit CUSIP values from CDS
2: using the CUSIPs to obtain Compustat data
3: using the gvkey values from the Compustat data to obtain permnos using the link function of the Compustat-CRSP merged database
4: using the permnos to obtain CRSP data

We decided to merge the datasets we obtained in reverse order, starting with the CRSP data, but when we came to the CDS data, we expected to see the monthly data expand so that there would be monthly data for every company for every event date. This went wrong and for example with the first firm in the dataset, there was only one observation that matched the first event date and only one for the second event date, but all other observations were matched with a third event date. Maybe using an m:m was a bad idea in this case. Could you help us figure out what to do to fix our problem?
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 29911
#2

19 Feb 2016, 13:01

Maybe using an m:m was a bad idea in this case.

Yes. m:m merging is almost always a bad idea. Usually when people are tempted to do m:m merges they either are overlooking important aspects of the structure of their data, or they need to use some other command altogether.

That said, for those of us who don't work in finance, your description is quite confusing. Show us small, representative samples of the data sets you want to merge together, along with a display of how the final result from those samples should look and probably somebody can help you.

Anticipatory guidance: do not show the data by attaching spreadsheets or posting screenshots. Open actual Stata data sets and use the -dataex- command to generate code that can be pasted into the Forum so that those who might want to try working on this can quickly, accurately, and easily replicate your example data. If you do not already have the -dataex- command installed (it is user written by Robert Picard and Nick Cox), run -ssc install dataex-, and then follow the simple instructions to be found in -help dataex-.
2 likes
Comment
EIRINI FANOURIA

Join Date: Feb 2016

Posts: 12
#3

20 Feb 2016, 05:11

SDC
Repurchase observation -> open market repurchase announcement (event days)
We need CUSIP(8 digit) (we found only 6digit) PERMNO(not there) Firm name

CRSP
Market price
Dialy closing price
Holding period returns
Returns on equally weighted CRSP intex
Risk free rate
Share Price
Number of shares outstanding
Momentum
Liquidity
gvkey

CRSP COMPOSTAT MERGED
PERMNO

Compustat
Total Assets
Intangibles
Liabilities

We had to download all this data from different dataset, so the next step was to merge them all in a master file. In order to do this , first we merged all our monthly data, which seemed just fine.

However, our companies with their own company IDs had to be matched with several observations per company ID, so that one company ID would match with several observations in our final dataset.
As these would have to be matched by company ID, because of the different observations per company in both datafiles so that we would end up with a dataset with the size of A*B, where A and B are observations in each respective file.

I think this was our main problem, but as we're competely new to STATA, here's roughly what we did: We obtained the CUSIPs from SDC/Thomson One, then expanded these CUSIPs in order to obtain data from Compustat. Compustat gave us gvkey codes that we used in CRSP-Compustat-merged to finally obtain the permnos that we used in CRSP. We started merging our data in reverse order (We merged the CRSP data with the GVKEY data, merged that with the Compustat data and finally merged that with the SDC/Thomson One data) and we think we managed to merge the monthly dataset until we hit the following snag: (see picture)
The problem as you see from the highlighted area in the picture, after we attempted to merge our event data (SDC) and our monthly data in an m:m manner, we did not get a dataset with the size of the event data multiplied by the amount of months (more or less A*B), but instead, only one of the monthly observations was matched to the event date, leaving us with a dataset with size B (the monthly data). Instead of using code to handle these merges, we have used the drop-down menus to reduce the chance of encountering unexpected errors that would slow us down drastically.

So, the problem is that we were unable to match our event dates to the monthly data with the same company code.
Here are the commands
to create string gvkey* rename gvkey num_gvkey tostring num_gvkey, generate(gvkey) *to put zero's infront of the gvkey* replace gvkey = "0" + gvkey if strlen(gvkey) == 4 replace gvkey = "0" + gvkey if strlen(gvkey) == 5 *string mofd* book value file tostring mofd, generate(str_mofd) *create a new identifier* generate gvkey_monthly = gvkey + "+" + str_mofd *string mofd* stoc price file tostring mofd, generate(str_mofd) *create a new identifier* generate gvkey_monthly = gvkey + "+" + str_mofd drop _merge merged again keep if _merge==3 *Create new identifier* stockprice file generate ncusip6 = substr(ncusip, 1,6) *event data file* rename cusip6 ncusip6 *stock price file* drop _merge keep if _merge==3
Comment
EIRINI FANOURIA

Join Date: Feb 2016

Posts: 12
#4

20 Feb 2016, 05:17

This is the picture
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

20 Feb 2016, 05:47

As these would have to be matched by company ID, because of the different observations per company in both datafiles so that we would end up with a dataset with the size of A*B, where A and B are observations in each respective file.

If what you described in this quotation was the goal of your merge m:m then it is possible that what you wanted was a joinby command instead.

With that said, if you need further advice, Statalist can better help you if you review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. See especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using CODE delimiters, as described in section 12 of the FAQ. Posting a screenshot and unreadable code, ignoring the well-meant advice given in post #2 by Clyde, one of the most prolific and knowledgeable participants in Statalist, does not encourage those of us who are most likely to be able to help you.
Comment
EIRINI FANOURIA

Join Date: Feb 2016

Posts: 12
#6

20 Feb 2016, 07:33

Yes you are right, we are sorry..We spent a little time to find out how to use the commant and which files to send you. The problem is that when we try to have the dataex we have too many data and we can not sent you 1000 cariable from each file. Do you know how we can select just one-two firms to send in order to understant the problem and have a small sample?

So first we tried to merge the file GVKEY with STOCK PRICE, we used the variable PERMNO in order to merge them, we used from the merge faction in the menu DATA->COMPINE DATASETS->MERGE TWO DATASETS->MANY TO MANY, key variable PERMNO

Then, in order to merge our new master file with BOOK VALUE we create a monthly GVKEY as an identifier:
and file where we rename lpermno in permno

Because in the resulting master file, the gvkey values are all 6-digit strings and in BOOK TO VALUE file the gvkeys are numeric and have variable lengths, ranging from 4 to 6 digits, we adjusted the values in BOOK TO VALUE using:

Code:

*to turn the gvkey values into strings* rename gvkey num_gvkey tostring num_gvkey, generate(gvkey) *to put zero's infront of the gvkey values* replace gvkey = "0" + gvkey if strlen(gvkey) == 4 replace gvkey = "0" + gvkey if strlen(gvkey) == 5

In order to merge the files properly, we added a monthly identifier to the company identifier (gvkey) in both BOOK TO VALUE and our master file by combining the gvkey string and the mofd string (monthly identifier):

Code:

*turn mofd (monthly identifier) into strings* tostring mofd, generate(str_mofd) *create a new identifier* generate gvkey_monthly = gvkey + "+" + str_mofd drop _merge

After, we merge the master file with the BOOK TO VALUE file by using: DATA->COMPINE DATASETS->MERGE TWO DATASETS->MANY TO MANY, key variable gvkey_monthly

Code:

keep if _merge==3

Finally, we wanted to merge the master file with the EVENT DAY file In EVENT DAY file the cusip6 variable had a length of 6, which we wanted to use as the key variable to merge it with BOOK VALUE file, but in the master file the name was ncusip and had a length of 8 digits, so first we delete the last 2 digit to turn it into a 6 digit variable and then in order to merge with the event file we rename it also to ncusip6 in our master:

Code:

*Create new identifier* stockprice file generate ncusip6 = substr(ncusip, 1,6) *event data file* rename cusip6 ncusip6

After, we merge the master file with the EVENT DAY file by using: DATA->COMPINE DATASETS->MERGE TWO DATASETS->MANY TO MANY, key variable gvkey_monthly

Code:

*master file* drop _merge keep if _merge==3

Last edited by EIRINI FANOURIA; 20 Feb 2016, 08:19.
Comment

EIRINI FANOURIA

Join Date: Feb 2016
Posts: 12

20 Feb 2016, 08:17

SO STOCK PRICE file

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float mofd str8 ncusip double permno
1 "14979210" 10009
2 "14979210" 10009
3 "46334710" 10009
4 "46334710" 10009
5 "46334710" 10009
6 "46334710" 10009
1 "12709510" 10010
2 "12709510" 10010
3 "12709510" 10010
4 "12709510" 10010
5 "12709510" 10010
6 "12709510" 10010
1 "49307510" 10020
2 "49307510" 10020
3 "49307510" 10020
4 "49307510" 10020
5 "49307510" 10020
6 "49307510" 10020
1 "46603210" 10026
2 "46603210" 10026
3 "46603210" 10026
4 "46603210" 10026
5 "46603210" 10026
6 "46603210" 10026
1 "38209110" 10035
2 "38209110" 10035
3 "38209110" 10035
4 "38209110" 10035
5 "38209110" 10035
6 "38209110" 10035
1 "03267210" 10043
2 "03267210" 10043
3 "03267210" 10043
4 "03267210" 10043
5 "03267210" 10043
6 "03267210" 10043
1 "15650310" 10048
2 "15650310" 10048
3 "15650310" 10048
4 "15650310" 10048
5 "15650310" 10048
6 "15650310" 10048
1 "65163310" 10056
2 "65163310" 10056
3 "65163310" 10056
4 "65163310" 10056
5 "65163310" 10056
6 "65163310" 10056
1 "01390210" 10071
2 "01390210" 10071
3 "01390210" 10071
4 "01390210" 10071
5 "01390210" 10071
6 "01390210" 10071
1 "07465510" 10087
2 "07465510" 10087
3 "07465510" 10087
4 "07465510" 10087
5 "07465510" 10087
6 "07465510" 10087
end

GVKEY

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double lpermno str6 gvkey
10035 "011974"
10048 "012102"
10048 "012102"
10071 "012261"
10043 "012307"
10010 "012622"
10026 "012825"
10020 "013073"
10087 "013119"
10009 "016739"
10056 "017081"
end

BOOK VALUE file

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long gvkey float mofd
11974 3
11974 6
12102 2
12102 5
12261 3
12261 6
12307 3
12307 6
12622 1
12622 4
12825 3
12825 6
13073 3
13073 6
13119 2
13119 5
16739 3
16739 6
17081 3
17081 6
11974 4
11974 5
12102 3
12102 4
12102 6
12261 4
12261 5
12307 4
12307 5
12622 2
12622 3
12622 5
12622 6
12825 4
12825 5
13073 4
13073 5
13119 3
13119 4
13119 6
16739 4
16739 5
17081 4
17081 5
end

EVENT FILE

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 cusip6 float eventdate2
"466032" 14598
"463347" 14545
"466032" 13109
"466032" 12563
"127095" 12045
"466032" 11968
"466032" 11857
"493075" 11716
"651633" 14979
"382091" 13760
"156503" 13739
"032672" 13664
"032672" 13233
"382091" 13177
"074655" 13010
"013902" 12170
"156503" 11994
end
format %td eventdate2

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29911
#8

20 Feb 2016, 09:34

OK, this is helpful, but I have a few more questions.

What is the relationship between ncusip and cusip6? It looks like the cusip6 is just the first six characters of ncusip. Is that correct?

Is lpermno the same thing as permno? If not, how, if at all, are they related?

It appears as if there is a 1-1 correspondence among cusip6, permno, and gvkey--that these are just different identifier systems for firms and that in principle knowing any one of them you can look up the others. Is that correct?

Finally, do the observations of the data set you are trying to create consist of: cusip6 permno gvkey eventdate and mofd, and there are potentially several observations for each firm?

Last edited by Clyde Schechter; 20 Feb 2016, 09:35. Reason: Correct grammatical error.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29911
#9

20 Feb 2016, 09:54

I think some more clarification beyond what I asked in #8 is needed. Experimenting with the data, I see that the answer to my third question in #8 is no: the same cusip6 can correspond to multiple permno values.

But more important, looking at the event file and the stock price file, it is evident that every cusip can potentially have both multiple event dates and multiple mofd values. So when you put this all together, how do you decide which event date(s) to match up with which mofd value(s) in the file you wish to create. I don't see how to use gvkey or permno to decide this question. So what is the rule for association?
Comment
EIRINI FANOURIA

Join Date: Feb 2016

Posts: 12
#10

20 Feb 2016, 10:44

What is the relationship between ncusip and cusip6? It looks like the cusip6 is just the first six characters of ncusip. Is that correct?

Yes, ncusip6 as found in the stock price file is just the first 6 digits of ncusip, but the cusip6 in the event data file was obtained from a different dataset, so that's why we were merging them using matching values in these variables.

Is lpermno the same thing as permno? If not, how, if at all, are they related?

They are the same, sorry for the confusion

Finally, do the observations of the data set you are trying to create consist of: cusip6 permno gvkey eventdate and mofd, and there are potentially several observations for each firm?

Cusip6, permno and gvkey are identifying variables for the company's stock. mofd is a time series (monthly) variable for every company, so each firm will correspond to many mofd values. Finally, eventdate corresponds to events that occurred to a specific firm, where multiple events may occur to the same company.

But more important, looking at the event file and the stock price file, it is evident that every cusip can potentially have both multiple event dates and multiple mofd values. So when you put this all together, how do you decide which event date(s) to match up with which mofd value(s) in the file you wish to create. I don't see how to use gvkey or permno to decide this question. So what is the rule for association?

The rule of association is that one set of months (mofd values) should correspond to one event date, so that if, for example, two events occur in one firm, the corresponding firm should have two sets of months (mofd values), one set for each event.

Thank you so much for helping us out!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29911
#11

20 Feb 2016, 11:20

Well, we're getting there, but I still have two problems:

The rule of association is that one set of months (mofd values) should correspond to one event date, so that if, for example, two events occur in one firm, the corresponding firm should have two sets of months (mofd values), one set for each event.

OK, but which mofd value goes with which event date? Is it mofd = 1 for the earliest event date, mofd = 2 for the next, etc.? Or is there some other rule? Along these lines, when I merge together the stock price file, the gvkey file and the book value file ignoring cusip6 (see next question below), each gvkey/permno is associated with exactly 6 mofd values. But in the event file , most of the cusip's have only one associated event, a few have 2, and the largest number of events for any cusip is 5 (which only occurs with one cusip). So regardless of how the mofd's are matched up with event dates, there will be excess mofd's unless there is some rule to assign multiple mofd's to the same date? Is there? If not, what do we do with the "excess" mofd's?

Cusip6, permno and gvkey are identifying variables for the company's stock.

Except that in your data, it isn't so. In your stock price file, permno 10009 sometimes corresponds to cusip6 149792 and sometimes to 463347. Is that a data error? If not, is there some rule to select the appropriate cusip from among the options for each permno?

Last edited by Clyde Schechter; 20 Feb 2016, 11:20. Reason: Correct grammatical error.
Comment
EIRINI FANOURIA

Join Date: Feb 2016

Posts: 12
#12

20 Feb 2016, 11:54

OK, but which mofd value goes with which event date? Is it mofd = 1 for the earliest event date, mofd = 2 for the next, etc.? Or is there some other rule? Along these lines, when I merge together the stock price file, the gvkey file and the book value file ignoring cusip6 (see next question below), each gvkey/permno is associated with exactly 6 mofd values. But in the event file , most of the cusip's have only one associated event, a few have 2, and the largest number of events for any cusip is 5 (which only occurs with one cusip). So regardless of how the mofd's are matched up with event dates, there will be excess mofd's unless there is some rule to assign multiple mofd's to the same date? Is there? If not, what do we do with the "excess" mofd's?

The mofd's are the monthly indicators that correspond to other variables such as stock prices, returns etc. After we've merged our data, we will compare the regular mofd's to the event mofd's and focus on the months where they match. In our full dataset, every stock has a variable amount of mofd's belonging to it, up to 144, so we had to make a small selection of 6 months in order to obtain a small, but functional sample.

Except that in your data, it isn't so. In your stock price

file, permno 10009 sometimes corresponds to cusip6 149792 and sometimes to 463347. Is that a data error? If not, is there some rule to select the appropriate cusip from among the options for each permno?

We notice a change in company name in this case, so there's a good chance that it's a data error. Thank you for noticing! As for selecting the appropriate cusip in order to remove further possible data errors systematically, we're not entirely sure.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29911
#13

20 Feb 2016, 12:04

OK, that's helpful. But a key question remains unanswered. The event file contains only cusip and event dates. You say you will match the regular mofd's to the event mofd's, but there is no mofd information in the event file. So how do you do this. Is there some rule that mofd = 1 corresponds to January 1992, mofd = 2 to February 1992, etc.or something like that?
Comment
EIRINI FANOURIA

Join Date: Feb 2016

Posts: 12
#14

21 Feb 2016, 05:37

Originally posted by Clyde Schechter View Post

OK, that's helpful. But a key question remains unanswered. The event file contains only cusip and event dates. You say you will match the regular mofd's to the event mofd's, but there is no mofd information in the event file. So how do you do this. Is there some rule that mofd = 1 corresponds to January 1992, mofd = 2 to February 1992, etc.or something like that?

Yes, mofd is generated by using the month, so mofd = 1 will always correspond to January 1990, mofd = 2 to February 1990 and so on. The event dates have been converted to mofd's as well, so event mofd = 1 will also correspond to january 1990
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 29911

#15

21 Feb 2016, 10:26

OK. I think we've got it. There are two basic steps. The first is to create a crosswalk between cusip6, gvkey, and permno as these are the different identifiers used in the various files. Then, we merge together the various source files, where the merge key is not just cusip6 or gvkey or permno, but the combination of cusip6/gvkey/permno and mofd.

A certain amount of cleanup of the example files was needed to do this: the presence of 2 cusips for the same permno in the stock price file is a problem. The original poster says he doesn't know how to resolve it. So I have resolved it arbitrarily by keeping only the numerically first cusip. Presumably the original poster will look into the matter further and can modify that decision if it isn't appropriate. In addition, the eventfile has daily dates, but those need to be converted to the mofd coding that starts in January 1990 so they could be linked up with the other files.

So here is my proposed solution:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float mofd str8 ncusip double permno
1 "14979210" 10009
2 "14979210" 10009
3 "46334710" 10009
4 "46334710" 10009
5 "46334710" 10009
6 "46334710" 10009
1 "12709510" 10010
2 "12709510" 10010
3 "12709510" 10010
4 "12709510" 10010
5 "12709510" 10010
6 "12709510" 10010
1 "49307510" 10020
2 "49307510" 10020
3 "49307510" 10020
4 "49307510" 10020
5 "49307510" 10020
6 "49307510" 10020
1 "46603210" 10026
2 "46603210" 10026
3 "46603210" 10026
4 "46603210" 10026
5 "46603210" 10026
6 "46603210" 10026
1 "38209110" 10035
2 "38209110" 10035
3 "38209110" 10035
4 "38209110" 10035
5 "38209110" 10035
6 "38209110" 10035
1 "03267210" 10043
2 "03267210" 10043
3 "03267210" 10043
4 "03267210" 10043
5 "03267210" 10043
6 "03267210" 10043
1 "15650310" 10048
2 "15650310" 10048
3 "15650310" 10048
4 "15650310" 10048
5 "15650310" 10048
6 "15650310" 10048
1 "65163310" 10056
2 "65163310" 10056
3 "65163310" 10056
4 "65163310" 10056
5 "65163310" 10056
6 "65163310" 10056
1 "01390210" 10071
2 "01390210" 10071
3 "01390210" 10071
4 "01390210" 10071
5 "01390210" 10071
6 "01390210" 10071
1 "07465510" 10087
2 "07465510" 10087
3 "07465510" 10087
4 "07465510" 10087
5 "07465510" 10087
6 "07465510" 10087
end
gen cusip6 = substr(ncusip, 1, 6)
// CLEAN UP DUPLICATE CUSIPS WITHIN PERMNO
// BY ARBITRARILY CHOOSING FIRST CUSIP
by permno (cusip6), sort: replace cusip6 = cusip6[1]
tempfile stock_price_file
save `stock_price_file'

* Example generated by -dataex-. To install: ssc install dataex
clear
input double lpermno str6 gvkey
10035 "011974"
10048 "012102"
10048 "012102"
10071 "012261"
10043 "012307"
10010 "012622"
10026 "012825"
10020 "013073"
10087 "013119"
10009 "016739"
10056 "017081"
end
rename lpermno permno
tempfile gvkey_file
save `gvkey_file'

* Example generated by -dataex-. To install: ssc install dataex
clear
input long gvkey float mofd
11974 3
11974 6
12102 2
12102 5
12261 3
12261 6
12307 3
12307 6
12622 1
12622 4
12825 3
12825 6
13073 3
13073 6
13119 2
13119 5
16739 3
16739 6
17081 3
17081 6
11974 4
11974 5
12102 3
12102 4
12102 6
12261 4
12261 5
12307 4
12307 5
12622 2
12622 3
12622 5
12622 6
12825 4
12825 5
13073 4
13073 5
13119 3
13119 4
13119 6
16739 4
16739 5
17081 4
17081 5
end
tostring gvkey, format(%06.0f) replace
isid gvkey mofd
tempfile book_value_file
save `book_value_file'

* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 cusip6 float eventdate2
"466032" 14598
"463347" 14545
"466032" 13109
"466032" 12563
"127095" 12045
"466032" 11968
"466032" 11857
"493075" 11716
"651633" 14979
"382091" 13760
"156503" 13739
"032672" 13664
"032672" 13233
"382091" 13177
"074655" 13010
"013902" 12170
"156503" 11994
end
format %td eventdate2
// MAP EVENT DATES TO "mofd" WITH mofd = 1
// CORRESPONDING TO January 1990
gen mofd = mofd(eventdate2)-tm(1989m12)
isid cusip6 mofd
tempfile event_file
save `event_file'

// CREATE A CROSSWALK BETWEEN
// CUSIP, GVKEY, AND PERMNO

use `stock_price_file', clear
keep cusip6 permno
duplicates drop
isid cusip6
isid permno
tempfile crosswalk
save `crosswalk'
use `gvkey_file', clear
duplicates drop
isid permno
isid gvkey
merge 1:1 permno using `crosswalk', assert(match) nogenerate
save `"`crosswalk'"', replace

// NOW MERGE IN THE STOCK PRICE DATA
merge 1:m cusip6 using `stock_price_file'
drop _merge
isid cusip6 mofd

// AND NOW MERGE IN THE BOOK VALUE DATA
merge 1:1 gvkey mofd using `book_value_file'
drop _merge
isid cusip6 mofd

// AND FINALLY MERGE IN THE EVENT DATA
merge 1:1 cusip6 mofd using `event_file'
drop _merge

Notes:

1. In creating the crosswalk file, the merge comes with -assert(match)- because if there are cusip's, gvkey's, or permno's that do not have a corresponding values we can't really proceed.

2. By contrast, when merging in the substantive data on stock prices, book values, and events, it is anticipatable that there may be some unmatchable data. That is how my epidemiologic real world works, and I assume the same can happen in finance. And even in the example data provided, there are some events that do not match because their dates are out of range of the dates in the other files. So no -assert(match)- option specified for these merges.

3. In the original poster's description of the data sets, the stock price and book value are regular monthly data, so I have assumed (and verified in the code) that each firm has only a single observation in any given month. This will need to be true in the original data, not just to pass the -isid-s, but for the whole process to work coherently. If it is not true in the real data, then decisions need to be made about which value of a given firm's stock price/book value in a given month to use (or whether to average them, or whatever), but a single value is needed.

4. On the other hand, although in the example data each firm experiences at most a single event in a given month, I do not see anything in the original description that guarantees this will always be true. I have relied on that single event per month restriction by using -merge 1:1- in the final step where the event data is merged in. However, if a firm can in fact experience more than one event per month, the 1:1 can be changed to 1:m and nothing terrible will happen.

5. I have used tempfiles to hold all of the example data sets and the crosswalk file created. The few data-transformation/cleanup steps incorporated in those parts of the code need to be followed in the original data, but there is no reason they cannot be saved as real data sets assuming they are useful for other purposes. Probably saving the crosswalk as a real data file would make sense also: I imagine it will prove useful in the future.

Announcement