creating an event window and estimaton window in STATA

Sophie vanderValk

Join Date: May 2018

Posts: 1
#1

creating an event window and estimaton window in STATA

26 May 2018, 03:53

Hi there,

I'm new to STATA an not very familiar with it. I'm having some trouble creating an event window and an estimation window. My data consists of macroeconomic announcements from the bank of england from 2007 until the end of 2013 and high frequency exchange rates (5 minute intervals). I'm essentially trying to test the effect of these announcements on the price of exchange rates. I have exchange rate data for the day before the event and the day of the event. My data goes as follows: A1 year, B1 month, C1 day, D1, time (eastern US), E1 price EUR/USD, F1 GBP/USD, G1 EUR/GBP and H1 announcement (1 if announcement and 0 if not) and I have 10,081 observations.

My problem is that I don't know the STATA code for creating the event window (the five minute price of the exchange rates when the announcement occurs) and the estimation window (the day before, leading up to the annoucement and the 5 minute prices after the announcement). I have multiple announcements in the excel file that (luckily) happen at the same time every time but different months, days and years.

I was trying to follow the princeton university stata guide for event studies: https://dss.princeton.edu/online_hel...ventstudy.html but this link only deals with days and not times of events and I am not sure how to create the windows.

if anyone could help me with this, that would be fantastic.

Sophie.
Tags: None
Devra Golbe

Join Date: Apr 2014

Posts: 170
#2

26 May 2018, 09:23

Hi Sophie,

Welcome to Stata and Statalist. If you haven't already, I suggest you read the FAQ. They will help you frame questions in ways that make it easier for others to answer them. I, for one, am having trouble visualizing the structure of your data. A small data extract would help.

That said, I suspect your question is less about Stata and more about framing your model. The Princeton code to which you refer applies, as you note, to daily data, and defines a 5-day event window, including the day of the event and 2 days before and after. Do you have the time of the announcement to within 5 minutes? If so, you could form event windows in 5-minute blocks: the block containing the announcement, and say 10 minutes before and 10 minutes after. Such a short window (25 minutes) may or may not make sense in the context of your research question. If your announcement data are less precise, you will need to aggregate over longer periods for your event window.

Best,
Devra

Devra Golbe
Professor Emerita, Dept. of Economics
Hunter College, CUNY
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#3

26 May 2018, 14:21

You can use rangejoin (from SSC) to match the time of each announcement with your high-frequency data.
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

27 May 2018, 10:00

Your problem can most likely be addressed using simpler code since the announcements are always at the same time of the day but I'll use more flexible code to illustrate a general approach. When dealing with dates and time, familiarity with Stata's numerical dates is critical. See help datetime for more information if the code below is not obvious.

You in fact have two datasets, one for the announcements (which comes presumably with some variables of interest) and the other which contains high frequency exchange rate at 5 minute intervals. To best illustrate the technique, I create demonstration datasets for each. First, here code to create the xrate data:

Code:

* create 5 min data from 2007 to 2013
clear all
set seed 3213
local obs = mdy(12,31,2013) - mdy(12,31,2006)
set obs `obs'
gen date = mdy(12,31,2006) + _n
format %tdDayDDmonCCYY date
gen year = year(date)
gen month = month(date)
gen day = day(date)
gen eurusd = runiform(1.1,1.5)
gen gbpusd = runiform(1.7,2.1)
gen eurgbp = runiform(.6,.7)
expand 288
bysort date: gen time = (_n-1)*5
drop date

* convert date and time to SIF (Stata internal form)
gen double datetime = mdyhms(month,day,year,0,0,0) + msofminutes(time)
format %tc datetime
save "exchange_rates_5min.dta", replace
list in 1/5

and here's the results from the final list command:

Code:

. list in 1/5

     +---------------------------------------------------------------------------------+
     | year   month   day     eurusd     gbpusd     eurgbp   time             datetime |
     |---------------------------------------------------------------------------------|
  1. | 2007       1     1   1.322795   1.893155   .6464114      0   01jan2007 00:00:00 |
  2. | 2007       1     1   1.322795   1.893155   .6464114      5   01jan2007 00:05:00 |
  3. | 2007       1     1   1.322795   1.893155   .6464114     10   01jan2007 00:10:00 |
  4. | 2007       1     1   1.322795   1.893155   .6464114     15   01jan2007 00:15:00 |
  5. | 2007       1     1   1.322795   1.893155   .6464114     20   01jan2007 00:20:00 |
     +---------------------------------------------------------------------------------+

.

Now here's code for the announcements dataset

Code:

* create 20 announcements from 2007 to 2014
clear
set seed 5543
set obs 20
gen anndate = runiformint(mdy(1,1,2007), mdy(12,31,2014))
format %tdDayDDmonCCYY anndate
sort anndate
gen ann_id = _n
save "announcements.dta", replace
list in 1/5

and the results from the list command:

Code:

. list in 1/5

     +-----------------------+
     |      anndate   ann_id |
     |-----------------------|
  1. | Tue17jul2007        1 |
  2. | Wed01aug2007        2 |
  3. | Wed06feb2008        3 |
  4. | Sat10jan2009        4 |
  5. | Fri24apr2009        5 |
     +-----------------------+

The first task is to pair each announcement with all observations from the xrate data that fall within the desired event window. This is easy to do with rangejoin (from SSC). You define the low and high bounds to use for each announcement and rangejoin will pair each of these announcements with all observations from "exchange_rates_5min.dta" where the datetime variable has a value within the specified bounds.

Code:

* dataset with event window observations
use "announcements.dta", clear
gen double low  = dhms(anndate,6,0,0) - msofminutes(10)
gen double high = dhms(anndate,6,0,0) + msofminutes(15)
format %tc low high
list in 1/5
rangejoin datetime low high using "exchange_rates_5min.dta"
drop low high year month day
sort ann_id datetime
save "event_window.dta", replace
list if ann_id <= 2, sepby(ann_id)

Here's the results from the first list command which shows the bounds used:

Code:

. list in 1/5

     +-----------------------------------------------------------------+
     |      anndate   ann_id                  low                 high |
     |-----------------------------------------------------------------|
  1. | Tue17jul2007        1   17jul2007 05:50:00   17jul2007 06:15:00 |
  2. | Wed01aug2007        2   01aug2007 05:50:00   01aug2007 06:15:00 |
  3. | Wed06feb2008        3   06feb2008 05:50:00   06feb2008 06:15:00 |
  4. | Sat10jan2009        4   10jan2009 05:50:00   10jan2009 06:15:00 |
  5. | Fri24apr2009        5   24apr2009 05:50:00   24apr2009 06:15:00 |
     +-----------------------------------------------------------------+

and here's the results from the second list command that shows the matching observations for the first two events:

Code:

. list if ann_id <= 2, sepby(ann_id)

     +------------------------------------------------------------------------------------+
     |      anndate   ann_id     eurusd     gbpusd     eurgbp   time             datetime |
     |------------------------------------------------------------------------------------|
  1. | Tue17jul2007        1   1.319044   1.836712   .6696532    350   17jul2007 05:50:00 |
  2. | Tue17jul2007        1   1.319044   1.836712   .6696532    355   17jul2007 05:55:00 |
  3. | Tue17jul2007        1   1.319044   1.836712   .6696532    360   17jul2007 06:00:00 |
  4. | Tue17jul2007        1   1.319044   1.836712   .6696532    365   17jul2007 06:05:00 |
  5. | Tue17jul2007        1   1.319044   1.836712   .6696532    370   17jul2007 06:10:00 |
  6. | Tue17jul2007        1   1.319044   1.836712   .6696532    375   17jul2007 06:15:00 |
     |------------------------------------------------------------------------------------|
  7. | Wed01aug2007        2   1.204478   1.804457    .688996    350   01aug2007 05:50:00 |
  8. | Wed01aug2007        2   1.204478   1.804457    .688996    355   01aug2007 05:55:00 |
  9. | Wed01aug2007        2   1.204478   1.804457    .688996    360   01aug2007 06:00:00 |
 10. | Wed01aug2007        2   1.204478   1.804457    .688996    365   01aug2007 06:05:00 |
 11. | Wed01aug2007        2   1.204478   1.804457    .688996    370   01aug2007 06:10:00 |
 12. | Wed01aug2007        2   1.204478   1.804457    .688996    375   01aug2007 06:15:00 |
     +------------------------------------------------------------------------------------+

Now here's code to match each event to the desired estimation window observations. My reading of #4 is that each event should be matched to all xrate data on the day of the announcement and one day before, minus those in the event window. I use listsome (from SSC) to list the observations to drop (that fall within the event window):

Code:

* dataset with estimation window observations
use "announcements.dta", clear
gen double low  = dhms(anndate-1,0,0,0)
gen double high = dhms(anndate+1,0,0,0) -1
format %tc low high
list in 1/5
rangejoin datetime low high using "exchange_rates_5min.dta"
drop low high year month day
sort ann_id datetime

* exclude observations within the event window, -listsome- is from SSC
gen double low  = dhms(anndate,6,0,0) - msofminutes(10)
gen double high = dhms(anndate,6,0,0) + msofminutes(15)
format %tc low high
gen todrop = inrange(datetime,low, high)
listsome anndate ann_id datetime if todrop, max(12) sepby(ann_id)
drop if todrop
drop todrop low high
save "estimation_window.dta", replace

Here are the results from the first list command that shows the bounds used:

Code:

. list in 1/5

     +-----------------------------------------------------------------+
     |      anndate   ann_id                  low                 high |
     |-----------------------------------------------------------------|
  1. | Tue17jul2007        1   16jul2007 00:00:00   17jul2007 23:59:59 |
  2. | Wed01aug2007        2   31jul2007 00:00:00   01aug2007 23:59:59 |
  3. | Wed06feb2008        3   05feb2008 00:00:00   06feb2008 23:59:59 |
  4. | Sat10jan2009        4   09jan2009 00:00:00   10jan2009 23:59:59 |
  5. | Fri24apr2009        5   23apr2009 00:00:00   24apr2009 23:59:59 |
     +-----------------------------------------------------------------+

and here's the results from the listsome (from SSC) command that shows the observations that will be dropped for the first two announcements because they are part of the event window:

Code:

. listsome anndate ann_id datetime if todrop, max(12) sepby(ann_id)

       +--------------------------------------------+
       |      anndate   ann_id             datetime |
       |--------------------------------------------|
  359. | Tue17jul2007        1   17jul2007 05:50:00 |
  360. | Tue17jul2007        1   17jul2007 05:55:00 |
  361. | Tue17jul2007        1   17jul2007 06:00:00 |
  362. | Tue17jul2007        1   17jul2007 06:05:00 |
  363. | Tue17jul2007        1   17jul2007 06:10:00 |
  364. | Tue17jul2007        1   17jul2007 06:15:00 |
       |--------------------------------------------|
  935. | Wed01aug2007        2   01aug2007 05:50:00 |
  936. | Wed01aug2007        2   01aug2007 05:55:00 |
  937. | Wed01aug2007        2   01aug2007 06:00:00 |
  938. | Wed01aug2007        2   01aug2007 06:05:00 |
  939. | Wed01aug2007        2   01aug2007 06:10:00 |
  940. | Wed01aug2007        2   01aug2007 06:15:00 |
       +--------------------------------------------+

What you do from here is up to you. I suggest that you look into runby (from SSC) to process each event.

Last edited by Robert Picard; 27 May 2018, 10:04.

Comment

Devra Golbe

Join Date: Apr 2014

Posts: 170
#5

30 May 2018, 06:49

Sophie,

I'm not sure I understand your question. Robert's code is more elaborate than you need, because it creates data sets (gen anndate = runiformint...) as well as demonstrating the subsequent steps. I am assuming you already have the announcement dates stored as 3 variables: year, month, day:

Code:

list +--------------------+ | year month day | |--------------------| 1. | 2007 11 1 | 2. | 2007 10 5 | 3. | 2007 5 7 | +--------------------+

Then this code will create and format the SIF announcement dates

Code:

gen anndate=mdy(month, day, year) format %tdDayDDmonCCYY anndate

And this code will create the id variable

Code:

sort anndate gen ann_id = _n

Devra

Devra Golbe
Professor Emerita, Dept. of Economics
Hunter College, CUNY
Comment
Devra Golbe

Join Date: Apr 2014

Posts: 170
#6

30 May 2018, 11:42

Sophie,

Let me try one more time. Robert gave you code which will do what you asked: use event dates/times to create event windows and estimation windows. He is suggesting that you start with two datasets: one with the exchange rate data, and one with just the announcement dates. If you have not done so already, I suggest that you run Robert's code to see what it does.

Given the current format of your data, you should split the one dataset you have now into two parts: exchange rate data and announcement dates. In the (new) announcement dataset, keep only the dates/times for the observation in which the announcement variable =1. You seemed to be stuck on generating the SIF (Stata internal form) of the date variable and the id variable. I just highlighted those bits of Robert's code which do that. (Note that Stata is case-sensitive; I assumed the variable names are all lower case.) Try it out on your own data and see if it does what you need.

Best,
Devra

Devra Golbe
Professor Emerita, Dept. of Economics
Hunter College, CUNY
Comment
Devra Golbe

Join Date: Apr 2014

Posts: 170
#7

01 Jun 2018, 07:17

Sophie,

Now we are back to my initial points.

It is very difficult to find your question in your most recent post, which mostly quotes from the Princeton page to which you referred in your first point. Please re-read the FAQs. Make your post as succinct as possible, Learn to use code delimiters to make code (quoted from others or your own) as easy to follow as possible.

The only question I see in this long post is the following:

The code below should work if I change some of the variable names to ones I have? E.g. (company_id will be ann_id)

This is a question not about Stata, but about your underlying model.You have not said what that is. The Princeton code estimates a standard market model, where the return on a company's stock is regressed against the return on a market index. I would guess that in your model, the left-hand side variables are the exchange rate (i.e., the exchange rates take the place of the company_id's in the Princeton code.) But again, this is not a Stata question.

Best,
Devra

Devra Golbe
Professor Emerita, Dept. of Economics
Hunter College, CUNY
1 like
Comment
Devra Golbe

Join Date: Apr 2014

Posts: 170
#8

03 Jun 2018, 11:44

Sophie,

From a Stata standpoint, I think your two questions are the same: namely, How do I run a linear regression on a particular set of observations? The command you want is regress. See the Stata manual for syntax-- in particular, the options to restrict your sample to the proper window. (The Princeton web page you cite shows an example of code like this.)

While Stata documention is truly excellent, I think you will want to consult additional resources to help you formulate your analysis. UCLA's IDRE website has lots of useful examples. I found an earlier edition of Lawrence Hamilton's Statistics with Stata to be very helpful as well. There are other books available in the Stata bookstore which may suit your needs better.

Best,
Devra

Devra Golbe
Professor Emerita, Dept. of Economics
Hunter College, CUNY
Comment

Announcement