Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Keep the first X entries based on different dates

    Hi,
    I have a large dataset with closing prices of a list of securities as well as a variable that contains the date of a specific event of interest:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str5 SecurityID int date double price float EventDate
    "1" 12421  6.25     .
    "1" 12422  6.25     .
    "1" 12423 6.375     .
    "1" 12424  6.25     .
    "1" 12425     6 12425
    "1" 12428 5.875     .
    "1" 12429  6.25 12429
    "2" 12430  6.25     .
    "2" 12431     6     .
    "2" 12432  6.25     .
    "2" 12435  6.25     .
    "2" 12436  6.25     .
    "2" 12437 6.125     .
    "2" 12438     6 12438
    "2" 12439 5.875 12439
    "2" 12442 6.125     .
    "2" 12443     6     .
    "2" 12444 5.875     .
    "3" 12445     6     .
    "3" 12446   6.5     .
    "3" 12449  6.25 12449
    end
    format %td date
    format %td EventDate
    Right now my data contains all security prices for my sample firms for the period 1994 to 2018. As I would like to perform an event study I will only need a specific range of observations around each event date. In particular, for each SecurityID-EventDate combination, I would like to keep the observations 200 days before the event date as well as 20 days after the event date. In cases where there are less than 200 observations before the event date, the maximum number of observations available should be kept.
    First command that came to my mind would be
    Code:
    bysort SecurityID:...
    . However, I have no idea how to account for the EventDate as a point of reference so that any help is much appreciated.

    Kind regards

  • #2
    Several small details seem ambiguous here, but this may get you moving.


    Code:
    bysort SecurityID : egen minEventDate = min(EventDate)
    by SecurityID : egen maxEventDate = max(EventDate)
    by SecurityID : egen countBefore = total(date < minEventDate)
    
    keep if countBefore < 200 | inrange(date, minEventDate - 200, maxEventDate + 20)

    Comment


    • #3
      Thank you very much for your answer.
      The approach works and reduces my sample of 5 million stock price observations to about 1.5 million.
      However, in some cases there are still quite a few unnecessary observations, especially when a company has an event at the beginning of the sample period and one at the end. Thus, performing the event study still takes a lot of time.

      Originally posted by Nick Cox View Post
      Several small details seem ambiguous here, but this may get you moving.
      Let me try to be a little more specific regarding what I am trying to achieve.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str5 SecurityID int date double price float EventDate
      "1" 12421  6.25     .
      "1" 12422  6.25     .
      "1" 12423 6.375     .
      "1" 12424  6.25     .
      "1" 12425     6 12425
      "1" 12428 5.875     .
      "1" 12429  6.25 12429
      "2" 12421  6.25     .
      "2" 12422     6     .
      "2" 12423  6.25     .
      "2" 12424  6.25     .
      "2" 12425  6.25     .
      "2" 12428 6.125     .
      "2" 12429     6 12438
      "2" 12430 5.875 12439
      "2" 12431 6.125     .
      "2" 12433     6     .
      "2" 12434 5.875     .
      "3" 12421     6     .
      "3" 12422   6.5     .
      "3" 12423  6.25 12449
      end
      format %td date
      format %td EventDate
      I have a list of securities and have drawn daily closing prices for each one for the period 01/02/1996 to 12/31/2018. Some of the companies did not exist at the initial date, so for those the first prices are available at later dates. Other companies no longer exist as of 12/31/2018, so in such cases prices are only available until an earlier date.

      For each company in the list, at least one EventDate exists, and often more than one. In the example, for instance, 2 events exist for SecurityID 1. For each event, I want to keep only the first 200 observations that are chronologically before the corresponding Event Date or the 20 observations that are chronologically after the Event Date. In other words, I want to make sure that for each event date of a company, I keep the prices of this company up to -200 days before and +20 days after the event - as long as the data is available for this period of time.
      Is there a way to keep only the desired window of -200/+20 days around each event? In cases where I am not able to get the full time period of 220 days (because the event occurs some days after price data is available for example) I would like to keep the maximum number of available price data.

      Please let me know if something is not clear.

      Comment


      • #4
        Sorry, but I don't see anything different here, except perhaps that the code in #2 works from first event date - 200 to last event data + 20. If you want to treat different events separately, that might select fewer observations and you should modify the code accordingly.

        Comment


        • #5
          Hi Nick,
          Let's assume I have daily closing prices for a company for the period from 1996 to 2019. Let's further assume that the company has two events that are widely spread in time: one event occurs on January 01, 1997 and one event on January 01, 2018. For my event study, I would need the closing prices for the following intervals: [-200; January 01, 1997; +20] and [-200; January 01, 2018; +20].
          All other price observations that fall outside of these two intervals could be deleted. Using the code in 2# will remove the unwanted observations that are more than 200 days before the January 01, 1997 and more than 20 days after the January 01, 2018 but will keep all the redundant observations between those dates or am I misunderstanding something? So instead of working with the min and max Event date of a specific company I would like to consider every single event date and keep the observations that are in the [-200,+20] interval.

          Comment


          • #6
            You're not misunderstanding. As I said in #4 if you want to treat event dates separately you need to modify the code accordingly.

            Comment


            • #7
              Do you mind to give me a hint which command might be useful here? Thank you.

              Comment


              • #8

                This may help a bit.

                Code:
                 
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input str5 SecurityID int date double price float EventDate
                "1" 12421 6.25 .
                "1" 12422 6.25 .
                "1" 12423 6.375 .
                "1" 12424 6.25 .
                "1" 12425 6 12425
                "1" 12428 5.875 .
                "1" 12429 6.25 12429
                "2" 12430 6.25 .
                "2" 12431 6 .
                "2" 12432 6.25 .
                "2" 12435 6.25 .
                "2" 12436 6.25 .
                "2" 12437 6.125 .
                "2" 12438 6 12438
                "2" 12439 5.875 12439
                "2" 12442 6.125 .
                "2" 12443 6 .
                "2" 12444 5.875 .
                "3" 12445 6 .
                "3" 12446 6.5 .
                "3" 12449 6.25 12449
                end
                format %td date
                format %td EventDate
                
                bysort SecurityID (date) : gen counter = sum(EventDate < .) if EventDate < .
                su counter, meanonly
                local max = r(max)
                
                gen tokeep = 0
                gen this_date = .
                
                forval j = 1/`max' {
                replace this_date = cond(counter == `j', EventDate, .)
                bysort SecurityID (this_date) : replace this_date = this_date[1]
                replace tokeep = tokeep | inrange(date, this_date - 200, this_date + 20)
                }
                
                sort SecurityID date
                Or think of it this way:

                Code:
                gen previous = EventDate
                gen next = EventDate
                gen negdate = -date
                bysort SecurityID (date) : replace previous = previous[_n-1] if missing(previous)
                bysort Security (negdate) : replace next = next[_n-1] if missing(next)
                sort SecurityID date
                gen TOKEEP = ((date - previous) < 20) | ((next - date) < 200)
                
                drop negdate
                list, sepby(SecurityID)

                Comment


                • #9
                  Thank you very much!

                  Comment

                  Working...
                  X