Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a counter variable around a date (event-window)

    Hi,

    I am currently conducting a research project with Stata. As I ran into a problem which I wasn´t able to solve on my own (after browsing through the commands and the forum), I decided to ask the question in this community.

    I am working with a panel data set which contains several firms and daily stock prices.
    Each firm has several events over time (profit-warnings on a specific day). I have created a dummy variable called event, which takes the value of 0 on the day of the event and . otherwise.

    What I´d like to do is the following:
    I want to create an event window of 6 trading days (-2,3) around each event (where 0 is the day of each event) by the firm. As my data only contains trading days (Monday To Friday), each observation (before and after the event should count as a "day").
    I am mainly struggling with this, because each firm contains several events. I could not figure out how to solve this problem. I would be helpful for any guidance on the problem. Thank you.
    I have included a part of my data below:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float Company int Date double price byte event
    1 21479        18.78 .
    1 21480        18.57 .
    1 21481        18.35 .
    1 21482        18.68 .
    1 21483         18.7 0
    1 21486        18.92 .
    1 21487        18.69 .
    1 21488        18.74 .
    1 21489        19.22 .
    1 21490        19.52 .
    1 21493        19.17 .
    1 21494        19.13 .
    1 21495        19.29 .
    1 21496        18.94 .
    1 21497        18.94 .
    1 21500         18.7 .
    1 21501         20.2 .
    1 21502        20.08 .
    1 21503         20.2 .
    1 21504        19.86 .
    1 21507        18.99 .
    1 21508        19.17 .
    1 21509        19.25 .
    1 21510        19.33 .
    1 21511        19.49 .
    1 21514        20.02 .
    1 21515        19.83 0
    1 21516        19.64 .
    1 21517        19.52 .
    1 21518        19.36 .
    1 21521        19.28 .
    1 21522        19.11 .
    1 21523        18.89 .
    1 21524         18.5 .
    1 21525        18.69 .
    1 21528        18.46 .
    1 21529        18.49 .
    1 21530        18.69 .
    1 21531        18.66 .
    1 21532        18.18 .
    1 21535        17.89 .
    1 21536        17.51 .
    1 21537         17.7 .
    2 21479 15.048256504 .
    2 21480 14.239803163 .
    2 21481 14.239803163 .
    2 21482 13.881511341 .
    2 21483 13.789641644 .
    2 21486  13.96419407 .
    2 21487 14.276551042 .
    2 21488 14.708338622 .
    2 21489 15.121752262 .
    2 21490 15.011508625 0
    2 21493 14.855330138 .
    2 21494 15.158500141 .
    2 21495 15.434109235 .
    2 21496 15.608661661 .
    2 21497 15.755653177 .
    2 21500 15.709718328 .
    2 21501 15.663783479 .
    2 21502 15.314678628 .
    2 21503 14.653216803 .
    2 21504 14.717525592 .
    2 21507 14.157120435 .
    2 21508 14.046876798 .
    2 21509 14.037689828 .
    2 21510 14.056063767 .
    2 21511 14.175494375 .
    2 21514 14.313298921 .
    2 21515   13.9550071 .
    2 21516  13.93663316 .
    2 21517 14.304111951 .
    2 21518 13.835576493 .
    2 21521 14.395981649 .
    2 21522 13.155740729 .
    2 21523 13.174114668 .
    2 21524 13.256797396 .
    2 21525 13.311919215 0
    2 21528 12.935253454 .
    2 21529 13.284358306 .
    2 21530 13.348667094 .
    2 21531 13.174114668 .
    2 21532 13.229236487 .
    2 21535 13.045497091 .
    2 21536 12.586148602 .
    2 21537 12.898505575 .
    end
    format %td Date




  • #2
    Note:
    This is the code I have came up with. Is there a more direct way to achieve this?
    What I do not unterstand is, why event[_n--2] creates a datapoint before my event date. Shouldn´t it be after the event?
    Code:
    gen eventwindow=.
    forvalues i=-2/3 {
        
        bysort Company(Date): replace eventwindow=`i' if event[_n-`i']==0
        
        }
    Last edited by David Moeller; 18 Nov 2024, 14:10.

    Comment


    • #3
      _n--2 is the same as _n+2. If event[_n+2] == 0, that means that the observation two steps later has event == 0. That's the same as saying that the current observation is two steps earlier than the event.

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        _n--2 is the same as _n+2. If event[_n+2] == 0, that means that the observation two steps later has event == 0. That's the same as saying that the current observation is two steps earlier than the event.
        Thank you for your response. I got a little confused. So the logic of the code should be correct, right?

        Another question I would like to ask, is how I could calculate the returns for each event (over the 6 day period) in this case.

        My first intuition would be, to group by Company and event date, to get a unique event ID. As shown below:

        Code:
        replace event=Date if event==0    
        format event %td
        egen event_id=group(Company event)
        gen ret=log(price[_n]/price[_n-1])*100
        However I fail to understand how I can then calculate the cumulative returns over the 6 day period (-2,3) for each event under every Company ID.

        This is the rest which I have came up with:

        Code:
        forvalues i=-2/3 {
            
            bysort Company(Date): replace event_id=event_id[_n-`i'] if eventwindow==`i'
            
            }
            
        bysort event_id(eventwindow): egen cum_return=sum(ret) if event_id!=. 
        sort Company Date
        Last edited by David Moeller; 18 Nov 2024, 15:29.

        Comment


        • #5
          So the logic of the code should be correct, right?
          Yes, it's correct.

          As for the calculation of returns, your code proposes to use a formula that dates back to the days before calculators and computers were readily available. There is no justification for its continued use. And there are two good reasons against using it. First, and most important, it is only an approximation to the correct value, and the approximation gets worse as the actual return gets larger. Second, on a computer or calculator it is slower than the direct calculation. Your calculation of the period return by adding the individual daily returns is yet another approximation, and an even worse one than the first, in that not only does it perform poorly as the return itself grows, but it also deteriorates rapidly as the number of days in the period grows. Now, at least in the example data you show, your returns are pretty small, and your event windows are only 6 days, so you probably wouldn't go wrong to a material extent using the approximate formulas. But there is no advantage to doing it that way, and if there do happen to be some events in the full data set where the returns get large, you would have meaningfully wrong results for them.

          The direct calculation of the return over any time period is 100*(ending price/beginning price - 1).

          Here's some code that applies that to your data:
          Code:
          by Company (Date): gen event_num = ///
              sum(!missing(eventwindow) & missing(eventwindow[_n-1])) ///
              if !missing(eventwindow)
              
          by Company event_num (Date), sort: gen event_window_cum_return = 100*(price[_N]/price[1] - 1) ///
              if !missing(event_num)
          sort Company Date
          Note: My event variable, which I named event_num, is slightly different from yours in that mine restarts at 1 within each Company. This will prove more convenient for your further work with this data as, I assume, you will generally want to deal with each company's events separately and there will be little, if any, call to perform calculations involving events from all of the different companies.


          Comment


          • #6
            Originally posted by Clyde Schechter View Post
            Yes, it's correct.

            As for the calculation of returns, your code proposes to use a formula that dates back to the days before calculators and computers were readily available. There is no justification for its continued use. And there are two good reasons against using it. First, and most important, it is only an approximation to the correct value, and the approximation gets worse as the actual return gets larger. Second, on a computer or calculator it is slower than the direct calculation. Your calculation of the period return by adding the individual daily returns is yet another approximation, and an even worse one than the first, in that not only does it perform poorly as the return itself grows, but it also deteriorates rapidly as the number of days in the period grows. Now, at least in the example data you show, your returns are pretty small, and your event windows are only 6 days, so you probably wouldn't go wrong to a material extent using the approximate formulas. But there is no advantage to doing it that way, and if there do happen to be some events in the full data set where the returns get large, you would have meaningfully wrong results for them.

            The direct calculation of the return over any time period is 100*(ending price/beginning price - 1).

            Here's some code that applies that to your data:
            Code:
            by Company (Date): gen event_num = ///
            sum(!missing(eventwindow) & missing(eventwindow[_n-1])) ///
            if !missing(eventwindow)
            
            by Company event_num (Date), sort: gen event_window_cum_return = 100*(price[_N]/price[1] - 1) ///
            if !missing(event_num)
            sort Company Date
            Note: My event variable, which I named event_num, is slightly different from yours in that mine restarts at 1 within each Company. This will prove more convenient for your further work with this data as, I assume, you will generally want to deal with each company's events separately and there will be little, if any, call to perform calculations involving events from all of the different companies.

            Thanks again for your help!
            I know that calculating cumulative returns (by adding them up) is mathematically incorrect, however if I understand the finance literature correctly, some people are still doing it (e.g. cumulative abnormal returns).

            One issue that somehow remains with your approach, is what to do if the last price datapoint in an eventwindow is missing. In this case the entire return over the event period would be missing as well. This would be unfortunate if the price one day before the event window end would be available. The same problem could arise if the first price datapoint would be missing.

            Would it be possible to construct a remedy for this?
            I have thought about the following but failed to implement it:

            Instead of calculating the returns by dividing the last price by the first price and then subtracting 1, I would first generate a return variable and then generate a variable which contains the returns+1. Then one could calculate the product of the returns over the eventwindow period and subtract 1.








            Comment


            • #7
              One issue that somehow remains with your approach, is what to do if the last price datapoint in an eventwindow is missing. In this case the entire return over the event period would be missing as well. This would be unfortunate if the price one day before the event window end would be available.
              You did not previously mention the possibility that returns could be missing, and the example data did not have any such situations, so I did not write code that would handle this problem.

              The the code I proposed earlier can be modified to use the last non-missing return in calculating the cumulative return.

              Code:
              by Company (Date), sort: gen event_num = ///
                  sum(!missing(eventwindow) & missing(eventwindow[_n-1])) ///
                  if !missing(eventwindow)
                  
              by Company event_num (Date), sort: gen final_price = price if _n == 1 ///
                  & !missing(event_num)
              by Company event_num (Date): replace final_price = ///
                  cond(missing(price), final_price[_n-1], price) if _n > 1 & !missing(event_num)
              by Company event_num (Date): gen window_cum_return = ///
                  100*(final_price[_N]/price[1] - 1)
              sort Company Date
              I know that calculating cumulative returns (by adding them up) is mathematically incorrect, however if I understand the finance literature correctly, some people are still doing it (e.g. cumulative abnormal returns).
              That may be. I'm not in finance and don't read that literature. But I think cumulative abnormal returns is something different from a cumulative return over a period.

              Comment

              Working...
              X