Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculate holding period

    Hello,

    I have a dataset of individual trading data representing investors buying and selling behavior. My dataset consists of investor identifier, stock identifier (cusip), date of purchase or sale of stocks, amount of stocks purchased or sold (negative sign indicates sale of stock) and the price of the stock. Each time a stock is sold in the investor stock portfolio, I additionally evaluate the other stocks in his portfolio using closing prices from CRSP. These "unrealized stocks" are the stocks where the variable "stock" == . . An unrealized stock is a stock the investor has in his portfolio but does not sell on the particular date.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(investor cusip) str9 dates float(stock price date)
    1 1234 "19910108"  100  10 11330
    1 4321 "19910203"   50  25 11356
    1 1234 "19910505" -100  12 11447
    1 4321 "19910505"    .  30 11447
    2 5678 "19920407"  200  75 11785
    2 5678 "19920409"   50  72 11787
    2 5678 "19920802" -250  50 11902
    3 1234 "19921105"  100 100 11997
    3 1234 "19921111"  100  85 12003
    3 1234 "19921212" -150 110 12034
    3 1234 "19921220"  -50 150 12042
    4 8910 "19930218"  100 500 12102
    4 7891 "19930509"   25  10 12182
    4 7891 "19930606"   50   8 12210
    4 8910 "19931006"  -75 510 12332
    4 7891 "19931006"    .  15 12332
    4 7891 "19931020"   30  11 12346
    4 8910 "19931020"    . 525 12346
    4 8910 "19931130"  -25 530 12387
    4 7891 "19931130"    .  16 12387
    5 4657 "19940506"   40 260 12544
    5 8585 "19940512"   50  25 12550
    5 4657 "19940513"   20 250 12551
    5 8585 "19940601"  -20  15 12570
    5 4657 "19940601"    . 251 12570
    5 8585 "19940605"   30  20 12574
    5 4657 "19940707"  -50 300 12606
    5 8585 "19940707"    .  26 12606
    5 8585 "19940725"   50  30 12624
    5 4567 "19940725"    . 330 12624
    6 1234 "19950506"  500 100 12909
    6 1234 "19950606"  250 110 12940
    6 1234 "19950607" -750 150 12941
    6 1234 "19950707"  100 110 12971
    6 1234 "19950709" -100 115 12973
    7 4567 "19950111"    5  10 12794
    7 4567 "19950115"   10   8 12798
    7 4567 "19950120"   -7   9 12803
    7 4567 "19950201"    8  15 12815
    7 4567 "19950209"  -16  16 12823
    end
    format %td date
    Now, I want to calculate the number of days an investor has been holding the stock. So, I want to take the purchase date and calculate the difference between the sale date and I want to take the inital purchase date if the investor bought multiple times the same stock the purchase price used to calculate holding period should be the date were the stock was first bought. But I do not know how to do this properly maybe someone could help me here.

    Thanks in advance!

  • #2
    I already found a solution. Below you see my code:
    Code:
    egen pair = group(investor cusip)
    
    
    local count=_N
    local j = 1
    local k = 0
    local i = 1 
    local u = 1
    gen buydate = .
    
    
    
    forvalues j = `i'/`count'{
        if pair[`i']==pair[`u'+1] & stock[`i']>0 & stock[`u'+1]<0 | stock[`u'+1]==.  {
            local u = `u'+1    
            replace buydate = date[`i'] in `u'
            
        }
        else if pair[`i']==pair[`u'+1] & stock[`i']>0 & stock[`u'+1]>0 {
            local u = `u'+1
                    
        }
        else if pair[`i']!=pair[`u'+1]{
            
            local i = `u'+1
            local u = `u'+1
            //local u = `u'+1
        }
                else {
                    
                }    
    }
    
    gen hp = date-buydate
    This soultion might not be the most elegant one but it works for me.

    Comment


    • #3
      It seems your questions have not been getting much helpful responses (post 1, post 2). I did look at all of them (and helped in one) and was confused in different ways. I'd like to present some suggestion to the way your questions are phrased, and hopefully that may help improving the chance of getting what you wanted.

      Generally, your questions are asking for something that is much more complicated than how it was explained in the questions. That is, the rules/algorithms are very incomplete.

      For instance, this:
      Code:
      3 1234 "19921105"  100 100 11997
      3 1234 "19921111"  100  85 12003
      3 1234 "19921212" -150 110 12034
      3 1234 "19921220"  -50 150 12042
      Thread #1 says:
      So, I want to take the purchase date and calculate the difference between the sale date and I want to take the inital purchase date if the investor bought multiple times the same stock the purchase price used to calculate holding period should be the date were the stock was first bought.
      First bought is on 11997, but without the extra 100 stocks purchased on 12003, then the sale on 12034 would not be possible. Then which one is the actual start date?

      Another case:
      Code:
      7 4567 "19950111"    5  10 12794
      7 4567 "19950115"   10   8 12798
      7 4567 "19950120"   -7   9 12803
      7 4567 "19950201"    8  15 12815
      7 4567 "19950209"  -16  16 12823
      The first purchase was on 12794, but the sale on 12803 depleted the first 5 stocks. In that case, should the sale on 12823 consider 12798 as the purchase date or 12794 as the purchase date?

      When some users posted complicated scenario like this, what will be helpful is if the person can also add a column called "wanted" (as what I would like Stata to return) to indicate what you expected to get. That way other users may be more able to figure out the detailed thought process.

      Here is an example on what I meant:
      Code:
      clear
      input float(investor cusip) str9 dates float(stock price date wanted)
      1 1234 "19910108"  100 10 11330 .
      1 1234 "19910505" -100 12 11447 117
      1 4321 "19910203"   50 25 11356 .
      1 4321 "19910505"    . 30 11447 .
      2 5678 "19920407"  200 75 11785 .
      2 5678 "19920409"   50 72 11787 .
      2 5678 "19920802" -250 50 11902 117
      end
      format %td date
      <><><> EDIT <><><>
      Good, I saw you found the solution in #2. Congrats.

      Comment


      • #4
        Jana He I just ran your code on your sample data, and I just wanted to double-check that you are indeed getting what you wanted.

        For instance, for investor 1, we get
        Code:
          +---------------------------------------------------------------+
          | investor   cusip   stock   price        date     buydate   hp |
          |---------------------------------------------------------------|
          |        1    1234     100      10   08jan1991           .    . |
          |        1    1234    -100      12   05may1991           .    . |
          |        1    4321      50      25   03feb1991           .    . |
          |        1    4321       .      30   05may1991   05may1991    0 |
          +---------------------------------------------------------------+
        Is this correct? Should buydate for cusip 4321 not be 03feb1991?

        Also, for many stocks and/or investors, there is no buydate and no hp computed (all missing values). This includes cusip 1234 above for investor 1, but also all observations for investors 6 and 7, among other examples. Is this expected?

        Comment


        • #5
          Hemanshu Kumar I am quite confused about your result. If I run my code I get the following output:

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float(investor cusip) str9 dates float(stock price date pair buydate hp)
          1 1234 "19910108"  100  10 11330  1     .   .
          1 1234 "19910505" -100  12 11447  1 11330 117
          1 4321 "19910203"   50  25 11356  2     .   .
          1 4321 "19910505"    .  30 11447  2 11356  91
          2 5678 "19920407"  200  75 11785  3     .   .
          2 5678 "19920409"   50  72 11787  3     .   .
          2 5678 "19920802" -250  50 11902  3 11785 117
          3 1234 "19921105"  100 100 11997  4     .   .
          3 1234 "19921111"  100  85 12003  4     .   .
          3 1234 "19921212" -150 110 12034  4 11997  37
          3 1234 "19921220"  -50 150 12042  4 11997  45
          4 7891 "19930509"   25  10 12182  5     .   .
          4 7891 "19930606"   50   8 12210  5     .   .
          4 7891 "19931006"    .  15 12332  5 12182 150
          4 7891 "19931020"   30  11 12346  5     .   .
          4 7891 "19931130"    .  16 12387  5 12182 205
          4 8910 "19930218"  100 500 12102  6     .   .
          4 8910 "19931006"  -75 510 12332  6 12102 230
          4 8910 "19931020"    . 525 12346  6 12102 244
          4 8910 "19931130"  -25 530 12387  6 12102 285
          5 4657 "19940506"   40 260 12544  7     .   .
          5 4657 "19940513"   20 250 12551  7     .   .
          5 4657 "19940601"    . 251 12570  7 12544  26
          5 4657 "19940707"  -50 300 12606  7 12544  62
          5 4657 "19940725"    . 330 12624  7 12544  80
          5 8585 "19940512"   50  25 12550  8     .   .
          5 8585 "19940601"  -20  15 12570  8 12550  20
          5 8585 "19940605"   30  20 12574  8     .   .
          5 8585 "19940707"    .  26 12606  8 12550  56
          5 8585 "19940725"   50  30 12624  8     .   .
          6 1234 "19950506"  500 100 12909  9     .   .
          6 1234 "19950606"  250 110 12940  9     .   .
          6 1234 "19950607" -750 150 12941  9 12909  32
          6 1234 "19950707"  100 110 12971  9     .   .
          6 1234 "19950709" -100 115 12973  9 12909  64
          7 4567 "19950111"    5  10 12794 10     .   .
          7 4567 "19950115"   10   8 12798 10     .   .
          7 4567 "19950120"   -7   9 12803 10 12794   9
          7 4567 "19950201"    8  15 12815 10     .   .
          7 4567 "19950209"  -16  16 12823 10 12794  29
          end
          format %td date
          And this is exactly what I wanted

          Comment


          • #6
            No, that is not the result I get. Perhaps the difference is that you are running your code on your full dataset while I am running it only on the sample data.

            If it works for you, great!

            Comment


            • #7
              Ken Chui , it is not easy to explain my problem, so thank you for your advise. I created a new variable showing my desired output:

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input float(investor cusip) str9 dates float(stock price date pair wanted)
              1 1234 "19910108"  100  10 11330  1        .
              1 1234 "19910505" -100  12 11447  1       10
              1 4321 "19910203"   50  25 11356  2        .
              1 4321 "19910505"    .  30 11447  2       50
              2 5678 "19920407"  200  75 11785  3        .
              2 5678 "19920409"   50  72 11787  3        .
              2 5678 "19920802" -250  50 11902  3     74.4
              3 1234 "19921105"  100 100 11997  4        .
              3 1234 "19921111"  100  85 12003  4        .
              3 1234 "19921212" -150 110 12034  4     92.5
              3 1234 "19921220"  -50 150 12042  4     92.5
              4 7891 "19930509"   25  10 12182  5        .
              4 7891 "19930606"   50   8 12210  5        .
              4 7891 "19931006"    .  15 12332  5     8.67
              4 7891 "19931020"   30  11 12346  5        .
              4 7891 "19931130"    .  16 12387  5     9.33
              4 8910 "19930218"  100 500 12102  6        .
              4 8910 "19931006"  -75 510 12332  6      500
              4 8910 "19931020"    . 525 12346  6      500
              4 8910 "19931130"  -25 530 12387  6      500
              5 4657 "19940506"   40 260 12544  7        .
              5 4657 "19940513"   20 250 12551  7        .
              5 4657 "19940601"    . 251 12570  7   256.67
              5 4657 "19940707"  -50 300 12606  7   256.67
              5 4657 "19940725"    . 330 12624  7   256.67
              5 8585 "19940512"   50  25 12550  8        .
              5 8585 "19940601"  -20  15 12570  8       25
              5 8585 "19940605"   30  20 12574  8        .
              5 8585 "19940707"    .  26 12606  8   23.125
              5 8585 "19940725"   50  30 12624  8        .
              6 1234 "19950506"  500 100 12909  9        .
              6 1234 "19950606"  250 110 12940  9        .
              6 1234 "19950607" -750 150 12941  9   103.33
              6 1234 "19950707"  100 110 12971  9        .
              6 1234 "19950709" -100 115 12973  9 104.1176
              7 4567 "19950111"    5  10 12794 10        .
              7 4567 "19950115"   10   8 12798 10        .
              7 4567 "19950120"   -7   9 12803 10     8.67
              7 4567 "19950201"    8  15 12815 10        .
              7 4567 "19950209"  -16  16 12823 10  10.8696
              end
              format %td date
              I want to get average purchase prices to be able to calculate stock returns. I dropped my approach in thread #1 because it was just to complicated and I now want to calculate the returns using the approach in thread #2.
              I make a few examples:
              1.) If we take a look at investor 1: This example is straight forward, the investor bought 100 shares for a price of 10$ and then sold his whole position a few months later for 12$. To be able to calculate returns I need the purchase price which is 10$ in this case and the sell price which is 12$ in this case, therefore the return is 0.2.

              2.) Now we look at investor 2: He bought 200 shares for 75$ and then a few days later the same stock but 50 pieces for a price of 72$. A few months later he sells his whole position (250 stocks) for a price of 50$. Again, to calculate a return I need a base price, in this case I cannot just use the price of the first purchase (i.e. 75$) because multiple purchases (in this case two purchases) occured before the whole position was sold. Hence, I want to calculate the weighted average purchase price: (200/250)*75 + (50/250)*72.

              I hope my problem is clear now.

              Comment

              Working...
              X