Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to replace an observation under specific condition

    Hello,
    I'm working on some stock data and some stocks are dead or delisted before the end of my sample period. To avoid any survivorship bias, I do want to consider these stocks in my analysis. Is there a way in Stata to replace an observation if all the following observations are equal to zero?

  • #2
    replace with what value?

    Comment


    • #3
      Hi Nick,
      Thanks for your response. To clarify I added the following table to illustrate. I have daily stock data and I want to replace the values that do not change anymore to either 0 or "."
      This means that I want to change the values after the first 10,4 to either 0 or "."
      Date Price
      01/01/00 10
      01/02/00 10,2
      01/03/00 10,3
      01/04/00 10,4
      01/05/00 10,4
      01/05/00 10,4
      01/06/00 10,4

      Comment


      • #4
        Disregarding the duplicate on 01/05/00, I think you can create a new price variable as below


        Code:
        gen price_n=price if price !=price[_n-1]
        The new variable price_n will have missing price from the first 10.4 observation

        Comment


        • #5
          Hi David,
          Thanks for your response. This would indeed work but only in the illustration I used above. Some of my price data does not change for a couple of days and then changes again. That is why I'm wondering if it is possible to loop over all the remaining values and only if they are all zero, they should be replaced with ".". To illustrate: in the table below I don't want all zeros after the first 10,4 because the prices changes to 10,5 later. I want all zeros after the first 10,5 (under the assumption that the prices does not change anymore).
          Date Price
          1 10
          2 10,2
          3 10,3
          4 10,4
          5 10,4
          6 10,4
          7 10,5
          8 10,5
          9 ...

          Comment


          • #6
            Originally posted by Friso Liezenberg View Post
            Hi David,
            Thanks for your response. This would indeed work but only in the illustration I used above. Some of my price data does not change for a couple of days and then changes again. That is why I'm wondering if it is possible to loop over all the remaining values and only if they are all zero, they should be replaced with ".". To illustrate: in the table below I don't want all zeros after the first 10,4 because the prices changes to 10,5 later. I want all zeros after the first 10,5 (under the assumption that the prices does not change anymore).
            Date Price
            1 10
            2 10,2
            3 10,3
            4 10,4
            5 10,4
            6 10,4
            7 10,5
            8 10,5
            9 ...
            I think you can still tweak the code a bit to something like
            Code:
            gen price_n = price if price-price[_n-1] !=0

            Comment


            • #7
              Hi David,
              Thanks for your help. However, I still don't see how this will work. I'm working with more than 30 years of daily data, so using [_n-1] can work in some instances, but it won't work in most cases. Here is an excerpt of my data to illustrate:
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input double RP
               1.4
               1.1
               1.1
               1.1
               1.4
               1.4
               1.4
               1.4
               1.1
               1.3
               1.3
               1.3
               1.3
               1.3
               1.3
               1.1
               1.1
               1.1
               1.1
               1.1
               1.1
               1.1
               1.1
               1.1
               1.1
               1.1
              end
              As you see, the data changes are not very frequent so comparing them to the previous observation would not solve my problem. Again, I want only the last 10 observations to be replaced by zeros because their values don't changes anymore. I do appreciate your help though! Please let me know if you have any other thoughts.
              Last edited by Friso Liezenberg; 24 Feb 2022, 08:15.

              Comment


              • #8
                Does your data have a time variable and multiple panels?

                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input double RP
                1.4
                1.1
                1.1
                1.1
                1.4
                1.4
                1.4
                1.4
                1.1
                1.3
                1.3
                1.3
                1.3
                1.3
                1.3
                1.1
                1.1
                1.1
                1.1
                1.1
                1.1
                1.1
                1.1
                1.1
                1.1
                1.1
                end
                
                gen time=_n
                gsort -time
                gen group= sum(RP!=RP[_n-1])
                replace RP=. if group==1 & group[_n+1]!=2
                Res.:

                Code:
                     +--------------------+
                     |  RP   time   group |
                     |--------------------|
                  1. | 1.4      1       6 |
                  2. | 1.1      2       5 |
                  3. | 1.1      3       5 |
                  4. | 1.1      4       5 |
                  5. | 1.4      5       4 |
                  6. | 1.4      6       4 |
                  7. | 1.4      7       4 |
                  8. | 1.4      8       4 |
                  9. | 1.1      9       3 |
                 10. | 1.3     10       2 |
                 11. | 1.3     11       2 |
                 12. | 1.3     12       2 |
                 13. | 1.3     13       2 |
                 14. | 1.3     14       2 |
                 15. | 1.3     15       2 |
                 16. | 1.1     16       1 |
                 17. |   .     17       1 |
                 18. |   .     18       1 |
                 19. |   .     19       1 |
                 20. |   .     20       1 |
                 21. |   .     21       1 |
                 22. |   .     22       1 |
                 23. |   .     23       1 |
                 24. |   .     24       1 |
                 25. |   .     25       1 |
                 26. |   .     26       1 |
                     +--------------------+
                
                .

                Comment


                • #9
                  Dear Andrew,
                  Thanks, problem solved!

                  Comment


                  • #10
                    Originally posted by Friso Liezenberg View Post
                    Hi David,
                    Thanks for your help. However, I still don't see how this will work. I'm working with more than 30 years of daily data, so using [_n-1] can work in some instances, but it won't work in most cases. Here is an excerpt of my data to illustrate:
                    Code:
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input double RP
                    1.4
                    1.1
                    1.1
                    1.1
                    1.4
                    1.4
                    1.4
                    1.4
                    1.1
                    1.3
                    1.3
                    1.3
                    1.3
                    1.3
                    1.3
                    1.1
                    1.1
                    1.1
                    1.1
                    1.1
                    1.1
                    1.1
                    1.1
                    1.1
                    1.1
                    1.1
                    end
                    As you see, the data changes are not very frequent so comparing them to the previous observation would not solve my problem. Again, I want only the last 10 observations to be replaced by zeros because their values don't changes anymore. I do appreciate your help though! Please let me know if you have any other thoughts.
                    I guess I misunderstood your question. I though you wanted to replace price to "." every time the price didn't change. Thanks Andrew.

                    Comment


                    • #11
                      Just a minor point for Andrew's code in #8: ultimately, sorting by 'time' is more favourable than by '-time'.
                      Code:
                      *gen time = _n
                      *sort time
                      
                      gen a = sum(RP!=RP[_n-1])
                      replace RP =. if a==a[_N] & a==a[_n-1]

                      Comment

                      Working...
                      X