Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Dummy variable with 4 conditions

    Hello Statalist,

    I have a panel data of stock prices with open high low and close and it looks something like this:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(TradeOpen TradeHigh TradeLow TradeClose)
             26.500053 27.000054000000002          26.500053           26.66672
    26.833387000000002 26.833387000000002          26.500053 26.833387000000002
              26.66672 26.833387000000002 26.416719500000003         26.7500535
    end

    I would like to create a dummy variable that take the value of 1 if it meets at least one of the following four conditions :

    1. (((TradeOpen [_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15)
    2. (((TradeHigh [_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15)
    3. (((TradeLow [_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15)
    4. (((TradeClose [_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15)

    I have tried different codes but I got contradicting results. For instance:

    Code:
    by id: gen dummy1= 0
    by id: replace dummy1= 1 if (((TradeOpen [_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15) | (((TradeHig
    > h [_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15) | (((TradeLow [_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.
    > 15) | (((TradeClose [_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15)
    I do not think that " by id" is necessary in the first line but I just threw it there anyways but might be important for the second line, correct me if I am mistaken. Another code that I have tried is the following:

    Code:
    by id: gen dummy1= 0
    replace if dummy1= ((TradeOpen [_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15 
    replace if dummy1= ((TradeHigh [_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15 
    replace if dummy1= ((TradeLow[_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15 
    replace if dummy1= ((TradeClose[_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15
    Again, I am not sure if each "replace" line overrides the previous one or they all work together.

    Any help in this regard is highly appreciated.

    ​​​​​​​Saad

  • #2
    Hi Saad,

    Welcome to Statalist! And BTW, big thanks (a) for using the -dataex- command to provide a sample of your data, and (b) putting your code within code delimiters! (Many first time posters do neither!)
    1. You are getting different results because the two different sets of code do two different things (short answer is you need to add by id: to each of your lines in your second set of code. Otherwise the TradeOpen[_n] - TradeClose[_n-1] could refer to different stocks.
    2. Your other challenge (I suspect) is that you need to use the absolute value, since TradeLow[_n] - TradeClose[_n-1] is very likely to give you a negative number. It looks like really what you care about is if it had at least a 15% swing either way. Remove the abs() if you don't (i.e. you really do only want it set to 1 for a 15% increase).
    Code:
    gen dummy1= 0  // you're right, you don't need the "by id" on this line
    bysort id (date): replace if dummy1= abs(((TradeOpen[_n] - TradeClose[_n-1]) / TradeClose[_n-1])) >= 0.15  
    bysort id (date): replace if dummy1= abs(((TradeHigh[_n] - TradeClose[_n-1]) / TradeClose[_n-1])) >= 0.15  
    bysort id (date): replace if dummy1= abs(((TradeLow[_n] -  TradeClose[_n-1]) / TradeClose[_n-1])) >= 0.15  
    bysort id (date): replace if dummy1= abs(((TradeClose[_n] -TradeClose[_n-1]) / TradeClose[_n-1])) >= 0.15
    Note: I haven't tested this code (so check for typos and stuff, and *especially* that I have the right number of parentheses.) Also make sure that that there is no space between the variable name and the [_n]

    In "bysort id (date):" putting the date in parentheses tells Stata to sort by date (presumably your data was already sorted by date, but this just makes sure), but then Stata replaces dummy1 "by id:" not "by id date:"

    Also, if you had added "bysort id (date):" to your second set of code, both sets of your code above should have given you the same thing.

    Hope that helps!
    Last edited by David Benson; 29 Oct 2018, 00:41.

    Comment


    • #3
      Hello David,

      Thank you for your valuable comments. You are right, i'm interested to see the swing in today's price ( open high low or close) relative to yesterday's closing price. In other words, I would like to identify days when today's open high low or close is +/- 15% from yesterday's closing price. However, dummy1 will only have days when open high low or close >= 15% from yesterday's closing price whereas dummy2 will only have days when open high low or close < 15% from yesterday's closing price.

      The code should ignore any negative value in dummy1 and only focus in >= 15% and dummy2 captures the negative values <15%

      what do you think?

      Saad
      Last edited by Saad Al; 29 Oct 2018, 00:44.

      Comment


      • #4
        You could do it either way:
        1. Either create a 2nd dummy dummy2 for values <= -0.15 (in that case be sure to remove the absolute value function that I included in my code in #2.), or
        2. Leave the code as is in #2. Because of the absolute value function abs() that code will capture both >=15% increases from previous day closes as well as >=15% decreases, which sounds like what you want.

        Comment


        • #5
          Thank you David. This makes perfect sense.
          I ran the codes but came back with the following error : varlist required
          Any suggestion?

          Comment


          • #6
            Oh, looks like the "if" was in the wrong place. Should be:

            Code:
             gen dummy1= 0
            bysort id (date): replace dummy1 = 1 if abs(((TradeOpen[_n] - TradeClose[_n-1]) / TradeClose[_n-1])) >= 0.15
            bysort id (date): replace dummy1 = 1 if abs(((TradeHigh[_n] - TradeClose[_n-1]) / TradeClose[_n-1])) >= 0.15
            bysort id (date): replace dummy1 = 1 if abs(((TradeLow[_n] -  TradeClose[_n-1]) / TradeClose[_n-1])) >= 0.15
            bysort id (date): replace dummy1 = 1 if abs(((TradeClose[_n] -TradeClose[_n-1]) / TradeClose[_n-1])) >= 0.15

            Comment


            • #7
              Thank you David. Everything works smoothly now.

              Comment


              • #8
                Back with a new challenge
                So as we discussed last time that the following code would identify/capture days with 15% ( or higher) increase from yesterday's closing price.
                Code:
                  
                 gen Dummy1= 0
                bysort id ( Timestamp ): replace Dummy1 = 1 if ((TradeOpen[_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15
                bysort id (Timestamp): replace Dummy1 = 1 if ((TradeHigh[_n] - TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15
                bysort id (Timestamp): replace Dummy1 = 1 if ((TradeLow[_n] -  TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15
                bysort id (Timestamp): replace Dummy1 = 1 if ((TradeClose[_n] -TradeClose[_n-1]) / TradeClose[_n-1]) >= 0.15 
                My question now is that for days when 15% or higher are reached ( which is value of 1 in this case from the above dummy), I would like classify stocks that did not reach the 15% increase into two subgroups: stocks that came within at least 90% of reaching 15% increase and stocks that came within at least 80% but less than 90% of reaching 15% increase.
                Last edited by Saad Al; 02 Nov 2018, 22:00.

                Comment


                • #9
                  Well, a few quick thoughts on that:
                  1. Remember that you need to enclose the change in price part (TradeOpen[_n] - TradeClose[_n-1]) / TradeClose[_n-1] in absolute value to capture the +/- change from prior day closing price.

                  2. I was thinking that you needed to change 0.15 to 1.15 (or subtract one from the entire thing). (i.e. if stock closed yesterday at $100 and opened today at $115, then ($115 - $100) / $100 is 1.15

                  3. 0.8 * 15% = 12%, and 0.9 * 15% = 13.5%, so you could make one >=1.135 & <=1.15, and the other >=1.12 & <1.135

                  4. Lookup Stata local macros, which just lets you come up with a short name for a much longer string. That would let you do the following:

                  Code:
                  local change_open "(TradeOpen[_n] - TradeClose[_n-1]) / TradeClose[_n-1]"
                  local change_high "(TradeHigh[_n] - TradeClose[_n-1]) / TradeClose[_n-1]"
                  // create change_low and change_close as well
                  
                  gen Dummy1 = 0
                  bysort id ( Timestamp ): replace Dummy1 = 1 if abs(`change_open') >=1.15
                  bysort id ( Timestamp ): replace Dummy1 = 1 if abs(`change_high') >=1.15
                  // repeat for the rest
                  
                  * Creating the 90% of +/- 15% change  
                  gen Dummy2 = 0
                  bysort id ( Timestamp ): replace Dummy2 = 1 if abs(`change_open') >=1.135 & abs(`change_open') < 1.15
                  // repeat for the rest
                  Note that when refer to a `change_open' that left mark isn't a single quote, its the forward slanting single quote key ` (to the left of the 1 key on your keyboard).

                  Comment


                  • #10
                    Hello David,

                    1. You are right but I would like to segregate the + from the - to see the effect of each change. In other words, I already created the following code, as you pointed out in an earlier post, to capture the decrease change separately:

                    Code:
                    gen Decrease= 0
                    bysort id ( Timestamp ): replace Decrease = 1 if ((TradeOpen[_n] - TradeClose[_n-1]) / TradeClose[_n-1]) <= -0.15
                    bysort id (Timestamp): replace Decrease = 1 if ((TradeHigh[_n] - TradeClose[_n-1]) / TradeClose[_n-1]) <= -0.15
                    bysort id (Timestamp): replace Decrease = 1 if ((TradeLow[_n] -  TradeClose[_n-1]) / TradeClose[_n-1]) <= -0.15
                    bysort id (Timestamp): replace Decrease = 1 if ((TradeClose[_n] -TradeClose[_n-1]) / TradeClose[_n-1]) <= -0.15
                    2. I might be missing the point here but if we use the example you provided, then ( 115-100 ) / 100 = 0.15

                    3. Totally agree

                    4. Thank you for the local macro, this is really helpful. let us assume that we are only interested in stocks that came within at least 90% of reaching a 15% increase change. One thing to keep in mind is that we are not interested in the whole sample period; rather, only for the remaining stocks that did not hit the 15% increase change on that specific day. To illustrate, say for example on August 3, 2015 we had two stocks out 20 that hit the 15% threshold and 18 stocks did not. I would like to see which stocks out of the remaining 18 stocks came within at least 90% of reaching 15% increase change on August 3, 2015. So I think dummy2 needs to be conditioned on dummy1, because if you apply the following code, we might end up having all stocks that came within at least 90% of reaching a 15% increase change for the entire sample period. I do not know, I could be wrong.

                    Code:
                    gen Dummy2 = 0
                    bysort id ( Timestamp ): replace Dummy2 = 1 if abs(`change_open') >=1.135 & abs(`change_open') < 1.15
                    Sorry for not clarifying this more accurately.

                    Comment


                    • #11
                      Hi Saad,

                      Regarding point #4 in your previous post, you don't need to condition on dummy1. The code is only going to be set to 1 if it came within "least 90% of reaching a 15% increase change" (so >= 13.5% but < 15% increase) the prior day, not over the entire sample period.

                      I've made a toy dataset with 4 stocks (Apple, Tesla, PluralSight, and Dropbox) for 10/1/2018 to 10/5/2018. I've changed some of the stock prices so that they were >12%, >13.5%, and greater than 15% so that I could make sure the code worked. Paste the following into Stata and then run the code and see if that gives you what you need.

                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input byte id str11 firm str4 ticker double date float(TradeOpen TradeHigh TradeLow TradeClose)
                      1 "Apple"       "AAPL" 21458 227.95 229.42 226.35 227.26
                      1 "Apple"       "AAPL" 21459 227.25    230 226.63 229.28
                      1 "Apple"       "AAPL" 21460 230.05 233.47 229.78 232.07
                      1 "Apple"       "AAPL" 21461 230.78    235 226.73 227.99
                      1 "Apple"       "AAPL" 21462 227.96    240 220.58 224.29
                      2 "Tesla"       "TSLA" 21458 305.77 311.44 301.05  310.7
                      2 "Tesla"       "TSLA" 21459 313.95 316.84 299.15 301.02
                      2 "Tesla"       "TSLA" 21460 303.33  304.6 291.57  294.8
                      2 "Tesla"       "TSLA" 21461 293.95    331 277.67 281.83
                      2 "Tesla"       "TSLA" 21462 274.65    320    225 261.95
                      3 "PluralSight" "PS"   21458   32.2  32.93  30.91   31.1
                      3 "PluralSight" "PS"   21459  31.04     35  28.54  29.21
                      3 "PluralSight" "PS"   21460  29.31     33   27.8  28.99
                      3 "PluralSight" "PS"   21461  28.95     33  26.72  26.81
                      3 "PluralSight" "PS"   21462  27.08     33  25.43  26.48
                      4 "Dropbox"     "DBX"  21458  26.82  26.93   25.7  26.22
                      4 "Dropbox"     "DBX"  21459  26.38     30  26.03  26.32
                      4 "Dropbox"     "DBX"  21460  26.37     35  26.06  26.25
                      4 "Dropbox"     "DBX"  21461  26.07     33  24.17  24.54
                      4 "Dropbox"     "DBX"  21462  24.69     27  22.92  24.07
                      end
                      format %td date

                      Then here is the code to get the variables you want. NOTE: I only calculated these for percent increases. You would want to add or modify depending on how you want to handle percent decreases. Also note: it turns out we don't need TradeOpen[_n] for the current day because Stata asssumes you mean today unless you specify otherwise. In other words, TradeOpen[_n] = TradeOpen
                      Code:
                      // Create 4 variables that measure percent change from prior day
                      capture drop pct_change_open- change_12 // I put this here to make it easy for me to drop so I could fix code
                      bysort id (date): gen pct_change_open  = (TradeOpen - TradeClose[_n-1]) / TradeClose[_n-1]
                      bysort id (date): gen pct_change_high  = (TradeHigh - TradeClose[_n-1]) / TradeClose[_n-1]
                      bysort id (date): gen pct_change_low   = (TradeLow  - TradeClose[_n-1]) / TradeClose[_n-1]
                      bysort id (date): gen pct_change_close = (TradeClose - TradeClose[_n-1]) / TradeClose[_n-1]
                      
                      // Creating the variable if any of the change was >= 15% , >= 90% of 15% (so 13.5%), >= 80% of 15% (so 12%)
                      gen change_15 = 0
                      gen change_135 = 0 
                      gen change_12 = 0
                      
                      foreach var of varlist pct_change* {
                          replace change_15  = 1 if `var' >= 0.15  & `var' !=.
                          replace change_135 = 1 if `var' >= 0.135 & `var' < 0.15
                          replace change_12  = 1 if `var' >= 0.12  & `var' < 0.135
                          }
                      
                      * Just checking to make sure it worked how I wanted
                      brow id TradeOpen- change_12

                      I've attached a screenshot of what this looks like in Excel. NOTE: I've omitted pct_change_low and pct_change_close because I made it so that all the action was in pct_change_high.

                      Click image for larger version

Name:	Statalist - toy stock data.png
Views:	1
Size:	61.4 KB
ID:	1469194

                      Comment


                      • #12
                        Hi David,

                        Thank you for the example. I apologize for not making myself clear.
                        change_135 and change_12 take 1 if and only if we have at least one observation from the entire sample reaching change_15 on that specific day. Let us look at the below example:
                        Click image for larger version

Name:	illust.jpg
Views:	1
Size:	208.1 KB
ID:	1469262



                        If we look at DBX on 3-Oct- 2018, it had 33%, which is captured from the very first dummy. Now since we have one observation reached change_15, we need 2 dummies change_135 and change_12 where they look for other stocks on 3-Oct- 2018 that meet the conditions of change_135 and change_12. Thus, we find one observation for change_135 for PluralSight and one for tesla. Had we not had the 33% for DBX on 3-oct-2018, then the dummies should ignore both the 14% and 12.6%. Similarly, AAPL on 5-Oct-2018 had 1 for change_135 because PluralSight on 5-Oct-2018 had 1 for change_15. However, If we look at PluralSight on 2-Oct-2018, we see that change_135 did not take 1; why because there was no stock on 2-Oct-2018 reached 15% change. And that why I thought change_135 and change_12 had to be conditioned on change_15.
                        Last edited by Saad Al; 06 Nov 2018, 19:56.

                        Comment


                        • #13
                          Oh, thanks for clarifying. If you only want change_135 and change_12 set to 1 if at least 1 stock has a >=15% gain that day, then you could do something like this:

                          Code:
                          * This part remains the same as before
                          // Creating the variable if any of the change was >= 15% , >= 90% of 15% (so 13.5%), >= 80% of 15% (so 12%)
                          gen change_15 = 0
                          gen change_135 = 0 
                          gen change_12 = 0
                          foreach var of varlist pct_change* {
                              replace change_15  = 1 if `var' >= 0.15  & `var' !=.
                              replace change_135 = 1 if `var' >= 0.135 & `var' < 0.15
                              replace change_12  = 1 if `var' >= 0.12  & `var' < 0.135
                              }
                          
                          * Create count of stocks hitting at least 15% gain that day
                          bysort date: egen day_hit_15  = sum(change_15)
                          label var day_hit_15 "Count of stocks with >=15% gain that day"
                          
                          * Sets change_135 and change_12 back to 0 if no stocks hit >=15% that day
                          replace change_135 = 0 if day_hit_15 ==0
                          replace change_12  = 0 if day_hit_15 ==0

                          Comment


                          • #14
                            Hello David,

                            It seems that the count code is not doing the trick:

                            Code:
                            bysort date: egen day_hit_15 = sum(change_15)
                            label var day_hit_15 "Count of stocks with >=15% gain that day"
                            Because here is an observation that took 1 for change_12 but 0 for change_ 15:

                            Click image for larger version

Name:	limit.jpg
Views:	1
Size:	158.4 KB
ID:	1469527

                            Comment


                            • #15
                              Hi Saad,

                              So above you said that you wanted change_135 or change_12==1 if and only if at least 1 stock hit 15% that day. Because day_hit_15==1 (so at least 1 stock hit >=15%), and that particular stock went up by 12.06%, change_12==1.

                              Code:
                              * Sets change_135 and change_12 back to 0 if no stocks hit >=15% that day
                              replace change_135 = 0 if day_hit_15 ==0
                              replace change_12  = 0 if day_hit_15 ==0
                              But unfortunately, I've devoted all the time I can to this. I think you can take the code I've given you and make it work. Otherwise, add another post and hopefully someone else can help you.

                              Good luck!

                              Comment

                              Working...
                              X