  • Saving maximum values within a range and corresponding time identifier

    I have a data set that spans 5 years from 2008-2012. I have four variables year-month-day-hour that together uniquely identify a particular hour in the dataset.

    I used this code:

    foreach x in buschranch campbell {
    egen max_`x' = max(`x'), by (year month day)
    }

    Here buschranch and campbell are variables in my dataset (containing measures of output from wind turbines in each location). This generates new variables that hold the maximum value for each day in the dataset.

    However, I also want to know which corresponding hour in each day is the maximum value... I've looked around at indexing stuff but I'm honestly not finding a solution that gives me what I want.

    Essentially, I want to find the highest output and in what hour in each day, so eventually I can find on average, what is the most productive hour for a wind site. Hopefully that's clear enough, I appreciate any advice!

  • #2
    There is a problem in that you neglect to specify how a tie is to be broken. If we break it by choosing the latest hour, the following should do it.
    Code:
    foreach x in buschranch campbell {
    egen max_`x' = max(`x'), by (year month day)
    egen hour_`x' = max(cond(`x'==max_`x',hour,.))
    }


    • #3
      I tried what you suggested, but it just returns the highest hour value, i.e. 23 (since I count hours from 0 to 23 in each day), not the hour associated with the max_`x' value. As for the problem of not specifying how a tie is broken: that's a good point, and I do not know what Stata will do by default. I do not have a good argument for how the tie should be broken; perhaps the first hour at which the max value occurs?


      • #4
        Originally posted by William Lisowski
        There is a problem in that you neglect to specify how a tie is to be broken. If we break it by choosing the latest hour, the following should do it.
        Code:
        foreach x in buschranch campbell {
        egen max_`x' = max(`x'), by (year month day)
        egen hour_`x' = max(cond(`x'==max_`x',hour,.))
        }
        Sorry, I didn't quote you in my reply! (I'm still new to using this forum.) But I appreciate your suggestion!


        • #5
          Found the solution by just slightly tweaking what you gave me, William.


          Code:

          foreach x in buschranch campbell {
          egen max_`x' = max(`x'), by (year month day)
          egen hour_`x' = max(cond(`x'==max_`x',hour,.)), by (year month day)
          }

          Last edited by Rohini Ghosh; 11 Jun 2017, 16:47.


          • #6
            I am glad you were able to understand how the (untested) code I supplied worked and modify it to meet your needs.

            Also, it was not necessary or expected to quote the entire previous post in your answer. I do not know why some members do that.

            The Statalist FAQ linked to from the top of the page has more information on the expectations on Statalist. One of the reasons it recommends including sample data that demonstrates the problem is to allow other readers to easily test their code rather than requiring them to invent data that the questioner already possesses.
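
            As an illustration of what such sample data might look like, here is a small invented example. The values are made up and are not from the original dataset; only the variable names year, month, day, hour and buschranch come from this thread, and max_b, hour_noby, hour_last and hour_first are names chosen for the sketch. It compares the second egen command with and without by().

            ```stata
            * Invented toy data: two days, three hours each, with a tie on the first day
            clear
            input year month day hour buschranch
            2008 1 1 0 5
            2008 1 1 1 9
            2008 1 1 2 9
            2008 1 2 0 7
            2008 1 2 1 3
            2008 1 2 2 4
            end

            * daily maximum of output
            egen max_b = max(buschranch), by(year month day)

            * without by(): the maximum is taken over the whole dataset
            egen hour_noby = max(cond(buschranch == max_b, hour, .))

            * with by(): last and first hour of the daily maximum
            egen hour_last  = max(cond(buschranch == max_b, hour, .)), by(year month day)
            egen hour_first = min(cond(buschranch == max_b, hour, .)), by(year month day)

            list, sepby(year month day)
            ```

            With these six observations, hour_noby should be 2 on every row, whereas hour_last is 2 and hour_first is 1 for 1 January, and both are 0 for 2 January.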

            Finally, in post #3 you wrote

            As for the problem of not specifying how a tie is broken: that's a good point, and I do not know what Stata will do by default. I do not have a good argument for how the tie should be broken; perhaps the first hour at which the max value occurs?

            To get the first hour, rather than the last hour, at which the max value occurs, you would change the second egen command to
            Code:
            egen hour_`x' = min(cond(`x'==max_`x',hour,.)), by (year month day)
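
            Once hour_`x' exists, the goal stated in #1 (finding the hour that is most often the most productive) could be approached by keeping one value per day and tabulating it. This is only a sketch: it assumes hour_buschranch and hour_campbell were already created by the loop discussed in this thread, and oneperday is a hypothetical variable name.

            ```stata
            * assumes hour_buschranch and hour_campbell already exist
            * flag exactly one observation in each year-month-day group
            egen oneperday = tag(year month day)

            foreach x in buschranch campbell {
                * frequency of each hour being the daily maximum, most common first
                tabulate hour_`x' if oneperday, sort
            }
            ```

            One caveat worth checking: in Stata a missing value compares equal to a missing value, so a day on which `x' is missing for every hour would still be assigned an hour unless the condition is extended with something like & !missing(`x').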


            • #7
              Back-tracking to #1, it seems likely that the wind blows when as well as where it wishes (quotation allusion), and therefore that you have yearly (seasonal) cycles and daily (diurnal) cycles. I'd therefore suggest that looking for hours of maximum output is at best a descriptive step, and that some model in terms of

              year: you only have 2008 to 2012, so perhaps an indicator for each distinct year

              time of year: work with sine and cosine of day of year. I often use (day - 0.5) / (365 or 366) for days 1 to 365 or 366.

              time of day: work with sine and cosine of time of day. I often use (hour - 0.5) / 24 for hours 1 to 24.

              would be more instructive. That could be quite parsimonious: minimally 8 predictors for a location (4 year indicators, plus a sine and cosine pair each for time of year and time of day).

              Calendar month and day of month are irrelevant as such unless people operate differently according to the calendar. The wind really doesn't know it's 12 June, or whatever.
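
              The predictors listed above could be set up roughly as follows. This is a sketch under assumptions, not a tested model: it assumes numeric variables year, month, day and hour, with hour coded 0 to 23 as described in #3 (so the mid-hour is hour + 0.5 rather than hour - 0.5), and the names ddate, dayofyear, ylen, tyear, tday and the sin_/cos_ variables are invented for illustration.

              ```stata
              * daily date and day of year (1 to 365 or 366)
              gen ddate = mdy(month, day, year)
              format ddate %td
              gen dayofyear = doy(ddate)

              * year length; the simple mod 4 rule is adequate for 2008-2012
              gen ylen = cond(mod(year, 4) == 0, 366, 365)

              * angles in radians for time of year and time of day
              gen double tyear = 2 * _pi * (dayofyear - 0.5) / ylen
              gen double tday  = 2 * _pi * (hour + 0.5) / 24

              gen double sin_year = sin(tyear)
              gen double cos_year = cos(tyear)
              gen double sin_day  = sin(tday)
              gen double cos_day  = cos(tday)

              * 4 year indicators + 4 trigonometric terms = 8 predictors
              regress buschranch i.year sin_year cos_year sin_day cos_day
              ```

              Plain regress is used only to show the predictors in place; whether a linear model suits wind output is a separate question.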


              • #8
                Originally posted by William Lisowski
                Finally, in post #3 you wrote

                To get the first hour, rather than the last hour, at which the max value occurs, you would change the second egen command to
                Code:
                egen hour_`x' = min(cond(`x'==max_`x',hour,.)), by (year month day)

                I did in fact change my code just this way to use the first hour. Thank you for your input!

                And I am new to the forum, so I apologize if I'm awkward in replying to posts. I just wanted to be sure I didn't miss something or was unclear.
