Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to collapse a dataset with day, month and year time information to get the annual maximums?

    I want to collapse the following dataset so that I get the annual maximum for the variable <vmax_sust> for each county (fips) year.

    In using the code: collapse (firstnm) storm_id (max) vmax_sust , by(fips year)

    But, it does not give me the annual maximums. In fact it returns spurious data for the vmax_sust variable.

    My goal is to get the maximum value for the variable vmax_sust (hurricane windspeeds) for each fips (county) for each year.

    clear
    input long fips int year byte(Month Day) double vmax_sust str15 storm_id
    1001 1988 8 5 .8327174690114912 "Alberto-1988"
    1001 1988 8 8 5.891180149342436 "Beryl-1988"
    1001 1988 8 28 2.835593650219381 "Chris-1988"
    1001 1988 9 10 6.558882994045002 "Florence-1988"
    1001 1988 9 19 2.158690444071083 "Gilbert-1988"
    1001 1988 11 22 2.1133002252330346 "Keith-1988"
    1001 1988 6 2 .25255478965267064 "AL13-1988"
    1001 1988 8 14 3.4597951584555204 "AL14-1988"
    1001 1988 9 4 .5790864756988259 "AL17-1988"
    1001 1989 7 1 1.7156892228310678 "Allison-1989"
    1001 1989 7 31 1.9089653343691888 "Chantal-1989"
    1001 1989 9 22 3.7151069123287637 "Hugo-1989"
    1001 1989 10 16 1.52327169695863 "Jerry-1989"
    1001 1990 5 25 .5206064533004958 "AL01-1990"
    1001 1990 7 24 .6744821215001495 "Bertha-1990"
    1001 1990 10 12 2.48408427220947 "Marco-1990"
    1001 1991 7 1 1.657726920449161 "Ana-1991"
    1001 1991 8 18 .9385441920011537 "Bob-1991"
    1001 1991 10 16 1.521363932730924 "Fabian-1991"
    1001 1991 10 31 .5480267483520787 "AL12-1991"
    1001 1992 6 26 1.5816559413063978 "AL02-1992"
    1001 1992 8 27 7.0863152511809195 "Andrew-1992"
    1001 1992 9 25 .5875413205291702 "Danielle-1992"
    1001 1992 9 29 1.5385736614871546 "Earl-1992"
    1001 1993 6 1 .5106843977863078 "AL01-1993"
    1001 1993 6 19 .9359790982203917 "Arlene-1993"
    1001 1993 8 31 .5926958111109527 "Emily-1993"
    1001 1994 7 6 12.246908289883358 "Alberto-1994"
    1001 1994 7 20 1.3012757408550957 "AL02-1994"
    1001 1994 8 16 10.549146769843336 "Beryl-1994"
    1001 1994 11 21 2.517942196013082 "Gordon-1994"
    1001 1995 6 5 7.302951638092455 "Allison-1995"
    1001 1995 7 28 1.5942668507082345 "Dean-1995"
    1001 1995 8 3 14.505203248147732 "Erin-1995"
    1001 1995 8 9 .3650826137750754 "Gabrielle-1995"
    1001 1995 8 26 5.0152580679006284 "Jerry-1995"
    1001 1995 10 4 24.522517545202966 "Opal-1995"
    1001 1996 6 18 1.4251182143496155 "Arthur-1996"
    1001 1996 7 12 1.833287843025197 "Bertha-1996"
    1001 1996 8 31 .3749269132817115 "Edouard-1996"
    1001 1996 9 6 1.8069739505183142 "Fran-1996"
    1001 1996 10 7 5.921564822933222 "Josephine-1996"
    1001 1997 5 31 .818220838559277 "AL01-1997"
    1001 1997 6 30 1.0191735892200384 "Ana-1997"
    1001 1997 7 21 11.39476656068109 "Danny-1997"
    1001 1998 8 26 1.5203057690579347 "Bonnie-1998"
    1001 1998 8 21 .9068415868488243 "Charley-1998"
    1001 1998 9 3 10.87367696022201 "Earl-1998"
    1001 1998 9 12 .8595113137319648 "Frances-1998"
    1001 1998 9 29 10.83781507827773 "Georges-1998"
    1001 1998 9 20 3.7449501515327754 "Hermine-1998"
    1001 1998 11 5 1.2437098627715475 "Mitch-1998"
    1001 1999 8 22 .41870880660694015 "Bret-1999"
    1001 1999 9 5 1.621244175413792 "Dennis-1999"
    1001 1999 9 6 .275698741322845 "AL07-1999"
    1001 1999 9 15 2.020706257467387 "Floyd-1999"
    1001 1999 9 20 2.4815852034918944 "Harvey-1999"
    1001 1999 10 17 2.553763303785787 "Irene-1999"
    1001 2000 8 10 .9039929245918998 "AL04-2000"
    1001 2000 8 13 .38969383907618577 "Beryl-2000"
    1001 2000 9 9 1.3997555362475977 "AL09-2000"
    1001 2000 9 18 4.50579709951368 "Gordon-2000"
    1001 2000 9 22 7.096998522056075 "Helene-2000"
    1001 2000 10 4 1.9356057365252906 "Leslie-2000"
    1001 2001 6 12 7.646661611016898 "Allison-2001"
    1001 2001 8 6 13.832581712491114 "Barry-2001"
    1001 2001 9 14 2.3746839154000936 "Gabrielle-2001"
    1001 2001 10 12 .8590523662842149 "Karen-2001"
    1001 2001 11 5 .460478153814916 "Michelle-2001"
    1001 2002 7 14 1.1018699470386013 "Arthur-2002"
    1001 2002 8 5 4.9979787835327 "Bertha-2002"
    1001 2002 8 5 1.074341387660061 "Cristobal-2002"
    1001 2002 9 6 2.5061863143721457 "Edouard-2002"
    1001 2002 9 5 1.0930976520406661 "Fay-2002"
    1001 2002 9 9 .824225121356799 "Gustav-2002"
    1001 2002 9 15 8.88202069017716 "Hanna-2002"
    1001 2002 9 26 8.022234573431478 "Isidore-2002"
    1001 2002 10 11 2.1884515122872754 "Kyle-2002"
    1001 2002 10 4 4.032301526740939 "Lili-2002"
    1001 2003 7 1 8.471467089264408 "Bill-2003"
    1001 2003 7 14 1.4474974304683268 "Claudette-2003"
    1001 2003 7 26 2.7631415622088173 "AL07-2003"
    1001 2003 8 14 2.510974277166262 "Erika-2003"
    1001 2003 9 2 1.3702786781176195 "Grace-2003"
    1001 2003 9 5 3.9498177819390135 "Henri-2003"
    1001 2003 9 18 1.3186509909849549 "Isabel-2003"
    1001 2004 8 1 2.1946031068109266 "Alex-2004"
    1001 2004 8 12 5.016294555544713 "Bonnie-2004"
    1001 2004 8 14 2.76303392153772 "Charley-2004"
    1001 2004 9 7 8.945014823783959 "Frances-2004"
    1001 2004 8 29 2.9234842317908516 "Gaston-2004"
    1001 2004 8 30 .5307764048338431 "Hermine-2004"
    1001 2004 9 16 20.98227052089195 "Ivan-2004"
    1001 2004 9 27 6.127103868075571 "Jeanne-2004"
    1001 2004 10 10 3.8890480216200305 "Matthew-2004"
    1001 2005 6 12 11.00373855775045 "Arlene-2005"
    1001 2005 7 6 10.757665507334547 "Cindy-2005"
    1001 2005 7 10 18.030644943663688 "Dennis-2005"
    1001 2005 7 18 .49804747144402295 "Emily-2005"
    1001 2005 8 29 12.271341171178507 "Katrina-2005"
    end
    [/CODE]


  • #2
    The problem is that -(firstnm) storm_id just gives you the name of the storm that happens to be first in the current sort order--it is in no way the storm associated with the maximum vmax_sust. What you want is best not done with -collapse-. You want:
    Code:
    assert !missing(vmax_sust)
    by fips year (vmax_sust), sort: keep if _n == _N

    Comment


    • #3
      Clyde,
      Can you please explain the functionality of your code? Is there a way to adapt it to means as well?

      Comment


      • #4
        So first the data are sorted into groups defined by combination of fips and year, and then within those groups in ascending order of vmax_sust. So, within each fips-year group, the event with the highest vmax_sust is sorted last. And -keep if _n == _N- retains that one from each group. I forgot to note that if two or more of the events are tied for the highest value of vmax_sust, then this code will select one of them at random. If that is unsatisfactory, you need to specify a rule for breaking ties and the code can be modified accordingly.

        I don't see how this code can be modified to deal with means, nor do I even understand how your question can be generalized to means. After all, except by rare coincidence, there will not be any one event whose value is the group mean, so there is no way to select a single observation to represent the fips-year group in that way.

        Comment


        • #5
          Clyde,

          Thanks for the explanation. What I mean, is that is there a way to replicate

          collapse (mean) vmax_sust, by (year fips)

          using your code as well? Also, if you have any pointers as to how can I replicate the same using the collapse command, that would be great.

          Comment


          • #6
            I am a little confused by the changing questions here. #1 states that you're interested in maxima. Now in #5 you have changed the query to wanting means.

            You can easily get both from a single collapse, but not I think also the identifier of the storm with the highest maximum, as Clyde Schechter explained.

            But here is some technique showing how you can have it both ways.

            Code:
            clear
            input long fips int year byte(Month Day) double vmax_sust str15 storm_id
            1001 1988 8 5 .8327174690114912 "Alberto-1988"
            1001 1988 8 8 5.891180149342436 "Beryl-1988"
            1001 1988 8 28 2.835593650219381 "Chris-1988"
            1001 1988 9 10 6.558882994045002 "Florence-1988"
            1001 1988 9 19 2.158690444071083 "Gilbert-1988"
            1001 1988 11 22 2.1133002252330346 "Keith-1988"
            1001 1988 6 2 .25255478965267064 "AL13-1988"
            1001 1988 8 14 3.4597951584555204 "AL14-1988"
            1001 1988 9 4 .5790864756988259 "AL17-1988"
            1001 1989 7 1 1.7156892228310678 "Allison-1989"
            1001 1989 7 31 1.9089653343691888 "Chantal-1989"
            1001 1989 9 22 3.7151069123287637 "Hugo-1989"
            1001 1989 10 16 1.52327169695863 "Jerry-1989"
            1001 1990 5 25 .5206064533004958 "AL01-1990"
            1001 1990 7 24 .6744821215001495 "Bertha-1990"
            1001 1990 10 12 2.48408427220947 "Marco-1990"
            1001 1991 7 1 1.657726920449161 "Ana-1991"
            1001 1991 8 18 .9385441920011537 "Bob-1991"
            1001 1991 10 16 1.521363932730924 "Fabian-1991"
            1001 1991 10 31 .5480267483520787 "AL12-1991"
            1001 1992 6 26 1.5816559413063978 "AL02-1992"
            1001 1992 8 27 7.0863152511809195 "Andrew-1992"
            1001 1992 9 25 .5875413205291702 "Danielle-1992"
            1001 1992 9 29 1.5385736614871546 "Earl-1992"
            1001 1993 6 1 .5106843977863078 "AL01-1993"
            1001 1993 6 19 .9359790982203917 "Arlene-1993"
            1001 1993 8 31 .5926958111109527 "Emily-1993"
            1001 1994 7 6 12.246908289883358 "Alberto-1994"
            1001 1994 7 20 1.3012757408550957 "AL02-1994"
            1001 1994 8 16 10.549146769843336 "Beryl-1994"
            1001 1994 11 21 2.517942196013082 "Gordon-1994"
            1001 1995 6 5 7.302951638092455 "Allison-1995"
            1001 1995 7 28 1.5942668507082345 "Dean-1995"
            1001 1995 8 3 14.505203248147732 "Erin-1995"
            1001 1995 8 9 .3650826137750754 "Gabrielle-1995"
            1001 1995 8 26 5.0152580679006284 "Jerry-1995"
            1001 1995 10 4 24.522517545202966 "Opal-1995"
            1001 1996 6 18 1.4251182143496155 "Arthur-1996"
            1001 1996 7 12 1.833287843025197 "Bertha-1996"
            1001 1996 8 31 .3749269132817115 "Edouard-1996"
            1001 1996 9 6 1.8069739505183142 "Fran-1996"
            1001 1996 10 7 5.921564822933222 "Josephine-1996"
            1001 1997 5 31 .818220838559277 "AL01-1997"
            1001 1997 6 30 1.0191735892200384 "Ana-1997"
            1001 1997 7 21 11.39476656068109 "Danny-1997"
            1001 1998 8 26 1.5203057690579347 "Bonnie-1998"
            1001 1998 8 21 .9068415868488243 "Charley-1998"
            1001 1998 9 3 10.87367696022201 "Earl-1998"
            1001 1998 9 12 .8595113137319648 "Frances-1998"
            1001 1998 9 29 10.83781507827773 "Georges-1998"
            1001 1998 9 20 3.7449501515327754 "Hermine-1998"
            1001 1998 11 5 1.2437098627715475 "Mitch-1998"
            1001 1999 8 22 .41870880660694015 "Bret-1999"
            1001 1999 9 5 1.621244175413792 "Dennis-1999"
            1001 1999 9 6 .275698741322845 "AL07-1999"
            1001 1999 9 15 2.020706257467387 "Floyd-1999"
            1001 1999 9 20 2.4815852034918944 "Harvey-1999"
            1001 1999 10 17 2.553763303785787 "Irene-1999"
            1001 2000 8 10 .9039929245918998 "AL04-2000"
            1001 2000 8 13 .38969383907618577 "Beryl-2000"
            1001 2000 9 9 1.3997555362475977 "AL09-2000"
            1001 2000 9 18 4.50579709951368 "Gordon-2000"
            1001 2000 9 22 7.096998522056075 "Helene-2000"
            1001 2000 10 4 1.9356057365252906 "Leslie-2000"
            1001 2001 6 12 7.646661611016898 "Allison-2001"
            1001 2001 8 6 13.832581712491114 "Barry-2001"
            1001 2001 9 14 2.3746839154000936 "Gabrielle-2001"
            1001 2001 10 12 .8590523662842149 "Karen-2001"
            1001 2001 11 5 .460478153814916 "Michelle-2001"
            1001 2002 7 14 1.1018699470386013 "Arthur-2002"
            1001 2002 8 5 4.9979787835327 "Bertha-2002"
            1001 2002 8 5 1.074341387660061 "Cristobal-2002"
            1001 2002 9 6 2.5061863143721457 "Edouard-2002"
            1001 2002 9 5 1.0930976520406661 "Fay-2002"
            1001 2002 9 9 .824225121356799 "Gustav-2002"
            1001 2002 9 15 8.88202069017716 "Hanna-2002"
            1001 2002 9 26 8.022234573431478 "Isidore-2002"
            1001 2002 10 11 2.1884515122872754 "Kyle-2002"
            1001 2002 10 4 4.032301526740939 "Lili-2002"
            1001 2003 7 1 8.471467089264408 "Bill-2003"
            1001 2003 7 14 1.4474974304683268 "Claudette-2003"
            1001 2003 7 26 2.7631415622088173 "AL07-2003"
            1001 2003 8 14 2.510974277166262 "Erika-2003"
            1001 2003 9 2 1.3702786781176195 "Grace-2003"
            1001 2003 9 5 3.9498177819390135 "Henri-2003"
            1001 2003 9 18 1.3186509909849549 "Isabel-2003"
            1001 2004 8 1 2.1946031068109266 "Alex-2004"
            1001 2004 8 12 5.016294555544713 "Bonnie-2004"
            1001 2004 8 14 2.76303392153772 "Charley-2004"
            1001 2004 9 7 8.945014823783959 "Frances-2004"
            1001 2004 8 29 2.9234842317908516 "Gaston-2004"
            1001 2004 8 30 .5307764048338431 "Hermine-2004"
            1001 2004 9 16 20.98227052089195 "Ivan-2004"
            1001 2004 9 27 6.127103868075571 "Jeanne-2004"
            1001 2004 10 10 3.8890480216200305 "Matthew-2004"
            1001 2005 6 12 11.00373855775045 "Arlene-2005"
            1001 2005 7 6 10.757665507334547 "Cindy-2005"
            1001 2005 7 10 18.030644943663688 "Dennis-2005"
            1001 2005 7 18 .49804747144402295 "Emily-2005"
            1001 2005 8 29 12.271341171178507 "Katrina-2005"
            end
            
            bysort fips year (vmax_sust) : gen vmax_max = vmax_sust[_N]
            by fips year : gen id_max = storm_id[_N]
             
            collapse (mean) vmax_sust, by(fips year vmax_max id_max)
            
            list 
            
                 +-----------------------------------------------------+
                 | fips   year   vmax_max           id_max   vmax_sust |
                 |-----------------------------------------------------|
              1. | 1001   1988   6.558883    Florence-1988   2.7424224 |
              2. | 1001   1989   3.715107        Hugo-1989   2.2157583 |
              3. | 1001   1990   2.484084       Marco-1990   1.2263909 |
              4. | 1001   1991   1.657727         Ana-1991   1.1664154 |
              5. | 1001   1992   7.086315      Andrew-1992   2.6985215 |
                 |-----------------------------------------------------|
              6. | 1001   1993   .9359791      Arlene-1993   .67978644 |
              7. | 1001   1994   12.24691     Alberto-1994   6.6538182 |
              8. | 1001   1995   24.52252        Opal-1995   8.8842133 |
              9. | 1001   1996   5.921565   Josephine-1996   2.2723743 |
             10. | 1001   1997   11.39477       Danny-1997   4.4107203 |
                 |-----------------------------------------------------|
             11. | 1001   1998   10.87368        Earl-1998   4.2838301 |
             12. | 1001   1999   2.553763       Irene-1999   1.5619511 |
             13. | 1001   2000   7.096999      Helene-2000   2.7053073 |
             14. | 1001   2001   13.83258       Barry-2001   5.0346916 |
             15. | 1001   2002   8.882021       Hanna-2002   3.4722708 |
                 |-----------------------------------------------------|
             16. | 1001   2003   8.471467        Bill-2003   3.1188325 |
             17. | 1001   2004   20.98227        Ivan-2004   5.9301811 |
             18. | 1001   2005   18.03065      Dennis-2005   10.512288 |
                 +-----------------------------------------------------+
            So,

            1. After sorting you can identify the highest maximum and its storm. Watch out for ties, as Clyde flagged. and indeed for missing values.

            2. You have two new variables that are constant by construction within county and year. To ensure that they are carried along in the collapse, the trick is to insert them in the call to by().

            Comment

            Working...
            X