Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Average by state and week with different start dates

    Hi. The data below has 4 columns - State, date, number of confirmed cases, and number of recovered cases. I want to calculate the average number of confirmed and recovered cases by week for each state. I have two data issues to deal with:
    1. Not all states have the same start date for the confirmed and recovered cases. For example, state X could start on 01/30/2020 and state Y could start on 04/03/2020.
    2. There are breaks in the dates. For example, the first entry for state X could be on 01/30/2020 and the next on 02/14/2020.

    Calculating this by determining the weekday corresponding to the date doesn't work because of the above issues. Is there anyway to do this? Any guidance would be much appreciated.


    Code:
     dataex state numdate confirmed recovered
    
    "Andhra Pradesh" 22013  348    6
    "Andhra Pradesh" 22014  363   10
    "Andhra Pradesh" 22015  381   10
    "Andhra Pradesh" 22016  405   10
    "Andhra Pradesh" 22017  420   12
    "Andhra Pradesh" 22018  439   12
    "Andhra Pradesh" 22019  484   16
    "Andhra Pradesh" 22020  525   20
    "Andhra Pradesh" 22021  534   20
    "Andhra Pradesh" 22022  572   35
    "Andhra Pradesh" 22023  603   42
    "Andhra Pradesh" 22024  647   65
    "Andhra Pradesh" 22025  722   92
    "Andhra Pradesh" 22026  757   96
    "Andhra Pradesh" 22027  813  120
    "Andhra Pradesh" 22028  893  141
    "Arunachal Pradesh" 22057   1   1
    "Arunachal Pradesh" 22058   1   1
    "Arunachal Pradesh" 22059   2   1
    "Arunachal Pradesh" 22060   2   1
    "Arunachal Pradesh" 22061   2   1
    "Arunachal Pradesh" 22062   2   1
    "Arunachal Pradesh" 22063   3   1
    "Arunachal Pradesh" 22064   3   1
    "Arunachal Pradesh" 22065   4   1
    "Arunachal Pradesh" 22066   4   1
    "Arunachal Pradesh" 22067  23   1
    "Arunachal Pradesh" 22068  25   1
    "Arunachal Pradesh" 22069  34   1
    "Arunachal Pradesh" 22070  39   1
    "Arunachal Pradesh" 22071  43   1
    "Arunachal Pradesh" 22072  45   1
    "Arunachal Pradesh" 22073  48   1
    "Arunachal Pradesh" 22074  54   1
    "Arunachal Pradesh" 22075  54   2
    "Arunachal Pradesh" 22076  58   4

  • #2
    Well, you have not described exactly you what you want; you have left open several possible approaches. Here is what I would probably do. I would state that for each state, week 1 begins on that state's first date in the data set and extends for 7 calendar days (whether they appear in the data or not). The eight day (again, whether it appears in the data or not) begins week 2, which goes for 7 days, etc.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str32 state int(numdate confirmed recovered)
    "Andhra Pradesh"    22013 348   6
    "Andhra Pradesh"    22014 363  10
    "Andhra Pradesh"    22015 381  10
    "Andhra Pradesh"    22016 405  10
    "Andhra Pradesh"    22017 420  12
    "Andhra Pradesh"    22018 439  12
    "Andhra Pradesh"    22019 484  16
    "Andhra Pradesh"    22020 525  20
    "Andhra Pradesh"    22021 534  20
    "Andhra Pradesh"    22022 572  35
    "Andhra Pradesh"    22023 603  42
    "Andhra Pradesh"    22024 647  65
    "Andhra Pradesh"    22025 722  92
    "Andhra Pradesh"    22026 757  96
    "Andhra Pradesh"    22027 813 120
    "Andhra Pradesh"    22028 893 141
    "Arunachal Pradesh" 22057   1   1
    "Arunachal Pradesh" 22058   1   1
    "Arunachal Pradesh" 22059   2   1
    "Arunachal Pradesh" 22060   2   1
    "Arunachal Pradesh" 22061   2   1
    "Arunachal Pradesh" 22062   2   1
    "Arunachal Pradesh" 22063   3   1
    "Arunachal Pradesh" 22064   3   1
    "Arunachal Pradesh" 22065   4   1
    "Arunachal Pradesh" 22066   4   1
    "Arunachal Pradesh" 22067  23   1
    "Arunachal Pradesh" 22068  25   1
    "Arunachal Pradesh" 22069  34   1
    "Arunachal Pradesh" 22070  39   1
    "Arunachal Pradesh" 22071  43   1
    "Arunachal Pradesh" 22072  45   1
    "Arunachal Pradesh" 22073  48   1
    "Arunachal Pradesh" 22074  54   1
    "Arunachal Pradesh" 22075  54   2
    "Arunachal Pradesh" 22076  58   4
    end
    format %td numdate
    
    by state (numdate), sort: gen week_num = 1 + floor((numdate-numdate[1])/7)

    The resulting "weeks" are internally consistent within states, but not across states. If that's not suitable for your purposes, post back with a clearer explanation of what you need.

    Comment


    • #3
      Hi Clyde. Thank you for your help, and I apologize for not being clear. What I want is to calculate the average number of confirmed and recovered cases by week for each state. The first week would start on 01/01/2020. Jan 1, 2020 was a Wednesday, and so week 1 would end on 01/05/2020, the following Sunday. The week description should also follow your approach above where once a week begins (except for week 1), it extends for 7 calendar days whether they appear in the data or not.

      Comment


      • #4
        For that, you can do:
        Code:
        gen int week = ceil((numdate-td(5jan2020))/7) + 1

        Comment


        • #5
          Thanks, Clyde. That worked perfectly.

          Comment


          • #6
            Clyde Schechter Hi Clyde. I have another question in the same vein. If I wanted to generate bi-weekly averages of the number of confirmed and recovered cases, is it possible for me to have a variable that counts every two weeks. This will also have to meet the above conditions that the first week would start on 01/01/2020. Here week 1 would end on 01/12/2020 and week 2 would begin on 01/13/2020 and end 01/26/2020. And also that once a week begins it extends for 14 calendar days whether they appear in the data or not.

            Comment


            • #7
              It's similar to the code in #4.
              Code:
              gen int period = ceil((numdate - td(12jan2020)/14) + 1
              Last edited by Clyde Schechter; 05 Apr 2022, 21:44.

              Comment


              • #8
                Thank you, Clyde.

                Comment

                Working...
                X