Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sum daily variables into monthly variables using loops

    Hello to all,

    I am processing weather data and I need to sum daily variables into monthly variables. Small problem: I have to do this for 524 individuals from 1981 to 2009 for which I have as many variables of as days. So I have to go through forvalues loops to avoid spending 2 months on it.

    By chance, all the variables have the same construction in terms of name, they are all of type: P`year'`month'`day'.

    To make these sums I had the idea to use this code :

    forvalue m=1981/2009 {
    forvalue g=1/9 {
    egen P`m'`g'=sum(P`m'0`g'*), by(idmc)
    }
    }
    forvalue m=1981/2009 {
    forvalue g=10/12 {
    egen P`m'`g'=sum(P`m'`g'*), by(idmc)
    }
    }

    the red asterix is not in the code, this is the reason of my post. As the number of days is not equal across months I can't use a loop like for years or months, otherwise stata will return me an error message. So I'm looking for something I should put in place of the red asterix to make stata understand that I want it to take all the possible values for the part concerning the days.

    (the fact that there are two loops, one for the first 9 months and another one for the last 3 months highlights another problem I had, which is that I can't make a loop with 01/12, it will be counted as 1/12 or the name of my variables is 01, 02, 03 for the first months and not 1, 2 3)

    if something is not clear don't hesitate to ask me questions, I can for example show an extract of the data !

    Thanks in advance !
    Last edited by Eliot Wendling; 14 Feb 2022, 05:24.

  • #2
    You can loop over 01 ... 09 10 ... 12 by spelling out those values or by

    Code:
    forvalue m=1981/2009 {
    forvalue g=1/12 {
        local G : di %02.0f `g' 
        egen P`m'`g'=sum(P`m'`G'*), by(idmc)
    }
    }
    See https://www.stata-journal.com/articl...article=pr0051

    For this kind of data the ideal data layout is almost always


    station year month day element1 element2 ....

    where year month day allow many other calculations e.g daily date,, monthly date or day of year

    and element1 element2 ... are climatic elements.

    Your data layout seems some way from that, and that means that almost everything henceforth will be awkward to arrange.

    Indeed, a data example would be helpful.

    Comment


    • #3
      Thank you very much for your answer. I will study this very carefully, even though I am not very familiar with "local G: di %02.0f `g'" notation. I will read the article you sent me carefully.

      To bring more precision, I enclose an extract of the data:

      idmc P19810101 P19810102 P19810103 P19810104 P19810105 P19810106 P19810107 P19810108 P19810109 P19810110
      10606 1.732680e+01 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 1.045710e+01 4.027510e+01
      10703 1.732680e+01 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 1.045710e+01 4.027510e+01
      10903 1.732680e+01 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 1.045710e+01 4.027510e+01
      11202 1.469760e+01 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 7.19581985474 2.584410e+01

      We have in this extract the rainfall data for the first 4 individuals (idmc) of my database, for the first 10 days of the first month of 1981. I have this for every day until 2009 with the same variable name format.

      Comment


      • #4
        idmc P19810101 P19810102 P19810103 P19810104 P19810105 P19810106 P19810107 P19810108 P19810109 P19810110
        10606 1.732680e+01 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 1.045710e+01 4.027510e+01
        10703 1.732680e+01 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 1.045710e+01 4.027510e+01
        10903 1.732680e+01 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 1.045710e+01 4.027510e+01
        11202 1.469760e+01 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 7.19581985474 2.584410e+01
        This table is more clear

        Comment


        • #5
          With your advice I was able to remove the problems I had with the 01 02 etc... Thank you very much.

          I come with a new version of the loop, more detailed (probably suboptimal compared to a version with egen) which allows to better understand the problem I have:

          Code:
          forvalue m=1981/2009 {
              forvalue g=1/12 {
                  local G : di %02.0f `g' 
                  by idmc : gen P`m'`g'=(P`m'`G'01+P`m'`G'02+P`m'`G'03+P`m'`G'04+P`m'`G'05+P`m'`G'06+P`m'`G'07+P`m'`G'08+P`m'`G'09+P`m'`G'10+P`m'`G'11+P`m'`G'12+P`m'`G'13+P`m'`G'14+P`m'`G'15+P`m'`G'16+P`m'`G'17+P`m'`G'18+P`m'`G'19+P`m'`G'20+P`m'`G'21+P`m'`G'22+P`m'`G'23+P`m'`G'24+P`m'`G'25+P`m'`G'26+P`m'`G'27+P`m'`G'28+P`m'`G'29+P`m'`G'30+P`m'`G'31)
          }
          }
          as detailed, the problem comes from the fact that the months do not have the same number of days, so the loop stops when the loop calculates February 29, 1981, which does not exist (error code : P19810229 not found). So I'm looking for a way to have a more adaptive loop, which won't calculate the monthly sum with days that don't exist

          Comment


          • #6
            That looks straightforward.

            Code:
            clear
            input idmc P19810101 P19810102 P19810103 P19810104 P19810105 P19810106 P19810107 P19810108 P19810109 P19810110
            10606 1.732680e+01 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 1.045710e+01 4.027510e+01
            10703 1.732680e+01 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 1.045710e+01 4.027510e+01
            10903 1.732680e+01 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 1.045710e+01 4.027510e+01
            11202 1.469760e+01 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 0.00000000000 7.19581985474 2.584410e+01
            end
            
            reshape long P, i(idmc) j(date) string
            
            gen ddate = daily(date, "YMD")
            gen mdate = mofd(ddate)
            gen year = year(ddate)
            format ddate %td
            format mdate %tm
            
            list, sepby(idmc)
            
            
                +--------------------------------------------------------+
                 |  idmc       date         P       ddate    mdate   year |
                 |--------------------------------------------------------|
              1. | 10606   19810101   17.3268   01jan1981   1981m1   1981 |
              2. | 10606   19810102         0   02jan1981   1981m1   1981 |
              3. | 10606   19810103         0   03jan1981   1981m1   1981 |
              4. | 10606   19810104         0   04jan1981   1981m1   1981 |
              5. | 10606   19810105         0   05jan1981   1981m1   1981 |
              6. | 10606   19810106         0   06jan1981   1981m1   1981 |
              7. | 10606   19810107         0   07jan1981   1981m1   1981 |
              8. | 10606   19810108         0   08jan1981   1981m1   1981 |
              9. | 10606   19810109   10.4571   09jan1981   1981m1   1981 |
             10. | 10606   19810110   40.2751   10jan1981   1981m1   1981 |
                 |--------------------------------------------------------|
             11. | 10703   19810101   17.3268   01jan1981   1981m1   1981 |
             12. | 10703   19810102         0   02jan1981   1981m1   1981 |
             13. | 10703   19810103         0   03jan1981   1981m1   1981 |
             14. | 10703   19810104         0   04jan1981   1981m1   1981 |
             15. | 10703   19810105         0   05jan1981   1981m1   1981 |
             16. | 10703   19810106         0   06jan1981   1981m1   1981 |
             17. | 10703   19810107         0   07jan1981   1981m1   1981 |
             18. | 10703   19810108         0   08jan1981   1981m1   1981 |
             19. | 10703   19810109   10.4571   09jan1981   1981m1   1981 |
             20. | 10703   19810110   40.2751   10jan1981   1981m1   1981 |
                 |--------------------------------------------------------|
             21. | 10903   19810101   17.3268   01jan1981   1981m1   1981 |
             22. | 10903   19810102         0   02jan1981   1981m1   1981 |
             23. | 10903   19810103         0   03jan1981   1981m1   1981 |
             24. | 10903   19810104         0   04jan1981   1981m1   1981 |
             25. | 10903   19810105         0   05jan1981   1981m1   1981 |
             26. | 10903   19810106         0   06jan1981   1981m1   1981 |
             27. | 10903   19810107         0   07jan1981   1981m1   1981 |
             28. | 10903   19810108         0   08jan1981   1981m1   1981 |
             29. | 10903   19810109   10.4571   09jan1981   1981m1   1981 |
             30. | 10903   19810110   40.2751   10jan1981   1981m1   1981 |
                 |--------------------------------------------------------|
             31. | 11202   19810101   14.6976   01jan1981   1981m1   1981 |
             32. | 11202   19810102         0   02jan1981   1981m1   1981 |
             33. | 11202   19810103         0   03jan1981   1981m1   1981 |
             34. | 11202   19810104         0   04jan1981   1981m1   1981 |
             35. | 11202   19810105         0   05jan1981   1981m1   1981 |
             36. | 11202   19810106         0   06jan1981   1981m1   1981 |
             37. | 11202   19810107         0   07jan1981   1981m1   1981 |
             38. | 11202   19810108         0   08jan1981   1981m1   1981 |
             39. | 11202   19810109   7.19582   09jan1981   1981m1   1981 |
             40. | 11202   19810110   25.8441   10jan1981   1981m1   1981 |
                 +--------------------------------------------------------+

            Now for example your monthly and yearly totals are

            Code:
            egen double mtotal = total(P), by(idmc mdate)
            
            egen double ytotal = total(P), by(idmc  year)
            DIfferent numbers of days in months do not bite at all.
            Last edited by Nick Cox; 14 Feb 2022, 09:14.

            Comment


            • #7
              Thank you very much for solving my problem and for your time !

              Comment

              Working...
              X