Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • generate new yearly variables by using for loops

    Dear all,

    I am trying to summarize variables on the form XXYYYYMM, where YYYYMM corresponds to year and month. I want to start summarizing at a start point (varies) and summarize 12 months from start point, generate a new variable containing the sum per year, and start counting again 12 months to generate a new variable with the sum of the following year. I want the counting to stop when there are not possible to generate a complete year (the variable “total_num_of_years” withholds the possible number of complete years).

    This is my code:
    forval p = 1/12 {
    gen target_`p' = 0
    local start_period = start_in_months + (`p'-1)*12
    local end_period = `start_period' + 11
    forval m = 552/708 {
    local M = strofreal(`m', "%tmCCYYNN")
    local varname = "XX`M'"
    replace target_`p' = target_`p' + `varname' if !missing(`varname') & `m' >= `start_period' & `m' <= `end_period' & `p' <= total_num_of_years
    }
    }

    Some of the targ* values are correct, but other values are not. I am wondering if you have any idea on what might be wrong here, or if you have a different solution to answer this?

  • #2
    For operations such as summing, the most efficient method is to use the -bysort- prefix. You should only loop over variables. There is no data example provided here, but the following example using the SP500 dataset may help.

    Code:
    sysuse sp500, clear
    replace date= date-90
    
    *START HERE
    gen year= year(date)
    foreach var in open close{
        bysort year: egen sum`var'= total(`var')
    }
    
    tabdisp year, cell(sum*)
    Res.:

    Code:
    . tabdisp year, cell(sum*)
    
    ----------------------------------
         year |    sumopen    sumclose
    ----------+-----------------------
         2000 |   79106.72     78946.4
         2001 |   217224.5      217210
    ----------------------------------

    Comment


    • #3
      Thanks a lot for your quick and helpful reply. In my dataset, the start year will not be the same for everyone (as the start point varies), and I therefore wanted to start counting years from every single start value. This means that the years will not correspond to the same "time year" (e.g., 2000 or 2001), but would rather be years from startdate 0, 1, 2, 3, 4 and so on. Maybe there is a similar way to do this anyway?

      Comment


      • #4
        You can create a new year variable. Consider this:

        Code:
        clear
        input float(id date)
        1 20148
        1 20365
        1 21000
        2 19500
        2 19999
        2 20777
        end
        format date %td
        
        bys id (date): gen year= date-date[1]
        replace year= year(year) - 1960
        Res.:

        Code:
        
        . l, sepby(id)
        
             +-----------------------+
             | id        date   year |
             |-----------------------|
          1. |  1   01mar2015      0 |
          2. |  1   04oct2015      0 |
          3. |  1   30jun2017      2 |
             |-----------------------|
          4. |  2   22may2013      0 |
          5. |  2   03oct2014      1 |
          6. |  2   19nov2016      3 |
             +-----------------------+
        Last edited by Andrew Musau; 29 Sep 2024, 12:53.

        Comment

        Working...
        X