Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting events in intervals (across variables and unique by observation)

    Dear Stata Users,

    I am working with a two-period panel of individual survey data (in long format). Each individual has been interviewed on different days in the survey sampling period. Based on the geographic location of the individuals I have merged daily weather data over a period of 30 years (in wide format), which looks something like this:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double id float(survey_date b1_2008 b2_2008 b3_2008 b4_2008 b5_2008 b6_2008 b7_2008 b8_2008 b9_2008 b10_2008)
     1 17905  7.552024  4.619773  3.448182 7.221909 7.731406 7.477907  6.721896  9.776484  8.882518  10.22792
     1 19016  7.552024  4.619773  3.448182 7.221909 7.731406 7.477907  6.721896  9.776484  8.882518  10.22792
     2 18127  7.663669  5.057653  4.334262 6.445424 6.486279 7.203562  4.021591 8.0055685  6.073585 4.5712934
     2 19239  7.663669  5.057653  4.334262 6.445424 6.486279 7.203562  4.021591 8.0055685  6.073585 4.5712934
     3 18127  7.874044   5.43817   4.33868 6.518519 6.899051 7.492062  3.785516  8.283708  6.532521  4.465155
     3 19220  7.874044   5.43817   4.33868 6.518519 6.899051 7.492062  3.785516  8.283708  6.532521  4.465155
     4 18330 10.081314  6.162988 2.8859046 7.558293 7.915502 8.341213  7.818742 10.259943  9.608234  9.788368
     4 19429 10.081314  6.162988 2.8859046 7.558293 7.915502 8.341213  7.818742 10.259943  9.608234  9.788368
     5 18474  9.556124  6.624081  2.664714  7.26975 7.671178 8.176036  7.846913 10.426918  9.642741  10.21776
     5 19591  9.556124  6.624081  2.664714  7.26975 7.671178 8.176036  7.846913 10.426918  9.642741  10.21776
     6 18449  8.467342 4.6309233  3.083597 7.518037 7.600839 7.774574  6.834048  9.931305  9.774855 10.392326
     6 19542  8.467342 4.6309233  3.083597 7.518037 7.600839 7.774574  6.834048  9.931305  9.774855 10.392326
     7 18438  8.467342 4.6309233  3.083597 7.518037 7.600839 7.774574  6.834048  9.931305  9.774855 10.392326
     7 19555  8.467342 4.6309233  3.083597 7.518037 7.600839 7.774574  6.834048  9.931305  9.774855 10.392326
     8 18553  9.031861  5.689291  4.402537 8.380325  8.29756 9.593754 9.4122715 10.620927 10.582848 10.907956
     8 19702  9.031861  5.689291  4.402537 8.380325  8.29756 9.593754 9.4122715 10.620927 10.582848 10.907956
     9 18563  9.929608  8.596379 3.2979546 6.980475  7.20517 6.755096  6.101957  8.037453  7.017509  5.959684
     9 19659  9.028942  6.848519  2.288981 5.800806 6.274816 6.036836  4.974687  7.403302  5.736906  5.278121
    10 18562  8.462659  5.483351  3.225748 7.804394 9.113503 8.557564  8.181511 11.001647 10.074936 11.177246
    10 19747  8.462659  5.483351  3.225748 7.804394 9.113503 8.557564  8.181511 11.001647 10.074936 11.177246
    end
    format %td survey_date
    Note: b1_2008 is the temperature on 1st January 2008 and so on. I only show 10 days here.

    I am now trying to compute different weather variables based on the daily weather observations in the year before the interview (e.g. counting the number of days the daily mean temperature was >10°C). If every individual had been interviewed on the same day (e.g. 01jan2009) this could be achieved with a simple foreach loop:

    Code:
    gen temp1 = 0
    qui foreach v of var b1_2008-b366_2008{
            replace temp1 = temp1 + (`v' > 10 & `v' !=.)
    }
    However, the difficulty is, that each individual has a different interview date, i.e. the varlist capturing the weather variables is slightly different for each individual. I would like to count the number of days fulfilling a certain condition (as shown above) based on the 365 days prior to each individual's survey date. Hence, I am wondering if there is any way of associating a unique varlist to each observation based on the interview date, or if there is any other work around.

    I have considered reshaping the weather data to long format, however, given the size of my dataset (30,000 individuals and at least 8 years of relevant daily weather data), it seems unfeasible.

    I would greatly appreciate your support.

    Thanks,
    Paul








  • #2
    Welcome to Statalist, Paul.

    I have considered reshaping the weather data to long format, however, given the size of my dataset (30,000 individuals and at least 8 years of relevant daily weather data), it seems unfeasible.
    You give up too easily.

    Before merging weather data to individual data, I would reshape the weather data long, identified by location and date, and then use rangestat (see the output of ssc describe rangestat) to compute, for each geography and date, the various "last 365 days" statistics. The results could then be merged by date and geography to the individual data, so that each individual would have just the statistics of interest added to the observation.

    Comment


    • #3
      Something like

      Code:
      gen temp1 = 0
      gen count1 = 0  
      local T = mdy(12, 31, 2007)  
      qui forval d = 1/366 {    
          replace temp1 = (b`d'_2008 > 10 & b`d'_2008 !=.) * inrange(`T' + `d', survey_date - 365, survey_date - 1)      
          replace count1 = count1 + inrange(`T' + `d', survey_date - 365, survey_date - 1)
      }
      For several years of data, you need an outer loop over years (I am guessing even more here). capture is to catch non-leap years with no day 366.

      Code:
      gen temp1 = 0
      gen count1 = 0  
      qui forval y = 2001/2008 {     
          local T = mdy(1, 1, `y') - 1      
          forval d = 1/366 {        
              capture replace temp1 = (b`d'_`y > 10 & b`d'_`y' !=.) * inrange(`T' + `d', survey_date - 365, survey_date - 1)        
              capture replace count1 = count1 + inrange(`T' + `d', survey_date - 365, survey_date - 1)    
          }
      }
      .

      Comment


      • #4
        Thank you both very much. Both approaches lead to desired result.

        A slight correction to the code is needed in the following line to make it work:

        Code:
        capture replace temp1 = temp1 + (b`d'_`y' > 10 & b`d'_`y' !=.) * inrange(`T' + `d', survey_date - 365, survey_date -1)

        Thanks,
        Paul

        Comment


        • #5
          You're right. Sorry about that.

          Comment

          Working...
          X