Dear Stata Users,
I am working with a two-period panel of individual survey data (in long format). Each individual has been interviewed on different days in the survey sampling period. Based on the geographic location of the individuals I have merged daily weather data over a period of 30 years (in wide format), which looks something like this:
Note: b1_2008 is the temperature on 1st January 2008 and so on. I only show 10 days here.
I am now trying to compute different weather variables based on the daily weather observations in the year before the interview (e.g. counting the number of days the daily mean temperature was >10°C). If every individual had been interviewed on the same day (e.g. 01jan2009) this could be achieved with a simple foreach loop:
However, the difficulty is, that each individual has a different interview date, i.e. the varlist capturing the weather variables is slightly different for each individual. I would like to count the number of days fulfilling a certain condition (as shown above) based on the 365 days prior to each individual's survey date. Hence, I am wondering if there is any way of associating a unique varlist to each observation based on the interview date, or if there is any other work around.
I have considered reshaping the weather data to long format, however, given the size of my dataset (30,000 individuals and at least 8 years of relevant daily weather data), it seems unfeasible.
I would greatly appreciate your support.
Thanks,
Paul
I am working with a two-period panel of individual survey data (in long format). Each individual has been interviewed on different days in the survey sampling period. Based on the geographic location of the individuals I have merged daily weather data over a period of 30 years (in wide format), which looks something like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double id float(survey_date b1_2008 b2_2008 b3_2008 b4_2008 b5_2008 b6_2008 b7_2008 b8_2008 b9_2008 b10_2008) 1 17905 7.552024 4.619773 3.448182 7.221909 7.731406 7.477907 6.721896 9.776484 8.882518 10.22792 1 19016 7.552024 4.619773 3.448182 7.221909 7.731406 7.477907 6.721896 9.776484 8.882518 10.22792 2 18127 7.663669 5.057653 4.334262 6.445424 6.486279 7.203562 4.021591 8.0055685 6.073585 4.5712934 2 19239 7.663669 5.057653 4.334262 6.445424 6.486279 7.203562 4.021591 8.0055685 6.073585 4.5712934 3 18127 7.874044 5.43817 4.33868 6.518519 6.899051 7.492062 3.785516 8.283708 6.532521 4.465155 3 19220 7.874044 5.43817 4.33868 6.518519 6.899051 7.492062 3.785516 8.283708 6.532521 4.465155 4 18330 10.081314 6.162988 2.8859046 7.558293 7.915502 8.341213 7.818742 10.259943 9.608234 9.788368 4 19429 10.081314 6.162988 2.8859046 7.558293 7.915502 8.341213 7.818742 10.259943 9.608234 9.788368 5 18474 9.556124 6.624081 2.664714 7.26975 7.671178 8.176036 7.846913 10.426918 9.642741 10.21776 5 19591 9.556124 6.624081 2.664714 7.26975 7.671178 8.176036 7.846913 10.426918 9.642741 10.21776 6 18449 8.467342 4.6309233 3.083597 7.518037 7.600839 7.774574 6.834048 9.931305 9.774855 10.392326 6 19542 8.467342 4.6309233 3.083597 7.518037 7.600839 7.774574 6.834048 9.931305 9.774855 10.392326 7 18438 8.467342 4.6309233 3.083597 7.518037 7.600839 7.774574 6.834048 9.931305 9.774855 10.392326 7 19555 8.467342 4.6309233 3.083597 7.518037 7.600839 7.774574 6.834048 9.931305 9.774855 10.392326 8 18553 9.031861 5.689291 4.402537 8.380325 8.29756 9.593754 9.4122715 10.620927 10.582848 10.907956 8 19702 9.031861 5.689291 4.402537 8.380325 8.29756 9.593754 9.4122715 10.620927 10.582848 10.907956 9 18563 9.929608 8.596379 3.2979546 6.980475 7.20517 6.755096 6.101957 8.037453 7.017509 5.959684 9 19659 9.028942 6.848519 2.288981 5.800806 6.274816 6.036836 4.974687 7.403302 5.736906 5.278121 10 18562 8.462659 5.483351 3.225748 7.804394 9.113503 8.557564 8.181511 11.001647 10.074936 11.177246 10 19747 8.462659 5.483351 3.225748 7.804394 9.113503 8.557564 8.181511 11.001647 10.074936 11.177246 end format %td survey_date
I am now trying to compute different weather variables based on the daily weather observations in the year before the interview (e.g. counting the number of days the daily mean temperature was >10°C). If every individual had been interviewed on the same day (e.g. 01jan2009) this could be achieved with a simple foreach loop:
Code:
gen temp1 = 0 qui foreach v of var b1_2008-b366_2008{ replace temp1 = temp1 + (`v' > 10 & `v' !=.) }
I have considered reshaping the weather data to long format, however, given the size of my dataset (30,000 individuals and at least 8 years of relevant daily weather data), it seems unfeasible.
I would greatly appreciate your support.
Thanks,
Paul
Comment