Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Dynamic variable summation by date

    I have a dataset of individuals that has a variable for the date on which they were interviewed (interview_date) and their county of residence. I'll call this Dataset A.

    I have a second dataset of incidents that includes the date of the incident (inc_date) and the county in which the incident occurred. I'll call this Dataset B.

    For each individual in dataset A, I want to calculate the total number of incidents from dataset B that occurred in their county of residence over the five years prior to the interview.

    So if individual 1001 lives in Los Angeles County and was interviewed on March 1, 2020, I want to sum all incidents in dataset B in Los Angeles County between the dates of 3/1/2015 and 3/1/2020.

    If I were doing this in Python, I would loop through the individuals in A, grab the date, sum the variables in B, and enter the totals in a new variable in A. I think I could do this by having each dataset open in a different frame, looping through records in A, and then in the frame with B subset the records to the date range in the county and then collapse. But looping is wonky in Stata and I have a feeling that there should be an easier way to do this. Any suggestions would be greatly appreciated.

  • #2
    Yes, doing this in loops is both difficult to get right, and inefficient if you do. You want the -rangejoin- command, by Robert Picard, available from SSC. And to run that, you also must install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

    As you do not provide example data, I cannot write the code for you here. The help file for -rangejoin- is pretty clear, although people often have trouble understanding which variables in the command refer to which data set. That's important: in -rangejoin date1 lower_bound upper_bound using using_dataset-, date1 refers to date variables in the using data set, and the two *_bound variables refer to the "master" data set. Once you have used -rangejoin- to link the two data sets, you will then want to -collapse- the data set to a count of the number of linked records for each individual.

    If you need more specific assistance, post back with example data, shown using the -dataex- command. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Also, if you do post back, be sure to specify which version of Stata you are running if it is not the current version (18).

    Comment

    Working...
    X