

  • How to sum a variable over certain values only

    Hi everyone,

    This is my first time posting on Statalist, so I hope my question is clear.
    • Here is a short example of my dataset:
    - isoname variable: country name, here "Afghanistan"
    - date variable: I don't know why it doesn't appear in the %td format, but the first line below is 26sep2001. date takes a value for every day between 04jan2000 and 01jan2021 for each country.
    - protest variable: the number of protests that happened on this date in this country, so it is either zero or a positive count (it can be larger than 1). Here, on 26 September 2001, 6 protests happened in Afghanistan. A value of 0 means that no protest occurred on this date in this country.

    Please note that my dataset is fairly large, with 1,510,596 rows.

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str24 isoname float(date protest)
    "Afghanistan" 15244 6
    "Afghanistan" 15245 0
    "Afghanistan" 15246 1
    "Afghanistan" 15247 0
    "Afghanistan" 15248 0
    "Afghanistan" 15249 0
    "Afghanistan" 15250 0
    "Afghanistan" 15251 0
    "Afghanistan" 15252 0
    "Afghanistan" 15253 0
    "Afghanistan" 15254 0
    "Afghanistan" 15255 0
    "Afghanistan" 15256 0
    "Afghanistan" 15257 1
    "Afghanistan" 15258 0
    "Afghanistan" 15259 1
    "Afghanistan" 15260 1
    "Afghanistan" 15261 0
    end
    format %td date
    • My question is the following:
    I want to create several variables ("intensity_lag1", "intensity_lag2", "intensity_lead1", etc.) defined as follows: "intensity_lag1" is the sum of protests that happened during the 7 days before the date, i.e. the number of protests during the week preceding the date we're considering; "intensity_lag2" is the number of protests that happened during the week before that one; and so on.

    Here is how I coded it until now:
    bysort isoname (date) : gen intensity_lag1 = protest[_n-1] + protest[_n-2] + protest[_n-3] + protest[_n-4] + protest[_n-5] + protest[_n-6] + protest[_n-7]
    bysort isoname (date) : gen intensity_lag2 = protest[_n-8] + protest[_n-9] + protest[_n-10] + protest[_n-11] + protest[_n-12] + protest[_n-13] + protest[_n-14]
    But I wanted to know if there is any way to automate this with a loop or another command? Indeed, at some point I am going to do the same process but with month as the reference time period (and not week), and I don't want to have to hand code "intensity_lag1" up to _n-28, _n-29, _n-30, _n-31 etc.

    Here is what I tried that didn't work (I also tried the sum() function, without success):
    // Attempt 1:
    bys isoname : gen id = _n
    bys isoname : egen intensity_lag1v2 = total(protest) if inrange(id, 7-id, id)
    // Attempt 2:
    foreach i of num 1/7 {
        bys isoname : egen intensity_lag1v3 = total(protest[_n-`i'])
    }
    // Attempt 3:
    foreach i in 1/7 {
        bys isoname : egen intensity_lag1v4 = total(protest[_n-`i'])
    }
    What would make my life much easier is a sum function that works like Sigma notation, where for observation j I could write: \sum_{i=j-7}^{j-1} protest_i.

    If you have any suggestions, I would love to hear them. I have been struggling with this for a few days now!

    Thanks a lot,
    Last edited by Marie Bl; 21 Feb 2022, 11:20.

  • #2
    Thank you for using -dataex- on your very first post!

    rangestat (sum) intensity_lag1 = protest, by(isoname) interval(date -7 -1)
    rangestat (sum) intensity_lag2 = protest, by(isoname) interval(date -14 -8)
    -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

    In the above code I have interpreted "the 7 days before the date" as not including the current date, but extending through 7 days that precede it. If you meant to include the current date, change the numbers in the -interval()- options accordingly, e.g. to -interval(date -6 0)-. I think it is apparent how you would do this for longer intervals like 31 days, etc.
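
    For the loop the original post asks about, the two -rangestat- calls above generalize directly. What follows is only a sketch under stated assumptions: the window length of 7 days, the choice of 4 windows, and the names w, lo, hi, mdate, and intensity_mlag1 are illustrative and do not come from this thread. It also assumes that -rangestat- has been installed from SSC and that the example data are in memory with date as a daily date. The last lines show one possible reading of "month as the reference period", namely the previous calendar month, using a monthly date built with mofd().

    ```stata
    * Assumption: rangestat is installed; if not, run: ssc install rangestat
    local w = 7                                 // window length in days (illustrative)

    forvalues k = 1/4 {                         // number of windows is illustrative
        * k-th week before the current date: days -k*w through -(k-1)*w - 1
        local lo = -`k' * `w'
        local hi = -(`k' - 1) * `w' - 1
        rangestat (sum) intensity_lag`k' = protest, by(isoname) interval(date `lo' `hi')

        * k-th week after the current date: days (k-1)*w + 1 through k*w
        local lo = (`k' - 1) * `w' + 1
        local hi = `k' * `w'
        rangestat (sum) intensity_lead`k' = protest, by(isoname) interval(date `lo' `hi')
    }

    * Previous calendar month instead of a fixed number of days
    gen mdate = mofd(date)                      // monthly date from the daily date
    format %tm mdate
    rangestat (sum) intensity_mlag1 = protest, by(isoname) interval(mdate -1 -1)
    ```

    With k = 1 and k = 2 the loop reproduces the intervals -7 to -1 and -14 to -8 used above. As far as I know, -rangestat- returns missing rather than 0 when a window contains no observations, as happens at the very start of each country's series, so it is worth checking those rows before using the results.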

    "I don't know why it doesn't appear in the %td format but the first line below is 26sept2001,"
    The output of -dataex- is not meant to be read by human eyes. It is actually a block of Stata code that can be copied and then pasted into the do-editor and run in Stata to re-create a complete and faithful replica of the example data as a Stata data set. The numbers like 15244 that you see are Stata's internal representation of the dates, and are what is needed for this purpose.
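
    To see the correspondence for yourself, a quick check along these lines should work. It uses only built-in Stata features: -display- with a %td format, and the td() pseudofunction.

    ```stata
    * Show the internal daily-date number 15244 as a human-readable date
    display %td 15244

    * Go the other way: the number Stata stores for 26 September 2001
    display td(26sep2001)
    ```

    The first command displays 26sep2001 and the second displays 15244, because Stata stores daily dates as the number of days since 01jan1960.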


    • #3
      Thank you very much, it works perfectly!

