  • How to sum a variable over certain values only

    Hi everyone,

    This is my first time posting on Statalist, so I hope my question is clear.
    • Here is a short example of my dataset:
    - isoname variable: country name, here "Afghanistan"
    - date variable: I don't know why it doesn't appear in the %td format, but the first line below is 26sept2001. date takes a value for every day between 04jan2000 and 01jan2021 for each country.
    - protest variable: the number of protests that happened on this date in this country, so it is either zero or a positive count (it can be larger than 1). Here, on 26 September 2001, 6 protests happened in Afghanistan. A value of 0 means that no protest occurred on this date in this country.

    Please note that my dataset is pretty heavy, with 1,510,596 rows.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str24 isoname float(date protest)
    "Afghanistan" 15244 6
    "Afghanistan" 15245 0
    "Afghanistan" 15246 1
    "Afghanistan" 15247 0
    "Afghanistan" 15248 0
    "Afghanistan" 15249 0
    "Afghanistan" 15250 0
    "Afghanistan" 15251 0
    "Afghanistan" 15252 0
    "Afghanistan" 15253 0
    "Afghanistan" 15254 0
    "Afghanistan" 15255 0
    "Afghanistan" 15256 0
    "Afghanistan" 15257 1
    "Afghanistan" 15258 0
    "Afghanistan" 15259 1
    "Afghanistan" 15260 1
    "Afghanistan" 15261 0
    end
    format %td date
    • My question is the following:
    I want to create several variables, "intensity_lag1", "intensity_lag2", "intensity_lead1", etc., that are defined as follows. "intensity_lag1" is the sum of protests that happened during the 7 days before the date, i.e. the number of protests during the week preceding the date we're considering. "intensity_lag2" is the number of protests that happened during the week before that one, and so on.

    Here is how I coded it until now:
    Code:
    bys isoname (date) : gen intensity_lag1 = protest[_n-1] + protest[_n-2] + protest[_n-3] + protest[_n-4] + protest[_n-5] + protest[_n-6] + protest[_n-7]
    
    bys isoname (date) : gen intensity_lag2 = protest[_n-8] + protest[_n-9] + protest[_n-10] + protest[_n-11] + protest[_n-12] + protest[_n-13] + protest[_n-14]
    But I wanted to know if there is any way to automate this with a loop or another command? Indeed, at some point I am going to do the same process but with month as the reference time period (and not week), and I don't want to have to hand code "intensity_lag1" up to _n-28, _n-29, _n-30, _n-31 etc.

    Here is what I tried but didn't work: (I also tried with the sum() function and it didn't work)
    Code:
    // Attempt1:
    bys isoname : gen id = _n
    bys isoname : egen intensity_lag1v2 = total(protest) if inrange(id, 7-id, id)
    //Attempt2:
    foreach i of num 1/7 {
        bys isoname : egen intensity_lag1v3 = total(protest[_n-`i'])
    }
    //Attempt3:
    foreach i in 1/7 {
        bys isoname : egen intensity_lag1v4 = total(protest[_n-`i'])
    }
    What would make my life much easier would be a sum function that works just like a Sigma, where I could write something like \sum_{i=j-7}^{j-1} protest_i for date j.

    If you have any suggestions, I would love to hear them. I have been struggling with this for a few days now!

    Thanks a lot,
    Marie
    Last edited by Marie Bl; 21 Feb 2022, 11:20.

  • #2
    Thank you for using -dataex- on your very first post!

    Code:
    rangestat (sum) intensity_lag1 = protest, by(isoname) interval(date -7 -1)
    rangestat (sum) intensity_lag2 = protest, by(isoname) interval(date -14 -8)
    -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.
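
    If it is not installed yet, a one-time install from SSC should be all that is needed. This is only a sketch; the second line just opens the help file, which is a quick way to check that the install worked.

    Code:
    * one-time install from SSC (needs an internet connection)
    ssc install rangestat
    * open the help file to confirm the command is available
    help rangestat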

    In the above code I have interpreted "the 7 days before the date" as not including the current date, but extending through 7 days that precede it. If you meant to include the current date, change the numbers in the -interval()- options accordingly, e.g. to -interval(date -6 0)-. I think it is apparent how you would do this for longer intervals like 31 days, etc.
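
    Since the question asked about automating this, here is one possible sketch of a loop that builds several lags and leads in one go. The local macro names (w, nper, lo, hi) and the choice of 3 periods are illustrative assumptions, not something from your post. Setting w to 30 or 31 would give month-like windows instead of weeks.

    Code:
    local w    = 7     // window length in days (assumption: one week)
    local nper = 3     // number of lags and leads to build (assumption)

    forvalues k = 1/`nper' {
        * lag k covers the days from -k*w through -(k-1)*w - 1,
        * e.g. k = 1 gives -7 to -1 and k = 2 gives -14 to -8
        local lo = -`k' * `w'
        local hi = -(`k' - 1) * `w' - 1
        rangestat (sum) intensity_lag`k' = protest, by(isoname) interval(date `lo' `hi')

        * lead k covers the days from (k-1)*w + 1 through k*w,
        * e.g. k = 1 gives +1 to +7
        local lo = (`k' - 1) * `w' + 1
        local hi = `k' * `w'
        rangestat (sum) intensity_lead`k' = protest, by(isoname) interval(date `lo' `hi')
    }

    One difference from the hand-coded version worth checking: near the start or end of a country's series, where the window is only partly covered by data, the protest[_n-1] + ... approach returns missing, while -rangestat- should, as far as I know, sum over whatever observations do fall in the window. If you need complete windows only, you can also request (count) and set the sum to missing where the count is below the window length.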

    "I don't know why it doesn't appear in the %td format but the first line below is 26sept2001,"
    The output of -dataex- is not meant to be read by human eyes. It is actually a block of Stata code that can be copied and then pasted into the do-editor and run in Stata to re-create a complete and faithful replica of the example data as a Stata data set. The numbers like 15244 that you see are Stata's internal representation of the dates, and are what is needed for this purpose.
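
    If you want to check the correspondence yourself, a minimal sketch is to convert in both directions with -display-. Stata daily dates count days since 01jan1960, so 15244 and 26sep2001 are the same date in two representations.

    Code:
    * internal number -> human-readable date; shows 26sep2001
    display %td 15244
    * human-readable date -> internal number; shows 15244
    display td(26sep2001)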





    • #3
      Thank you very much it works perfectly!
