Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Summing observations that fall below a score

    I am hitting a wall trying to create a variable that is a conditional sum where it looks at the score for a certain observation and sums all scores less than it in a particular day.

    Here is some code to recreate the issue:

    Code:
    set obs 10000
    set seed 1979
    
    * Creating sample dataset
    gen score = runiform()*100
    gen day = runiformint(1,31)
    bysort day (score): gen daily_rank = _n
    Therefore, for each sorted observation, there would be a variable that is a reducing amount until the next day. That variable would look at the score for that observation (say, 80, for example) and sum all observations in that day that have scores from 0 to 79. I have not been able to figure out how to do this.

    Thanks in advance!

  • #2
    As long as scores are distinct, then sorting and finding the cumulative sum of all previous scores will give you what you want.

    Code:
    clear
    set obs 10000
    set seed 1979
    gen score = runiform()*100
    gen day = runiformint(1,31)
    bysort day (score): gen daily_rank = _n
    isid score day
    bys day (score): gen wanted= sum(score) -score
    Res.:

    Code:
    . l in 1/10, sep(0)
    
         +--------------------------------------+
         |    score   day   daily_~k     wanted |
         |--------------------------------------|
      1. | .2408028     1          1          0 |
      2. | .3983518     1          2   .2408028 |
      3. | .6269974     1          3   .6391547 |
      4. | 1.272174     1          4   1.266152 |
      5. | 1.737346     1          5   2.538326 |
      6. | 1.966702     1          6   4.275672 |
      7. | 2.087387     1          7   6.242374 |
      8. | 2.654549     1          8   8.329761 |
      9. | 3.314878     1          9   10.98431 |
     10. | 3.757221     1         10   14.29919 |
         +--------------------------------------+

    Comment


    • #3
      Andrew, truly appreciate the response. You brought up a great point - the scores are not entirely distinct. Also, I forgot to provide the variable for the amount requested. In addition, the intent would be to rank the scores in descending order, and the variable in question, would be a declining amount every day. Therefore, I think that the code would look something like this:

      Code:
      clear all
      set obs 10000
      set seed 1979
      gen score = runiformint(1,100)
      gen day = runiformint(1,31)
      gen ask = runiformint(200, 800)
      gen score_ranking = -score
      bysort day (score_ranking): gen daily_rank = _n
      bysort day (score_ranking): gen X1 = sum(ask)
      bysort day (score_ranking): egen X2 = total(ask)
      gen wanted = X2-X1
      drop score_ranking X*
      I had to get rid of the "isid" command since the scores are not distinct in this example - the actual data has a much more detailed day variable that allows for distinctiveness.

      Nonetheless, many thanks for your input - really helped me get a solution for this issue.

      Comment

      Working...
      X