How to calculate the cumulative mean with different weights by groups?

Fred Lee

Join Date: Nov 2017
Posts: 473

13 Dec 2021, 06:36

How can I calculate a cumulative mean by group with different weights?
The weight is the variable order.

I know that to calculate the cumulative mean by group we can use

Code:

rangestat (mean) cumulative=score, interval(order -18 -1) by(ID)

However, if I want to use order as the weight for the cumulative mean, how do I calculate it?

For example, for observation 2 the expected value is 68.8*1/1, and for observation 3 it is 68.8*1/(1+2) + 73.7*2/(1+2).

Many thanks!

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double score byte order float ID
             68.8  1 1
             73.7  2 1
76.46000000000001  3 1
            71.74  4 1
             57.8  5 1
             70.2  6 1
             62.4  7 1
77.97999999999999  8 1
69.46000000000001  9 1
             79.1 10 1
67.96000000000001 12 1
69.03999999999999 13 1
            68.16 15 1
76.03999999999999 16 1
             63.9  1 2
             68.8  2 2
               60  3 2
             64.8  4 2
             78.1  5 2
             75.9  6 2
             71.2  7 2
             58.2  8 2
             59.5  9 2
             64.2 10 2
             60.6 11 2
             67.9 12 2
             74.5 13 2
             70.3 14 2
             66.8 15 2
             64.4 16 2
             77.3 17 2
end

Last edited by Fred Lee; 13 Dec 2021, 06:53.

Nick Cox

Join Date: Mar 2014
Posts: 35221

13 Dec 2021, 06:56

rangestat is from SSC, as you are asked to explain.

Does this help?

Code:

clear
input double score byte order float ID
             68.8  1 1
             73.7  2 1
76.46000000000001  3 1
            71.74  4 1
             57.8  5 1
             70.2  6 1
             62.4  7 1
77.97999999999999  8 1
69.46000000000001  9 1
             79.1 10 1
67.96000000000001 12 1
69.03999999999999 13 1
            68.16 15 1
76.03999999999999 16 1
             63.9  1 2
             68.8  2 2
               60  3 2
             64.8  4 2
             78.1  5 2
             75.9  6 2
             71.2  7 2
             58.2  8 2
             59.5  9 2
             64.2 10 2
             60.6 11 2
             67.9 12 2
             74.5 13 2
             70.3 14 2
             66.8 15 2
             64.4 16 2
             77.3 17 2
end

bysort ID (order) : gen double numer = sum(order * score)
by ID: gen denom = sum(order)

gen double wanted = numer / denom 

list, sepby(ID)

     +--------------------------------------------------+
     | score   order   ID     numer   denom      wanted |
     |--------------------------------------------------|
  1. |  68.8       1    1      68.8       1        68.8 |
  2. |  73.7       2    1     216.2       3   72.066667 |
  3. | 76.46       3    1    445.58       6   74.263333 |
  4. | 71.74       4    1    732.54      10      73.254 |
  5. |  57.8       5    1   1021.54      15   68.102667 |
  6. |  70.2       6    1   1442.74      21   68.701905 |
  7. |  62.4       7    1   1879.54      28   67.126429 |
  8. | 77.98       8    1   2503.38      36   69.538333 |
  9. | 69.46       9    1   3128.52      45   69.522667 |
 10. |  79.1      10    1   3919.52      55      71.264 |
 11. | 67.96      12    1   4735.04      67   70.672239 |
 12. | 69.04      13    1   5632.56      80      70.407 |
 13. | 68.16      15    1   6654.96      95   70.052211 |
 14. | 76.04      16    1    7871.6     111   70.915315 |
     |--------------------------------------------------|
 15. |  63.9       1    2      63.9       1        63.9 |
 16. |  68.8       2    2     201.5       3   67.166667 |
 17. |    60       3    2     381.5       6   63.583333 |
 18. |  64.8       4    2     640.7      10       64.07 |
 19. |  78.1       5    2    1031.2      15   68.746667 |
 20. |  75.9       6    2    1486.6      21   70.790476 |
 21. |  71.2       7    2      1985      28   70.892857 |
 22. |  58.2       8    2    2450.6      36   68.072222 |
 23. |  59.5       9    2    2986.1      45   66.357778 |
 24. |  64.2      10    2    3628.1      55   65.965455 |
 25. |  60.6      11    2    4294.7      66   65.071212 |
 26. |  67.9      12    2    5109.5      78    65.50641 |
 27. |  74.5      13    2      6078      91   66.791209 |
 28. |  70.3      14    2    7062.2     105   67.259048 |
 29. |  66.8      15    2    8064.2     120   67.201667 |
 30. |  64.4      16    2    9094.6     136   66.872059 |
 31. |  77.3      17    2   10408.7     153   68.030719 |
     +--------------------------------------------------+

With any loosely similar problem, I would probably prefer exponential smoothing.

Fred Lee

Join Date: Nov 2017
Posts: 473

13 Dec 2021, 07:02

Thank you! Since I want the cumulative mean of the previous observations only, this works:

Code:

bysort ID (order) : gen double numer = sum(order * score) - order * score
by ID: gen denom = sum(order) - order
gen double wanted = numer / denom

Is there an easier way, in particular an option for setting weights when calculating the mean?

Nick Cox

Join Date: Mar 2014

Posts: 35221
#4

13 Dec 2021, 07:38

This is perhaps easier than what you had

Code:

bysort ID (order) : gen double numer = sum(order * score)
by ID: gen denom = sum(order)
by ID: gen double wanted = numer[_n-1] / denom[_n-1]

"especially setting an option of weight when calculate mean"

An option of which command?

You want several means at once. summarize for example can't help you, except within a loop.
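
As a rough sketch of the loop approach mentioned above (not code posted in this thread; the variable name wanted2 is made up for illustration), summarize with analytic weights could be run once per observation, assuming the example data are in memory:

Code:

* Sketch only: weighted mean of all earlier observations in the same ID,
* computed one observation at a time. wanted2 is a hypothetical name.
gen double wanted2 = .
forvalues i = 1/`=_N' {
    quietly summarize score [aweight = order] if ID == ID[`i'] & order < order[`i'], meanonly
    quietly replace wanted2 = r(mean) in `i'
}

This makes one pass over the data per observation, so the running-sum code is likely to be much faster on anything but small datasets.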

Note that you could rewrite the code above as one line. After 4 years using Stata and > 300 posts here, that could be an exercise.....
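
One possible answer to that exercise, offered as a sketch rather than as the intended solution: the running sums can be placed directly in the ratio. The names wanted_incl and wanted_prev are made up for illustration.

Code:

* Inclusive cumulative weighted mean: current observation included
bysort ID (order) : gen double wanted_incl = sum(order * score) / sum(order)

* Previous observations only: the first observation in each ID is 0/0, hence missing
by ID: gen double wanted_prev = (sum(order * score) - order * score) / (sum(order) - order)

Because the second line subtracts the current term instead of lagging the running sums, its results may differ from numer[_n-1] / denom[_n-1] in the last few decimal places.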
Fred Lee

Join Date: Nov 2017

Posts: 473
#5

13 Dec 2021, 07:45

Originally posted by Nick Cox

This is perhaps easier than what you had

Code:

bysort ID (order) : gen double numer = sum(order * score)
by ID: gen denom = sum(order)
by ID: gen double wanted = numer[_n-1] / denom[_n-1]

an option of which command?

You want several means at once. summarize for example can't help you, except within a loop.

Note that you could rewrite the code above as one line. After 4 years using Stata and > 300 posts here, that could be an exercise.....

Thanks, Nick! I will try!
Nick Cox

Join Date: Mar 2014

Posts: 35221
#6

13 Dec 2021, 07:56

OK

Detail: There is no point in quoting the entirety of a previous post. You can just refer to #5, or whatever. The point of quotation is to be selective.