Mean by group and set as new column

Cere BrosuS

Join Date: Sep 2016

Posts: 7
#1

Mean by group and set as new column

12 Nov 2016, 03:01

Hello again,

I'm getting deeper and deeper with Stata/Statistics and love it more and more. :-)
I come from the classical programming area like c++/java/python and know how to solve problems with lists and loops, but Stata has some very useful command contructions doing complex works in just one command. That is what I am searching for here. So now to the problem:

I got for example the following data with just few rows for showing what I mean:

Nr GroupID Value

1 1 5

2 1 3

3 2 5

4 2 11

Now I want to calculate the mean value by GroupID and set all to a new column like it will be shown in the next table:

Nr GroupID Value Mean

1 1 5 4

2 1 3 4

3 2 5 8

4 2 11 8

Is this possible in with one nice Stata command and also with other results than mean?

In general...I try to use Stata for data manipulation and data analysis of chemical engineering process optimization.

Greetings and thx for your help
Christian
Tags: None
Lance Erickson

Join Date: Apr 2014

Posts: 144
#2

12 Nov 2016, 04:41

Give this a try...

Code:

bysort GroupID: egen Mean = mean(Value)

Lance
Comment
Lance Erickson

Join Date: Apr 2014

Posts: 144
#3

12 Nov 2016, 04:41

Give this a try...

Code:

bysort GroupID: egen Mean = mean(Value)

Lance
1 like
Comment
Cere BrosuS

Join Date: Sep 2016

Posts: 7
#4

12 Nov 2016, 05:03

Hey,

thank you very much as the command does what I want, except of one thing I forgot to explain:

The data is sorted as a time series where the groupID could possibly repeat with time. And these groups should have their own mean calculation cause they are a group later in time. Let's extend my last example:

Nr GroupID Value Mean

1 1 5 4

2 1 3 4

3 2 5 8

4 2 11 8

5 1 30 20

6 1 10 20

Might be a big issue that I have forgotten to explain this. Sry for that!

Greetings and thanks for your help.
Christian
Comment
Cere BrosuS

Join Date: Sep 2016

Posts: 7
#5

12 Nov 2016, 09:09

Hey,

for the reason no solution exist for a single command, I will post my solution with a loop. Maybe someone has code advantages for me. :-)

Code:

// Grouping variable X with respect to time gen int X_grp = 0 forvalues num = 1/`=_N' { replace X_grp = X_grp[_n-1] + 1 if X[_n] != X[_n-1] & Nr[_n] == _n & _n > 1 replace X_grp = X_grp[_n-1] if X[_n] == X[_n-1] & Nr[_n] == _n & _n > 1 }

The 'Nr' column is just an id to be sure that only this current line will be changed. In general the code looks very bad related to what I know in programming. But using things like incrementation variables or simple conditions just didn't work here in my editor. :-)

Greetings and thanks for the help.
Christian
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35432

13 Nov 2016, 09:09

Christian: Please contact the administrators to change to an identifier that is your full real name. FAQ Advice explains.

You don't need a loop over observations for your new problem.

Code:

clear 
input Nr    GroupID    Value    Mean
1    1    5    4
2    1    3    4
3    2    5    8
4    2    11    8
5    1    30    20
6    1    10    20
end 

gen block = sum(GroupID != GroupID[_n-1]) 
egen Mean2 = mean(Value), by(block) 

list, sepby(block) 


     +---------------------------------------------+
     | Nr   GroupID   Value   Mean   block   Mean2 |
     |---------------------------------------------|
  1. |  1         1       5      4       1       4 |
  2. |  2         1       3      4       1       4 |
     |---------------------------------------------|
  3. |  3         2       5      8       2       8 |
  4. |  4         2      11      8       2       8 |
     |---------------------------------------------|
  5. |  5         1      30     20       3      20 |
  6. |  6         1      10     20       3      20 |
     +---------------------------------------------+

Comment

Michael Olu

Join Date: Jan 2020

Posts: 21
#7

25 Jul 2022, 14:34

I am curious. How does STATA handle missing values when calculating mean by group in this way? Will observations with missing values count towards the denominator?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35432
#8

25 Jul 2022, 19:26

Missing values are ignored in calculating numerical
summaries, both numerators and denominators.
1 like
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35432
#9

25 Jul 2022, 19:45

Missing values are ignored in calculating numerical
summaries, both numerators and denominators.
1 like
Comment
Reeju Guha

Join Date: May 2021

Posts: 12
#10

27 Oct 2023, 07:13

Hello, for this same example, how do I generate the 10th percentile values instead of the mean values? Thanks!
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4439
#11

27 Oct 2023, 07:24

there is a pctile() function for the -egen- command; see

Code:

h egen

and look at the pctile(exp) function in that help file
1 like
Comment

Nr	GroupID	Value
1	1	5
2	1	3
3	2	5
4	2	11

Nr	GroupID	Value	Mean
1	1	5	4
2	1	3	4
3	2	5	8
4	2	11	8

Nr	GroupID	Value	Mean
1	1	5	4
2	1	3	4
3	2	5	8
4	2	11	8
5	1	30	20
6	1	10	20

Announcement

Mean by group and set as new column

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment