Summary statistics with conditions

Oded Mcdossi

Join Date: Jun 2014
Posts: 577

Summary statistics with conditions

05 Jan 2015, 00:35

Hello,
Following my previous topic 'Two way table' (and thanks again for Nick and Clyde for their help) I'm trying to modify the table under different scenario. I would like to build a table like attached below.
The first column in the table present the mean of group1 and group2 and the following columns present the mean of group1 or group2 after taking out the members belong to group1 and group2. In other words, calculate the mean without the observations of the other group.
Is there a simple way to do that?

Group (1-2)	Mean group (1-2)	Mean (1)	SD (1)	Mean (2)	SD (2)
		(without group2)	(without group2)	(without group1)	(without group1)
a-b	281.7	446.0	182.1	446.0	223.0
a-c	334.5	438.2	193.0	446.0	223.0
a-d	594.7	311.9	100.5	.	.
b-c	446.0	281.7	55.0	334.5	128.7

My data composed of two variables representing different groups (group1, group2) and one variable representing continues measure of the groups.

Code:

 clear
input str1  group1 str1  group2 x 
  a  b 223 
  a  b 332 
  a  b 290 
  a  c 223 
  a  c 446 
  a  c 223 
  a  c 446 
  a  d 669 
  a  d 446 
  a  d 669 
  b  c 223 
  b  c 669 
  b  c 446
end

Thanks and regards

Tags: summary statistics

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#2

05 Jan 2015, 01:24

Oded:
have you taken a look at -help if-?

Kind regards,
Carlo
(Stata 19.0)
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35433

05 Jan 2015, 03:33

I am struggling to understand what you want, but this may indicate some technique.

Code:

 
clear
input str1  group1 str1  group2 x  
  a  b 223  
  a  b 332  
  a  b 290  
  a  c 223  
  a  c 446  
  a  c 223  
  a  c 446  
  a  d 669  
  a  d 446  
  a  d 669  
  b  c 223  
  b  c 669  
  b  c 446
end
levelsof group1, local(g1)
levelsof group2, local(g2)
local gg : list g1 | g2
gen mean1 = .
gen SD1 = . 
gen mean2 = .
gen SD2 = . 
quietly foreach g of local gg {
  su x if group2 != "`g'" 
  replace mean1 = r(mean) if group2 == "`g'" 
  replace SD1 = r(sd) if group2 == "`g'"
  su x if group1 != "`g'" 
  replace mean2 = r(mean) if group1 == "`g'"
  replace SD2 = r(sd) if group1 == "`g'"
}
collapse x mean1 mean2 SD1 SD2, by(group1 group2)
format x mean? SD? %2.1f
list group1 group2 x mean1 SD1 mean2 SD2

That produces

Code:

 
     +---------------------------------------------------------+
     | group1   group2       x   mean1     SD1   mean2     SD2 |
     |---------------------------------------------------------|
  1. |      a        b   281.7   446.0   182.1   446.0   223.0 |
  2. |      a        c   334.5   438.2   193.0   446.0   223.0 |
  3. |      a        d   594.7   352.1   148.3   446.0   223.0 |
  4. |      b        c   446.0   438.2   193.0   396.7   170.5 |
     +---------------------------------------------------------+

Among other details note that SD as calculated by summarize is based on using n - 1 in the denominator for sample size n.

What Stata code did you use to get your results? Why aren't you showing it to us?

Comment

Oded Mcdossi

Join Date: Jun 2014

Posts: 577
#4

05 Jan 2015, 04:17

Thank you very much Nick,

I wish I had the code in Stata to produce the attached table, but I've calculated my example using Excel spreadsheet. The code you attached will help me very much to get the desired result. So many thanks for that.

I would like to calculate the average (and standard deviations) for each group when I take out the joint group from the calculation. For example, for a + d groups I would like to calculate the mean of "a" (group1) but without all the observations belonging to "d", and the same for group "d" (group2), I would like to calculate the mean for all observations belonging to "d" but not to "a". In the attached example, there are no observations "d" without "a" therefore it is missing.
Comment

Oded Mcdossi

Join Date: Jun 2014
Posts: 577

05 Jan 2015, 05:09

Assuming that the number of groups is larger than 4 as in the example, is there an option to insert the code below in a loop?

Code:

clear
input str1  group1 str1  group2 x  
  a  b 223  
  a  b 332  
  a  b 290  
  a  c 223  
  a  c 446  
  a  c 223  
  a  c 446  
  a  d 669  
  a  d 446  
  a  d 669  
  b  c 223  
  b  c 669  
  b  c 446
end
levelsof group1, local(g1)
levelsof group2, local(g2)
local gg : list g1 | g2
gen mean1 = .
gen SD1 = . 
gen mean2 = .
gen SD2 = . 

//The code that I would like to put in a loop
*Begin loop

*group (a-b)
su x if  inlist("a",group1,group2) &  group2!= "b"
replace mean1 = r(mean) if  inlist("a",group1,group2) &  group2== "b"
replace SD1 = r(sd) if  inlist("a",group1,group2) &  group2== "b"

su x if  inlist("b",group1,group2) &  group1!= "a"
replace mean2 = r(mean) if  inlist("b",group1,group2) &  group1== "a"
replace SD2 = r(sd) if  inlist("b",group1,group2) &  group1== "a"
*group (a-c)
su x if  inlist("a",group1,group2) &  group2!= "c"
replace mean1 = r(mean) if  inlist("a",group1,group2) &  group2== "c"
replace SD1 = r(sd) if  inlist("a",group1,group2) &  group2== "c"

su x if  inlist("c",group1,group2) &  group1!= "a"
replace mean2 = r(mean) if  inlist("c",group1,group2) &  group1== "a"
replace SD2 = r(sd) if  inlist("c",group1,group2) &  group1== "a"
*group (a-d)
su x if  inlist("a",group1,group2) &  group2!= "d"
replace mean1 = r(mean) if  inlist("a",group1,group2) &  group2== "d"
replace SD1 = r(sd) if  inlist("a",group1,group2) &  group2== "d"

su x if  inlist("d",group1,group2) &  group1!= "a"
replace mean2 = r(mean) if  inlist("d",group1,group2) &  group1== "a"
replace SD2 = r(sd) if  inlist("d",group1,group2) &  group1== "a"
*group (b-c)
su x if  inlist("b",group1,group2) &  group2!= "c"
replace mean1 = r(mean) if  inlist("b",group1,group2) &  group2== "c"
replace SD1 = r(sd) if  inlist("b",group1,group2) &  group2== "c"

su x if  inlist("c",group1,group2) &  group1!= "b"
replace mean2 = r(mean) if  inlist("c",group1,group2) &  group1== "b"
replace SD2 = r(sd) if  inlist("c",group1,group2) &  group1== "b"

*End loop.
//

collapse x mean1 SD1 mean2 SD2, by(group1 group2)
format x mean? SD? %2.1f
list group1 group2 x mean1 SD1 mean2 SD2

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#6

05 Jan 2015, 08:32

So I think you want

Code:

levelsof group1, local(first) levelsof group2, local(second) foreach g1 of local first { foreach g2 of local second { summ x if inlist("`g1'", group1, group2) & group2 != "`g2'" replace mean1 = r(mean) if inlist("`g1'", group1, group2) & group2 != "`g2'" replace sd1 = r(sd) if inlist("`g1'", group1, group2) & group2 != "`g2'" summ x if inlist("`g2'", group1, group2) & group1 != "`g1'" replace mean2 = r(mean) if inlist("`g2'", group1, group2) & group1 != "`g1'" replace sd2 = r(sd) if inlist("`g2'", group1, group2) & group1 != "`g1'" } }

Theoretically you could even go farther and replace the two "paragraphs" of code in the inner loop by a single, complicated loop over values 1 and 2, but the code becomes very opaque, and it would be difficult to later make modifications to it. So I think this is the "optimal" amount of code reduction through looping for this problem.

Just to be clear on what this code does: for what you referred to in your code comments as *group(b-c), this will set mean1 and sd1 to the mean and standard deviation of x calculated over all observations where group1 or group2 equals "b" and group2 is not equal to "c", and it sets mean2 and sd2 to the mean and standard deviation of x calculated over all observations where group1 or group2 is equal to "c" and group1 is not equal to "b".
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35433

05 Jan 2015, 08:38

I don't think Clyde's code is what you want, as you don't want means and SDs to be assigned to the observations for which they are calculated.

I got this far, but something seems wrong as too many results are missing.

Code:

 
clear
input str1  group1 str1  group2 x  
  a  b 223  
  a  b 332  
  a  b 290  
  a  c 223  
  a  c 446  
  a  c 223  
  a  c 446  
  a  d 669  
  a  d 446  
  a  d 669  
  b  c 223  
  b  c 669  
  b  c 446
end
levelsof group1, local(g1)
levelsof group2, local(g2)
local gg : list g1 | g2
gen mean1 = .
gen SD1 = . 
gen mean2 = .
gen SD2 = . 

quietly foreach g of local gg {
    local GG : list gg - g    
    foreach G of local GG {
        su x if inlist("`g'", group1, group2) & group2 != "`G'" 
        replace mean1 = r(mean) if inlist("`g'", group1, group2) & group2 == "`G'" 
        replace SD1 = r(sd) if inlist("`g'", group1, group2) & group2 == "`G'" 

        su x if inlist("`g'", group1, group2) & group1 != "`G'" 
        replace mean1 = r(mean) if inlist("`g'", group1, group2) & group1 == "`G'" 
        replace SD1 = r(sd) if inlist("`g'", group1, group2) & group1 == "`G'" 
   } 
}

collapse x mean1 mean2 SD1 SD2, by(group1 group2)
format x mean? SD? %2.1f
list group1 group2 x mean1 SD1 mean2 SD2

Comment

Oded Mcdossi

Join Date: Jun 2014
Posts: 577

05 Jan 2015, 17:41

Dear Clyde and Nick thank you very much for your time and effort. Following your suggestions I used Clyde's code with slight changes which help me to reach almost the desired results. The only problem is that now the standard deviation for Group A and B is higher than it should be (223 vs. 182.1). I guess it something in the second (inner) loop. I'll try to improve it and welcome any suggestion or insight.

Code:

clear
input str1  group1 str1  group2 x  
  a  b 223  
  a  b 332  
  a  b 290  
  a  c 223  
  a  c 446  
  a  c 223  
  a  c 446  
  a  d 669  
  a  d 446  
  a  d 669  
  b  c 223  
  b  c 669  
  b  c 446
end

levelsof group1, local(first)
levelsof group2, local(second)
gen mean1 = .
gen sd1 = . 
gen mean2 = .
gen sd2 = . 

foreach g1 of local first {
    foreach g2 of local second {
        summ x if inlist("`g1'", group1, group2) & group2 != "`g2'"
        replace mean1 = r(mean) if inlist("`g1'", group1, group2) & group2 == "`g2'"
        replace sd1 = r(sd) if inlist("`g1'", group1, group2) & group2 == "`g2'"
        
        summ x if inlist("`g2'", group1, group2) & group1 != "`g1'"
        replace mean2 = r(mean) if inlist("`g2'", group1, group2) & group1 == "`g1'"
        replace sd2 = r(sd) if inlist("`g2'", group1, group2) & group1 == "`g1'"
    }
}
collapse x mean1 sd1 mean2 sd2, by(group1 group2)
format x mean? sd? %2.1f
list group1 group2 x mean1 sd1 mean2 sd2

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 29956

05 Jan 2015, 19:32

OK, I see the problem. The replace statements have the wrong -if- condition: they are placing the results of the preceding -summ- command into any observation where either group1 or group2 equals `g1' and group2 is equals to `g2'. This leads to the same observations being over-written multiple times through the loop, and you end up with the calculations corresponding to the last `g1' `g2' pair for which the observation meets those conditions. What I think you want is to save those results only when group1 ==" `g1'" and group2 ==" `g2'" If I have that right, you need:

Code:

 clear
input str1  group1 str1  group2 x 
  a  b 223 
  a  b 332 
  a  b 290 
  a  c 223 
  a  c 446 
  a  c 223 
  a  c 446 
  a  d 669 
  a  d 446 
  a  d 669 
  b  c 223 
  b  c 669 
  b  c 446
end
  levelsof group1, local(first)
levelsof group2, local(second)
gen mean1 = .
gen sd1 = .
gen mean2 = .
gen sd2 = .
  foreach g1 of local first {
    foreach g2 of local second {
        summ x if inlist("`g1'", group1, group2) & group2 != "`g2'"
        replace mean1 = r(mean) if group1 == "`g1'" & group2 == "`g2'"
        replace sd1 = r(sd) if group1 == "`g1'" & group2 == "`g2'"
       
        summ x if inlist("`g2'", group1, group2) & group1 != "`g1'"
        replace mean2 = r(mean) if group2 == "`g2'" & group1 == "`g1'"
        replace sd2 = r(sd) if group2 == "`g2'" & group1 == "`g1'"
    }
}
collapse x mean1 sd1 mean2 sd2, by(group1 group2)
format x mean? sd? %2.1f
list group1 group2 x mean1 sd1 mean2 sd2

Is this it?

Comment

Oded Mcdossi

Join Date: Jun 2014

Posts: 577
#10

06 Jan 2015, 04:22

Bingo! Thank you very Clyde.
Comment

Announcement

Summary statistics with conditions

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment