Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Summary statistics with conditions

    Hello,
    Following my previous topic 'Two way table' (and thanks again for Nick and Clyde for their help) I'm trying to modify the table under different scenario. I would like to build a table like attached below.
    The first column in the table present the mean of group1 and group2 and the following columns present the mean of group1 or group2 after taking out the members belong to group1 and group2. In other words, calculate the mean without the observations of the other group.
    Is there a simple way to do that?

    Group (1-2) Mean group (1-2) Mean (1) SD (1) Mean (2) SD (2)
    (without group2) (without group2) (without group1) (without group1)
    a-b 281.7 446.0 182.1 446.0 223.0
    a-c 334.5 438.2 193.0 446.0 223.0
    a-d 594.7 311.9 100.5 . .
    b-c 446.0 281.7 55.0 334.5 128.7



    My data composed of two variables representing different groups (group1, group2) and one variable representing continues measure of the groups.

    Code:
     clear
    input str1  group1 str1  group2 x 
      a  b 223 
      a  b 332 
      a  b 290 
      a  c 223 
      a  c 446 
      a  c 223 
      a  c 446 
      a  d 669 
      a  d 446 
      a  d 669 
      b  c 223 
      b  c 669 
      b  c 446
    end
    Thanks and regards

  • #2
    Oded:
    have you taken a look at -help if-?
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      I am struggling to understand what you want, but this may indicate some technique.

      Code:
       
      clear
      input str1  group1 str1  group2 x  
        a  b 223  
        a  b 332  
        a  b 290  
        a  c 223  
        a  c 446  
        a  c 223  
        a  c 446  
        a  d 669  
        a  d 446  
        a  d 669  
        b  c 223  
        b  c 669  
        b  c 446
      end
      levelsof group1, local(g1)
      levelsof group2, local(g2)
      local gg : list g1 | g2
      gen mean1 = .
      gen SD1 = . 
      gen mean2 = .
      gen SD2 = . 
      quietly foreach g of local gg {
        su x if group2 != "`g'" 
        replace mean1 = r(mean) if group2 == "`g'" 
        replace SD1 = r(sd) if group2 == "`g'"
        su x if group1 != "`g'" 
        replace mean2 = r(mean) if group1 == "`g'"
        replace SD2 = r(sd) if group1 == "`g'"
      }
      collapse x mean1 mean2 SD1 SD2, by(group1 group2)
      format x mean? SD? %2.1f
      list group1 group2 x mean1 SD1 mean2 SD2
      That produces

      Code:
       
           +---------------------------------------------------------+
           | group1   group2       x   mean1     SD1   mean2     SD2 |
           |---------------------------------------------------------|
        1. |      a        b   281.7   446.0   182.1   446.0   223.0 |
        2. |      a        c   334.5   438.2   193.0   446.0   223.0 |
        3. |      a        d   594.7   352.1   148.3   446.0   223.0 |
        4. |      b        c   446.0   438.2   193.0   396.7   170.5 |
           +---------------------------------------------------------+
      Among other details note that SD as calculated by summarize is based on using n - 1 in the denominator for sample size n.

      What Stata code did you use to get your results? Why aren't you showing it to us?


      Comment


      • #4
        Thank you very much Nick,

        I wish I had the code in Stata to produce the attached table, but I've calculated my example using Excel spreadsheet. The code you attached will help me very much to get the desired result. So many thanks for that.

        I would like to calculate the average (and standard deviations) for each group when I take out the joint group from the calculation. For example, for a + d groups I would like to calculate the mean of "a" (group1) but without all the observations belonging to "d", and the same for group "d" (group2), I would like to calculate the mean for all observations belonging to "d" but not to "a". In the attached example, there are no observations "d" without "a" therefore it is missing.

        Comment


        • #5
          Assuming that the number of groups is larger than 4 as in the example, is there an option to insert the code below in a loop?

          Code:
          clear
          input str1  group1 str1  group2 x  
            a  b 223  
            a  b 332  
            a  b 290  
            a  c 223  
            a  c 446  
            a  c 223  
            a  c 446  
            a  d 669  
            a  d 446  
            a  d 669  
            b  c 223  
            b  c 669  
            b  c 446
          end
          levelsof group1, local(g1)
          levelsof group2, local(g2)
          local gg : list g1 | g2
          gen mean1 = .
          gen SD1 = . 
          gen mean2 = .
          gen SD2 = . 
          
          //The code that I would like to put in a loop
          *Begin loop
          
          *group (a-b)
          su x if  inlist("a",group1,group2) &  group2!= "b"
          replace mean1 = r(mean) if  inlist("a",group1,group2) &  group2== "b"
          replace SD1 = r(sd) if  inlist("a",group1,group2) &  group2== "b"
          
          su x if  inlist("b",group1,group2) &  group1!= "a"
          replace mean2 = r(mean) if  inlist("b",group1,group2) &  group1== "a"
          replace SD2 = r(sd) if  inlist("b",group1,group2) &  group1== "a"
          *group (a-c)
          su x if  inlist("a",group1,group2) &  group2!= "c"
          replace mean1 = r(mean) if  inlist("a",group1,group2) &  group2== "c"
          replace SD1 = r(sd) if  inlist("a",group1,group2) &  group2== "c"
          
          su x if  inlist("c",group1,group2) &  group1!= "a"
          replace mean2 = r(mean) if  inlist("c",group1,group2) &  group1== "a"
          replace SD2 = r(sd) if  inlist("c",group1,group2) &  group1== "a"
          *group (a-d)
          su x if  inlist("a",group1,group2) &  group2!= "d"
          replace mean1 = r(mean) if  inlist("a",group1,group2) &  group2== "d"
          replace SD1 = r(sd) if  inlist("a",group1,group2) &  group2== "d"
          
          su x if  inlist("d",group1,group2) &  group1!= "a"
          replace mean2 = r(mean) if  inlist("d",group1,group2) &  group1== "a"
          replace SD2 = r(sd) if  inlist("d",group1,group2) &  group1== "a"
          *group (b-c)
          su x if  inlist("b",group1,group2) &  group2!= "c"
          replace mean1 = r(mean) if  inlist("b",group1,group2) &  group2== "c"
          replace SD1 = r(sd) if  inlist("b",group1,group2) &  group2== "c"
          
          su x if  inlist("c",group1,group2) &  group1!= "b"
          replace mean2 = r(mean) if  inlist("c",group1,group2) &  group1== "b"
          replace SD2 = r(sd) if  inlist("c",group1,group2) &  group1== "b"
          
          *End loop.
          //
          
          collapse x mean1 SD1 mean2 SD2, by(group1 group2)
          format x mean? SD? %2.1f
          list group1 group2 x mean1 SD1 mean2 SD2

          Comment


          • #6
            So I think you want
            Code:
            levelsof group1, local(first)
            levelsof group2, local(second)
            foreach g1 of local first {
                foreach g2 of local second {
                    summ x if inlist("`g1'", group1, group2) & group2 != "`g2'"
                    replace mean1 = r(mean) if inlist("`g1'", group1, group2) & group2 != "`g2'"
                    replace sd1 = r(sd) if inlist("`g1'", group1, group2) & group2 != "`g2'"
            
                    summ x if inlist("`g2'", group1, group2) & group1 != "`g1'"
                    replace mean2 = r(mean) if inlist("`g2'", group1, group2) & group1 != "`g1'"
                    replace sd2 = r(sd) if inlist("`g2'", group1, group2) & group1 != "`g1'"
                }
            }
            Theoretically you could even go farther and replace the two "paragraphs" of code in the inner loop by a single, complicated loop over values 1 and 2, but the code becomes very opaque, and it would be difficult to later make modifications to it. So I think this is the "optimal" amount of code reduction through looping for this problem.

            Just to be clear on what this code does: for what you referred to in your code comments as *group(b-c), this will set mean1 and sd1 to the mean and standard deviation of x calculated over all observations where group1 or group2 equals "b" and group2 is not equal to "c", and it sets mean2 and sd2 to the mean and standard deviation of x calculated over all observations where group1 or group2 is equal to "c" and group1 is not equal to "b".

            Comment


            • #7
              I don't think Clyde's code is what you want, as you don't want means and SDs to be assigned to the observations for which they are calculated.

              I got this far, but something seems wrong as too many results are missing.

              Code:
               
              clear
              input str1  group1 str1  group2 x  
                a  b 223  
                a  b 332  
                a  b 290  
                a  c 223  
                a  c 446  
                a  c 223  
                a  c 446  
                a  d 669  
                a  d 446  
                a  d 669  
                b  c 223  
                b  c 669  
                b  c 446
              end
              levelsof group1, local(g1)
              levelsof group2, local(g2)
              local gg : list g1 | g2
              gen mean1 = .
              gen SD1 = . 
              gen mean2 = .
              gen SD2 = . 
              
              quietly foreach g of local gg {
                  local GG : list gg - g    
                  foreach G of local GG {
                      su x if inlist("`g'", group1, group2) & group2 != "`G'" 
                      replace mean1 = r(mean) if inlist("`g'", group1, group2) & group2 == "`G'" 
                      replace SD1 = r(sd) if inlist("`g'", group1, group2) & group2 == "`G'" 
              
                      su x if inlist("`g'", group1, group2) & group1 != "`G'" 
                      replace mean1 = r(mean) if inlist("`g'", group1, group2) & group1 == "`G'" 
                      replace SD1 = r(sd) if inlist("`g'", group1, group2) & group1 == "`G'" 
                 } 
              }
              
              collapse x mean1 mean2 SD1 SD2, by(group1 group2)
              format x mean? SD? %2.1f
              list group1 group2 x mean1 SD1 mean2 SD2

              Comment


              • #8
                Dear Clyde and Nick ​thank you very much for your time and effort. Following your suggestions I used Clyde's code with slight changes which help me to reach almost the desired results. The only problem is that now the standard deviation for Group A and B is higher than it should be (223 vs. 182.1). I guess it something in the second (inner) loop. I'll try to improve it and welcome any suggestion or insight.

                Code:
                clear
                input str1  group1 str1  group2 x  
                  a  b 223  
                  a  b 332  
                  a  b 290  
                  a  c 223  
                  a  c 446  
                  a  c 223  
                  a  c 446  
                  a  d 669  
                  a  d 446  
                  a  d 669  
                  b  c 223  
                  b  c 669  
                  b  c 446
                end
                
                levelsof group1, local(first)
                levelsof group2, local(second)
                gen mean1 = .
                gen sd1 = . 
                gen mean2 = .
                gen sd2 = . 
                
                foreach g1 of local first {
                    foreach g2 of local second {
                        summ x if inlist("`g1'", group1, group2) & group2 != "`g2'"
                        replace mean1 = r(mean) if inlist("`g1'", group1, group2) & group2 == "`g2'"
                        replace sd1 = r(sd) if inlist("`g1'", group1, group2) & group2 == "`g2'"
                        
                        summ x if inlist("`g2'", group1, group2) & group1 != "`g1'"
                        replace mean2 = r(mean) if inlist("`g2'", group1, group2) & group1 == "`g1'"
                        replace sd2 = r(sd) if inlist("`g2'", group1, group2) & group1 == "`g1'"
                    }
                }
                collapse x mean1 sd1 mean2 sd2, by(group1 group2)
                format x mean? sd? %2.1f
                list group1 group2 x mean1 sd1 mean2 sd2

                Comment


                • #9
                  OK, I see the problem. The replace statements have the wrong -if- condition: they are placing the results of the preceding -summ- command into any observation where either group1 or group2 equals `g1' and group2 is equals to `g2'. This leads to the same observations being over-written multiple times through the loop, and you end up with the calculations corresponding to the last `g1' `g2' pair for which the observation meets those conditions. What I think you want is to save those results only when group1 ==" `g1'" and group2 ==" `g2'" If I have that right, you need:

                  Code:
                   clear
                  input str1  group1 str1  group2 x 
                    a  b 223 
                    a  b 332 
                    a  b 290 
                    a  c 223 
                    a  c 446 
                    a  c 223 
                    a  c 446 
                    a  d 669 
                    a  d 446 
                    a  d 669 
                    b  c 223 
                    b  c 669 
                    b  c 446
                  end
                    levelsof group1, local(first)
                  levelsof group2, local(second)
                  gen mean1 = .
                  gen sd1 = .
                  gen mean2 = .
                  gen sd2 = .
                    foreach g1 of local first {
                      foreach g2 of local second {
                          summ x if inlist("`g1'", group1, group2) & group2 != "`g2'"
                          replace mean1 = r(mean) if group1 == "`g1'" & group2 == "`g2'"
                          replace sd1 = r(sd) if group1 == "`g1'" & group2 == "`g2'"
                         
                          summ x if inlist("`g2'", group1, group2) & group1 != "`g1'"
                          replace mean2 = r(mean) if group2 == "`g2'" & group1 == "`g1'"
                          replace sd2 = r(sd) if group2 == "`g2'" & group1 == "`g1'"
                      }
                  }
                  collapse x mean1 sd1 mean2 sd2, by(group1 group2)
                  format x mean? sd? %2.1f
                  list group1 group2 x mean1 sd1 mean2 sd2
                  Is this it?


                  Comment


                  • #10
                    Bingo! Thank you very Clyde.

                    Comment

                    Working...
                    X