Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Two way table of summary statistics

    Dear Statalisters,
    I need your help in creating summary table like the one attached below.

    My data composed of two variables representing different groups (group1, group2) and one variable representing continues measure of the groups.

    Code:
           group1   group2     x  
      1.        a        b   223  
      2.        a        b   332  
      3.        a        b   290  
      4.        a        c   223  
      5.        a        c   446  
      6.        a        c   223  
      7.        a        c   446  
      8.        a        d   669  
      9.        a        d   446  
     10.        a        d   669  
     11.        b        c   223  
     12.        b        c   669  
     13.        b        c   446
    I would like to create a table that the first column presents the mean of the measure for every two groups (in group1 and group2) and a second and third column for any of the groups regardless of the other group. The desired table:
    Group (1-2) Mean group (1-2) Mean (1) Mean (2)
    a-b 281.7 396.7 363.8
    a-c 334.5 396.7 382.3
    a-d 594.7 396.7 594.7
    b-c 446.0 363.8 382.3

    I find it too difficult to produce such a table.

    Code:
    tab group1 group2, s(x) nost nof ​​
    or
    table group1 group2, c(mean x) col row​
    didn't produce exactly the desired results mainly because of the large number of subgroups
    Many thanks for your help!

  • #2
    Code:
    . clear
    
    . input str1  group1 str1  group2 x  
    
            group1     group2          x
      1.   a  b 223  
      2.   a  b 332  
      3.   a  b 290  
      4.   a  c 223  
      5.   a  c 446  
      6.   a  c 223  
      7.   a  c 446  
      8.   a  d 669  
      9.   a  d 446  
     10.   a  d 669  
     11.   b  c 223  
     12.   b  c 669  
     13.   b  c 446
     14. end
    
    . levelsof group1, local(g1)
    `"a"' `"b"'
    
    . levelsof group2, local(g2)
    `"b"' `"c"' `"d"'
    
    . local gg : list g1 | g2
    
    . gen mean1 = .
    (13 missing values generated)
    
    . gen mean2 = .
    (13 missing values generated)
    
    . quietly foreach g of local gg {
    
    . collapse x mean1 mean2, by(group1 group2)
    
    . format x mean? %2.1f
    
    . list group1 group2 x mean1 mean2
    
         +-----------------------------------------+
         | group1   group2       x   mean1   mean2 |
         |-----------------------------------------|
      1. |      a        b   281.7   396.7   363.8 |
      2. |      a        c   334.5   396.7   382.3 |
      3. |      a        d   594.7   396.7   594.7 |
      4. |      b        c   446.0   363.8   382.3 |
         +-----------------------------------------+
    Here's the code concisely:

    Code:
    clear
    input str1  group1 str1  group2 x  
      a  b 223  
      a  b 332  
      a  b 290  
      a  c 223  
      a  c 446  
      a  c 223  
      a  c 446  
      a  d 669  
      a  d 446  
      a  d 669  
      b  c 223  
      b  c 669  
      b  c 446
    end
    levelsof group1, local(g1)
    levelsof group2, local(g2)
    local gg : list g1 | g2
    gen mean1 = .
    gen mean2 = .
    quietly foreach g of local gg {
      su x if inlist("`g'", group1, group2), meanonly
      replace mean1 = r(mean) if group1 == "`g'"
      replace mean2 = r(mean) if group2 == "`g'"
    }
    collapse x mean1 mean2, by(group1 group2)
    format x mean? %2.1f
    list group1 group2 x mean1 mean2
    Last edited by Nick Cox; 03 Jan 2015, 18:21.

    Comment


    • #3
      I think you want the following:

      Code:
      clear*
      input str1 group1   str2 group2     x 
       a        b   223 
       a        b   332 
       a        b   290 
       a        c   223 
       a        c   446 
       a        c   223 
       a        c   446 
       a        d   669 
       a        d   446 
       a        d   669 
       b        c   223 
       b        c   669 
       b        c   446
       end
       
      // FIRST BUILD A DATA SET OF ALL PAIRS OF GROUPS AND ASSOCIATED MEAN x
      preserve
      collapse (mean) mean12 = x, by(group1 group2)
      tempfile building
      save `building'
      
      // NOW DO THE SAME FOR EACH SINGLE GROUP
      restore
      tempfile single_groups
      capture postutil clear
      postfile handle str1 group float mean_x using `single_groups'
      // BUILD A LIST OF ALL LEVELS OF GROUP1 & GROUP2
      levelsof group1, local(g1)
      levelsof group2, local(g2)
      local groups: list g1 | g2
      local groups: list uniq groups
      // LOOP OVER THOSE LEVELS AND ACCUMULATE THE MEANS
      foreach g of local groups {
           summ x if inlist("`g'", group1, group2)
           post handle ("`g'") (`r(mean)')
      }
      postclose handle
      
      // NOW MERGE THE SINGLE GROUP FILE TWICE INTO THE PAIRED GROUP FILE
      preserve
      use `single_groups', clear
      rename group group1
      rename mean_x mean1
      merge 1:m group1 using `building', nogenerate
      save "`building'", replace
      use `single_groups', clear
      rename group group2
      rename mean_x mean2
      merge 1:m group2 using `building', nogenerate
      
      // BEAUTIFY
      order group1 group2 mean12 mean1 mean2
      drop if missing(group1, group2)
      sort group1 group2
      format mean* %2.1f
      
      // AND THE RESULT:
      list, noobs clean // RESTORE THE ORIGINAL DATA
      restore
      NOTE: In the -postfile- command I typed the variable group as str1 following your example. But if your group names are longer, then an appropriate string size should be used. Also, if the levels of group include quotation marks, then "`g'" has to be replaced by `"`g'"' throughout.

      I have the nagging feeling there is a simpler way to do this, but this at least works.

      Comment


      • #4
        Thank you very much Nick and Clyde that was very helpful!

        Comment

        Working...
        X