
  • Observed and expected frequencies in tabulate twoway

    Dear Statalist members:

    I'm using Stata 12. I'm interested in obtaining the ratio of observed to expected frequencies in each cell of an x-by-y table produced by a two-way tabulation. Below is code that generates a 3x3 table similar to my original dataset (my real problem involves a 7x7 table, but I believe I can extend a solution for the 3x3 case to my original data).

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long cluster_series_np float(first second)
    12 3 3
     0 3 5
     4 3 6
     0 5 3
     1 5 5
     2 5 6
     3 6 3
     2 6 5
    43 6 6
    end
    I thought of capturing the results of my tab command in two matrices ("o" and "e") and then dividing o by e. I'm using the matcell() option for this:

    Code:
    tabulate first second [fweight = cluster_series_np],  matcell(o)
    matrix list o
    However, I have not been able to store my expected frequencies in a similar matrix. Even though the command below displays the expected frequency in each cell, the matrix it stores contains the observed (not the expected) frequencies.

    Code:
    tabulate first second [fweight = cluster_series_np],  expected nofreq matcell(e)
    matrix list e
    I don't take this as a problem with the command, but as me expecting it to do something it was not designed to do (help tabulate twoway says that the matcell() option saves the table of frequencies in a matrix, not the printed results). In any case, I was planning to obtain both matrices and then divide them elementwise in Mata, using a command such as:

    Code:
    mata : st_matrix("result", st_matrix("o") :/ st_matrix("e"))
    mat li result
    I suspect I might be overcomplicating things. I read the tabulate help file but couldn't find a way to make it show the observed/expected ratio, and working with matrices (as I was planning) also raises problems with formatting the output (which I can probably deal with, but it adds another layer of complexity).

    Is there something simple that I'm overlooking? Any advice on how to obtain the o/e ratio?

    Thank you in advance for any help/tips.

    Best,

  • #2
    You're along the right lines in thinking of Mata. This is one way to work:

    Code:
    clear
    input long cluster_series_np float(first second)
    12 3 3
     0 3 5
     4 3 6
     0 5 3
     1 5 5
     2 5 6
     3 6 3
     2 6 5
    43 6 6
    end
    
    tabulate first second [fweight = cluster_series_np],  matcell(o)
    matrix list o
    
    mata 
    o = st_matrix("o")
    st_matrix("e", (rowsum(o)* colsum(o)) / sum(o)) 
    st_matrix("oOVERe", o :/ e) 
    end 
    
    mat li e 
    
    e[3,3]
               c1         c2         c3
    r1  3.5820896  .71641791  11.701493
    r2  .67164179  .13432836  2.1940299
    r3  10.746269  2.1492537  35.104478
    
    mat li oOVERe 
    
    oOVERe[3,3]
               c1         c2         c3
    r1       3.35          0  .34183673
    r2          0  7.4444444  .91156463
    r3  .27916667  .93055556   1.224915
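    For reference, the line that builds e uses the usual expected count under independence: in Mata, rowsum(o) is a column vector of row totals and colsum(o) a row vector of column totals, so their product is the full matrix of row-total times column-total products, which is then divided by the grand total sum(o). Writing O for observed counts, R_i and C_j for row and column totals, and N for the grand total, the sketch below states the formula and checks it by hand against cell r1, c1 (first = 3, second = 3) of the example data; the numbers agree with the e and oOVERe listings in this post.

```latex
\begin{align*}
E_{ij} &= \frac{R_i\,C_j}{N}, \qquad \frac{O_{ij}}{E_{ij}} = \frac{N\,O_{ij}}{R_i\,C_j} \\
R_1 &= 12 + 0 + 4 = 16, \quad C_1 = 12 + 0 + 3 = 15, \quad N = 67 \\
E_{11} &= \frac{16 \times 15}{67} = \frac{240}{67} \approx 3.5821 \\
\frac{O_{11}}{E_{11}} &= \frac{12 \times 67}{240} = 3.35
\end{align*}
```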

    • #3
      Thank you for your reply, Nick.

      When I run the code, I get an error here:
      Code:
      st_matrix("oOVERe", o :/ e)
      It returns:
      <istmt>: 3499 e not found
      r(3499);

      After that, when I ask to list the matrix oOVERe, Stata cannot find it ("matrix oOVERe not found").

      The commands make sense to me: you are creating a matrix "e" in which each cell is the row total times the column total, divided by the grand total of matrix "o". Then you are creating a new matrix called oOVERe, which is the elementwise division of matrix "o" by "e". I made some adjustments to your code and came up with this:

      Code:
      clear
      input long cluster_series_np float(first second)
      12 3 3
       0 3 5
       4 3 6
       0 5 3
       1 5 5
       2 5 6
       3 6 3
       2 6 5
      43 6 6
      end
      
      tabulate first second [fweight = cluster_series_np],  matcell(o)
      matrix list o
      
      mata 
      o = st_matrix("o")
      st_matrix("e", (rowsum(o)* colsum(o)) / sum(o)) 
      st_matrix("oOVERe", st_matrix("o") :/ st_matrix("e"))
      end 
      
      mat li e 
      
      mat li oOVERe
      Thank you so much for your help. Wish you a nice day!

      Best,


      • #4
        Sorry, yes. My example worked for me only because a Mata matrix e was left in memory from a previous run of the code. Here's a fixed version:

        Code:
        clear
        input long cluster_series_np float(first second)
        12 3 3
         0 3 5
         4 3 6
         0 5 3
         1 5 5
         2 5 6
         3 6 3
         2 6 5
        43 6 6
        end
        
        tabulate first second [fweight = cluster_series_np],  matcell(o)
        matrix list o
        
        mata 
        o = st_matrix("o")
        e = (rowsum(o)* colsum(o)) / sum(o)
        st_matrix("e", e) 
        st_matrix("oOVERe", o :/ e) 
        end 
        
        mat li e 
        
        mat li oOVERe
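        A minimal sketch, assuming the whole thing is run as one do-file: it repeats the fixed code above but starts with mata: mata clear, so that no object left over from an earlier run (such as the stale e that caused the problem) can mask a missing definition. It also lists the ratios with matlist and its format() option, which is one way to deal with the output formatting mentioned in the question; the three-decimal format is an arbitrary choice.

```stata
* Empty Mata memory first so that a stale e from an earlier run
* cannot hide a missing definition
mata: mata clear

clear
input long cluster_series_np float(first second)
12 3 3
 0 3 5
 4 3 6
 0 5 3
 1 5 5
 2 5 6
 3 6 3
 2 6 5
43 6 6
end

* matcell() saves the observed counts only
tabulate first second [fweight = cluster_series_np], matcell(o)

mata
o = st_matrix("o")
// expected counts: row totals x column totals / grand total
e = (rowsum(o) * colsum(o)) / sum(o)
st_matrix("e", e)
st_matrix("oOVERe", o :/ e)
end

* List the observed/expected ratios rounded to three decimal places
matlist oOVERe, format(%9.3f)
```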


        • #5
          It's perhaps relevant that the commands tabchi and tabchii from tab_chi (SSC) give more kinds of tabular output (but not matrix output) for chi-square tests than does tabulate.

          I've never written them up. They are for Stata 6. If I rewrote them, I would likely add options for more kinds of output.
