  • Comparing frequency tables

    I need some pointers as to what type of test I should be looking at to compare two frequency tables. For a concrete example, consider this:

    Code:
    . clear
    
    . set seed 1
    
    . set obs 10000
    Number of observations (_N) was 0, now 10,000.
    
    . gen str stream = "B"
    
    . replace stream = "A" in 1/500
    (500 real changes made)
    
    . gen y = runiformint(1, 10)
    
    . bys stream: tab y
    
    ---------------------------------------------------------------------------------
    -> stream = A
    
              y |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |         44        8.80        8.80
              2 |         44        8.80       17.60
              3 |         43        8.60       26.20
              4 |         55       11.00       37.20
              5 |         35        7.00       44.20
              6 |         51       10.20       54.40
              7 |         45        9.00       63.40
              8 |         66       13.20       76.60
              9 |         65       13.00       89.60
             10 |         52       10.40      100.00
    ------------+-----------------------------------
          Total |        500      100.00
    
    ---------------------------------------------------------------------------------
    -> stream = B
    
              y |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |        915        9.63        9.63
              2 |        933        9.82       19.45
              3 |        928        9.77       29.22
              4 |        929        9.78       39.00
              5 |        989       10.41       49.41
              6 |        919        9.67       59.08
              7 |        940        9.89       68.98
              8 |      1,037       10.92       79.89
              9 |        943        9.93       89.82
             10 |        967       10.18      100.00
    ------------+-----------------------------------
          Total |      9,500      100.00
    My variable y is categorical with k unordered levels. I need some way to either test formally that the distribution of y in stream A is the same as the distribution of y in stream B or else have some measure of the similarity or difference between the two distributions. It seems that having k more than two complicates things, since then stream A could differ from stream B in multiple "directions" (for lack of a better term). I do not have covariates that I can use as relevant controls; I just have the variable y. I want to know if what I see in stream A is similar to what I see in stream B. I will of course use some type of bar chart to show the similarity or difference visually, but a formal hypothesis test would be nice.

    In my application stream B will always be much larger than stream A, and I am willing to assume stream B is fixed and has no sampling variation itself.

    I imagine this is a solved problem, but I have spent a lot of time searching DuckDuckGo for things like "comparing frequency tables" with little success.

    I just need to know what methodology I should read up on to proceed with this.

    Thanks!

  • #2
    Pearson's chi2 test of homogeneity (or independence) would be appropriate here and is pretty entry level as far as statistical tests go. The null hypothesis of the test is that the two distributions are the same, and its rejection would provide evidence that they are not.

    Code:
    tab y stream, chi2
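
    If stream B really is treated as fixed, as the original post allows, one alternative is a one-sample chi-squared goodness-of-fit test of stream A's counts against stream B's proportions. The following is only a sketch computed by hand from the posted variables stream and y; the names n, nA, nB, totA, totB, expA and contrib are made up for illustration, and it assumes every level of y occurs in stream B.

    Code:
    preserve
    * collapse to one row per stream-by-y cell; zero keeps empty cells
    contract stream y, freq(n) zero
    * one row per level of y, with counts in nA and nB
    reshape wide n, i(y) j(stream) string
    egen double totA = total(nA)
    egen double totB = total(nB)
    * expected counts in A if A followed B's (fixed) distribution
    gen double expA = totA * nB / totB
    gen double contrib = (nA - expA)^2 / expA
    * Pearson statistic, referred to chi-squared with k - 1 degrees of freedom
    summarize contrib, meanonly
    local chi2 = r(sum)
    local df = _N - 1
    display "chi2(`df') = " %8.3f `chi2' "   p = " %6.4f chi2tail(`df', `chi2')
    restore

    With stream B this much larger than stream A, the result should be close to that of tab y stream, chi2, which also allows for sampling variation in B. For a descriptive measure of how far apart the two distributions are, adding the V option (tab y stream, chi2 V) reports Cramér's V alongside the test.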

    • #3
      Originally posted by Leonardo Guizzetti
      Pearson's chi2 test of homogeneity (or independence) would be appropriate here and is pretty entry level as far as statistical tests go. The null hypothesis of the test is that the two distributions are the same, and its rejection would provide evidence that they are not.
      Thank you Leonardo, I knew there had to be a simple test for this. I just completely forgot about this one.
