  • Chi Squared Test Help

    Hi,

    I'm trying to perform a chi-squared test manually in Stata, and I'm struggling to work out how to create a variable holding the expected values.

    Say I was trying to find out whether items were assigned to different people based on the last digit of an ID (i.e., anything with an ID ending in 7 would go to a certain person), as opposed to being assigned randomly. How would I calculate the expected number of 7s assigned? If I had 865 observations, this would in theory be 865/10 = 86.5, rounded down to 86 since 5 < 7. How would I incorporate that final step when the number of observations is not a multiple of 10?

    Thanks in advance.

  • #2
    I can't quite follow what you want to do here, but in general the expected frequencies in a chi-square test will not be integers, which is not at all a problem. Rounding is never needed or appropriate.
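    For the setup in the question, assuming that under random assignment each of the ten final digits is equally likely, the expected frequency for every digit is just the total divided by 10, with no rounding step. Here O_k is the observed number of IDs ending in digit k and N is the number of observations:

```latex
E_k = \frac{N}{10}, \qquad k = 0, 1, \dots, 9

N = 865 \;\Rightarrow\; E_7 = \frac{865}{10} = 86.5

X^2 = \sum_{k=0}^{9} \frac{(O_k - E_k)^2}{E_k}
\quad \text{referred to } \chi^2 \text{ with } 10 - 1 = 9 \text{ df}
```

    The 9 degrees of freedom match the chi2(9) in the output below.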


    Here is a moderately silly test on the last digits of a price variable against a hypothesis of uniformity on the digits 0 ... 9.

    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    . gen last = mod(price, 10)
    
    . chitest last , count
    
    observed frequencies of last; expected frequencies equal
    
             Pearson chi2(9) =  17.6216   Pr =  0.040
    likelihood-ratio chi2(9) =  16.2719   Pr =  0.061
    
      +--------------------------------------------------+
      | last   observed   expected   obs - exp   Pearson |
      |--------------------------------------------------|
      |    0          8      7.400       0.600     0.221 |
      |    1          3      7.400      -4.400    -1.617 |
      |    2          7      7.400      -0.400    -0.147 |
      |    3          4      7.400      -3.400    -1.250 |
      |    4          7      7.400      -0.400    -0.147 |
      |--------------------------------------------------|
      |    5         11      7.400       3.600     1.323 |
      |    6          7      7.400      -0.400    -0.147 |
      |    7          7      7.400      -0.400    -0.147 |
      |    8          4      7.400      -3.400    -1.250 |
      |    9         16      7.400       8.600     3.161 |
      +--------------------------------------------------+
    I used chitest from tab_chi on SSC. If any digit had not occurred, I would have reached for chitesti from the same place, because zero observed frequencies then have to be specified explicitly. The package also includes tabchi and tabchii, which go beyond the chi-square test machinery provided by tabulate.
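    For anyone who wants to do the whole calculation by hand, as the original question asked, here is a minimal sketch in plain Stata that does not use tab_chi. It assumes the digit variable is called last, as in the example, and tests uniformity over the ten digits. The variable names n_total, expected, pearson, X2 and G2 are hypothetical names chosen for illustration. Merging the counts onto a skeleton of all ten digits is just one way of making a digit that never occurs enter with an observed frequency of 0. With the auto data it should reproduce the Pearson (17.6216) and likelihood-ratio (16.2719) statistics shown above.

```stata
* Sketch: chi-square test of uniform last digits done by hand.
* Assumes the digit variable is called last, as in the example.
sysuse auto, clear
gen last = mod(price, 10)

* observed counts: one observation per digit that occurs
contract last, freq(observed)
tempfile counts
save `counts'

* skeleton holding all ten digits, so that a digit that never
* occurs still contributes with an observed count of 0
clear
set obs 10
gen last = _n - 1
merge 1:1 last using `counts', nogenerate
replace observed = 0 if missing(observed)

* expected frequency under uniformity: N/10 for every digit,
* deliberately left unrounded (7.4 here, 86.5 for N = 865)
egen double n_total = total(observed)
gen double expected = n_total / 10

* Pearson residuals and the two statistics, on 10 - 1 = 9 df;
* digits with zero observed count add nothing to G2
gen double pearson = (observed - expected) / sqrt(expected)
egen double X2 = total(pearson^2)
egen double G2 = total(cond(observed > 0, 2 * observed * ln(observed / expected), 0))

list last observed expected pearson, noobs sep(5)
display "Pearson chi2(9) = " %8.4f X2[1] "   Pr = " %5.3f chi2tail(9, X2[1])
display "likelihood-ratio chi2(9) = " %8.4f G2[1] "   Pr = " %5.3f chi2tail(9, G2[1])
```

    The same lines work for data like those in the question once last is generated from the ID variable instead of price.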

    It's a small oddity that chi-square tests of this kind appear in many introductory texts and courses but are rarely asked for here.
    Last edited by Nick Cox; 26 Jan 2022, 19:16.
