Pearson's chi square test when data are in contingency table format

Patrick Schober

Join Date: Mar 2016

Posts: 17

26 Feb 2020, 00:57

Dear Statalist,

I have what seems to be a very simple problem, but hours of googling, a discussion with a statistician who regularly uses Stata, and searching this forum have not provided an answer.

I have the following dataset of 7 schools, with the number of male and female students in each:

Code:

input School Males Females
1    430    598
2    87    203
3    43    35
4    278    143
5    720    1347
6    388    613
7    702    610
end

I am trying to run a chi-square test of whether the gender distribution differs across schools, and I know this can be done with the immediate form of tabulate, tabi:

Code:

tabi 430 598 \ 87 203 \ 43 35 \ 278 143 \ 720 1347 \ 388 613 \ 702 610, chi2

However, I would like code that can also be used on similar datasets, working from the variables in the dataset rather than an immediate command whose numbers have to be retyped each time.

I have not found a simple way of doing this. As a workaround, I expanded the dataset to one observation per subject, with a categorical variable for the school and a binary variable for gender. These data can then easily be analysed with "tab ..., chi2", which gives the same result as the immediate command. However, this approach is tedious (there is probably a shorter way, but it took me about 20 lines of code), and I suspect there must be a simpler way of telling Stata that the data are already in contingency-table format.
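For comparison, the expansion workaround itself can be fairly short. This is a minimal sketch, assuming the counts are first stacked with reshape (one row per school-gender cell) and then multiplied out with expand; the names freq and gender are chosen here for illustration only.

```stata
clear
input School Males Females
1    430    598
2    87    203
3    43    35
4    278    143
5    720    1347
6    388    613
7    702    610
end

* stack the two count variables into one (freq), indexed by gender
rename Males freq1
rename Females freq2
reshape long freq, i(School) j(gender)
label define gender 1 "Males" 2 "Females"
label values gender gender

* one observation per student; expand treats counts below 1 as 1,
* so empty cells are dropped first
drop if freq == 0
expand freq
drop freq

tabulate School gender, chi2
```

With 6197 students this is harmless, but for large counts the expanded dataset grows accordingly, which is one reason to prefer frequency weights instead.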

Thank you and best regards,
Patrick

Last edited by Patrick Schober; 26 Feb 2020, 01:07.

Maarten Buis

Join Date: Mar 2014
Posts: 3456

26 Feb 2020, 01:21

Code:

clear
input School Males Females
1    430    598
2    87    203
3    43    35
4    278    143
5    720    1347
6    388    613
7    702    610
end

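* rename the counts to freq1 and freq2 so that reshape can stack them
rename Males freq1
rename Females freq2

* one row per school-gender cell, with the count in freq
reshape long freq, i(School) j(gender)
label define gender 1 "Males" 2 "Females"
label values gender gender

* frequency weights make each row count freq times
tab School gender [fw=freq], chi2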

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
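To reuse this reshape-and-weight approach on other datasets without editing variable names each time, it could be wrapped in a small program. The sketch below is hypothetical: the program name fwchi2, its id() option, and the temporary names _freq and _cat are invented for illustration. It reshapes a preserved copy of the data, runs tabulate with frequency weights, restores the original data, and leaves r(chi2), r(p) and r(N) behind.

```stata
clear
input School Males Females
1    430    598
2    87    203
3    43    35
4    278    143
5    720    1347
6    388    613
7    702    610
end

* hypothetical wrapper: wide counts -> long cells -> weighted tabulate
capture program drop fwchi2
program fwchi2, rclass
    version 16
    syntax varlist(numeric min=2), id(varname)
    tempname chi2 p N
    preserve
    keep `id' `varlist'
    * rename the count variables to _freq1, _freq2, ... and label each category
    capture label drop _cat
    local j = 0
    foreach v of local varlist {
        local ++j
        rename `v' _freq`j'
        label define _cat `j' "`v'", add
    }
    * one row per id-category cell, with the count in _freq
    quietly reshape long _freq, i(`id') j(_cat)
    label values _cat _cat
    tabulate `id' _cat [fweight = _freq], chi2
    * copy the results into temporary scalars before restoring the data
    scalar `chi2' = r(chi2)
    scalar `p' = r(p)
    scalar `N' = r(N)
    restore
    return scalar chi2 = `chi2'
    return scalar p = `p'
    return scalar N = `N'
end

fwchi2 Males Females, id(School)
```

Any number of count variables can be listed; the wide data in memory are unchanged after the call.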


Patrick Schober


26 Feb 2020, 01:24

Thank you very much, Maarten! I greatly appreciate your assistance!
Best regards,
Patrick

Nick Cox

Join Date: Mar 2014
Posts: 35697

26 Feb 2020, 02:26

Here's another way to do it, with a quick-and-dirty use of Mata as a calculator. (I checked that neither tabchi nor chitest in tab_chi from SSC supports this data structure.)

Code:

clear

input School Males Females
1    430    598
2    87    203
3    43    35
4    278    143
5    720    1347
6    388    613
7    702    610
end

tabi 430 598 \ 87 203 \ 43 35 \ 278 143 \ 720 1347 \ 388 613 \ 702 610, chi2

return list

scalars:
                  r(p) =  8.14672848677e-49
               r(chi2) =  239.2425724266586
                  r(c) =  2
                  r(r) =  7
                  r(N) =  6197


*! 1.0.0 NJC 26 Feb 2020 Statalist thread 1538476
program anotherchicmd
    version 16
    syntax varlist(numeric min=2) [if] [in]
    
    * respect if/in and exclude observations with missing counts
    marksample touse
    quietly count if `touse'
    if r(N) == 0 error 2000
    
    * observed counts: rows are observations, columns are the variables
    mata : obs = st_data(., "`varlist'", "`touse'")
    * expected counts under independence: row total x column total / grand total
    mata : exp = (rowsum(obs) * colsum(obs)) :/ sum(obs)
    * Pearson chi-square statistic
    mata : chi2 = sum((obs :- exp):^2 :/ exp)
    di
    di "Chi-square statistic: " _c
    mata : chi2
    di "d.f.:                 " _c
    mata : (rows(obs) - 1) * (cols(obs) - 1)
    di "P-value:              " _c
    mata : chi2tail((rows(obs) - 1) * (cols(obs) - 1), sum((obs :- exp):^2 :/ exp))
    
end

anotherchicmd Males Females


Chi-square statistic:   239.2425724
d.f.:                   6
P-value:                8.14673e-49


This should also work in some versions before 16, provided you change the version number, but I haven't checked any earlier version. It doesn't require a reshape.
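The Mata lines compute the usual Pearson statistic. Writing O_ij for the observed count in row i (school) and column j (gender), R_i and C_j for the row and column totals, and N for the grand total (symbols introduced here for exposition only), the expected counts, statistic and degrees of freedom are:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1)
```

In the program, rowsum(obs) * colsum(obs) is the r x c table of products R_i C_j. For these data r = 7, c = 2 and N = 6197, so df = 6; for example, School 1 has R_1 = 1028 and the male total is C_1 = 2648, giving E_11 = 1028 x 2648 / 6197, about 439.27, against an observed 430.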

Last edited by Nick Cox; 26 Feb 2020, 02:42.


Patrick Schober


26 Feb 2020, 05:35

Thank you very much, Nick!
To be honest, Mata is still beyond my capabilities, but perhaps this is a good opportunity to get started!
Best regards,
Patrick
Nick Cox


26 Feb 2020, 05:39

The point is that you don't need to know Mata. You have a program you can run. It's not much tested, and there won't be a help file forthcoming, but you should be able to see what it does.

Incidentally, it accepts two or more variables as input, so it is not restricted to tables with just two columns.
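If you want to use the result in further code rather than just read it off the screen, a variant could leave it in r(). This is a hypothetical sketch, not Nick's program: the name chi2counts and the Mata names _O and _E are invented here. It uses the same Mata calculation, then passes the statistic back with st_numscalar() and clears its Mata globals.

```stata
clear
input School Males Females
1    430    598
2    87    203
3    43    35
4    278    143
5    720    1347
6    388    613
7    702    610
end

capture program drop chi2counts
program chi2counts, rclass
    version 16
    syntax varlist(numeric min=2) [if] [in]
    marksample touse
    quietly count if `touse'
    if r(N) == 0 error 2000

    tempname chi2 df p
    * observed counts: one row per observation, one column per variable
    mata: _O = st_data(., "`varlist'", "`touse'")
    * expected counts: row totals times column totals over the grand total
    mata: _E = (rowsum(_O) * colsum(_O)) :/ sum(_O)
    * copy the statistic and degrees of freedom into Stata scalars
    mata: st_numscalar("`chi2'", sum((_O :- _E):^2 :/ _E))
    mata: st_numscalar("`df'", (rows(_O) - 1) * (cols(_O) - 1))
    * remove the Mata globals so nothing is left behind
    mata: mata drop _O _E
    scalar `p' = chi2tail(`df', `chi2')

    display as text "Pearson chi2(" `df' ") = " as result %9.4f `chi2' ///
        as text "   Pr = " as result %9.4g `p'
    return scalar chi2 = `chi2'
    return scalar df = `df'
    return scalar p = `p'
end

chi2counts Males Females
```

As with the original, three or more count variables can be given, e.g. for a third response category.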
Patrick Schober


02 Jun 2020, 08:40

Thank you Nick, and sorry for the delayed response!