  • Pearson's chi square test when data are in contingency table format

    Dear Statalist,

    I have an apparently very simple problem, but hours of Googling, a discussion with a statistician who regularly uses Stata, and a search of this forum have not provided an answer.

    I have the following dataset with 7 schools, as well as the number of male and female students:

    Code:
    input School Males Females
    1    430    598
    2    87    203
    3    43    35
    4    278    143
    5    720    1347
    6    388    613
    7    702    610
    end
    I am trying to do a chi-square test to determine whether there is evidence that the gender distribution differs across schools, and I am aware that this can be done with the immediate form of tabulate (tabi):

    Code:
    tabi 430 598 \ 87 203 \ 43 35 \ 278 143 \ 720 1347 \ 388 613 \ 702 610, chi2
    However, I would like code that can also be used on similar but different datasets, using the variables in the dataset rather than an immediate command that has to be edited each time.

    I have not been able to find a simple way of doing this. As a workaround, I expanded the dataset to one observation per subject, with a categorical variable indicating the school to which the subject belongs and a binary variable coding gender. The data can then easily be analysed with "tab..., chi2", which gives a result identical to that of the immediate command. However, this approach is quite tedious (there is probably a simpler way of doing it, but it cost me about 20 lines of code), and I guess there must be a simpler way of telling Stata that the data are in contingency table format?

    Thank you and best regards,
    Patrick
    Last edited by Patrick Schober; 26 Feb 2020, 01:07.

  • #2
    Code:
    clear
    input School Males Females
    1    430    598
    2    87    203
    3    43    35
    4    278    143
    5    720    1347
    6    388    613
    7    702    610
    end
    
    rename Males freq1
    rename Females freq2
    
    reshape long freq, i(School) j(gender)
    label define gender 1 "Males" 2 "Females"
    label value gender gender
    
    tab School gender [fw=freq], chi
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
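
    A possible way to make the approach in #2 reusable on a dataset that is already in memory is to wrap the reshape in preserve/restore, so that the original wide layout comes back afterwards. This is only a sketch: it assumes the variables are named School, Males and Females as in #1 and that freq1, freq2 and gender are not already in use, and it has not been checked on other datasets.

    Code:
    * keep a copy of the wide data; restore brings it back at the end
    preserve
    
    * give the count variables a common stub so reshape can stack them
    rename (Males Females) (freq1 freq2)
    reshape long freq, i(School) j(gender)
    
    * replace avoids an error if the value label is already defined
    label define gender 1 "Males" 2 "Females", replace
    label values gender gender
    
    * frequency weights make each row count as freq observations
    tabulate School gender [fweight=freq], chi2
    
    restore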


    • #3
      Thank you very much Maarten! I greatly appreciate your assistance!
      Best regards,
      Patrick


      • #4
        Here's another way to do it with a quick-and-dirty use of Mata as a calculator. (I checked that neither tabchi nor chitest in tab_chi from SSC supports this data structure.)

        Code:
        clear
        
        input School Males Females
        1    430    598
        2    87    203
        3    43    35
        4    278    143
        5    720    1347
        6    388    613
        7    702    610
        end
        
        tabi 430 598 \ 87 203 \ 43 35 \ 278 143 \ 720 1347 \ 388 613 \ 702 610, chi2
        
        return list
        
        scalars:
                          r(p) =  8.14672848677e-49
                       r(chi2) =  239.2425724266586
                          r(c) =  2
                          r(r) =  7
                          r(N) =  6197
        
        
        *! 1.0.0 NJC 26 Feb 2020 Statalist thread 1538476
        program anotherchicmd
            version 16
            syntax varlist(numeric min=2) [if] [in]
            
            marksample touse
            quietly count if `touse'
            if r(N) == 0 error 2000
            
            mata : obs = st_data(., "`varlist'", "`touse'")
            mata : exp = (rowsum(obs) * colsum(obs)) :/ sum(obs)
            mata : chi2 = sum((obs :- exp):^2 :/ exp)
            di
            di "Chi-square statistic: " _c  
            mata : chi2
            di "d.f.:                 " _c
            mata : (rows(obs) - 1) * (cols(obs) - 1)
            di "P-value:              " _c
            mata : chi2tail((rows(obs) - 1) * (cols(obs) - 1), sum((obs :- exp):^2 :/ exp))
            
        end
        
        anotherchicmd Males Females
        
        
        Chi-square statistic:   239.2425724
        d.f.:                   6
        P-value:                8.14673e-49
        
        .
        This should work in some versions before 16, provided you change the version number, but I didn't test any others. It doesn't require a reshape.
        Last edited by Nick Cox; 26 Feb 2020, 02:42.
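
        For readers who do not use Mata, the three "mata :" lines that compute obs, exp and chi2 in the program correspond to the usual Pearson formulas. They are written here with O_{ij} for the observed count in row i and column j; the symbols are a notational choice for this note and do not appear in the program.

        \[
        E_{ij} = \frac{\Bigl(\sum_{j'} O_{ij'}\Bigr)\Bigl(\sum_{i'} O_{i'j}\Bigr)}{\sum_{i',j'} O_{i'j'}},
        \qquad
        \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
        \qquad
        \text{d.f.} = (r - 1)(c - 1)
        \]

        With r = 7 schools and c = 2 count variables, this gives (7 - 1)(2 - 1) = 6 degrees of freedom, which matches the output above.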


        • #5
          Thank you very much, Nick!
          To be honest, Mata is still beyond my capabilities, but perhaps this is a good opportunity to get started!
          Best regards,
          Patrick


          • #6
            The point is that you don't need to know Mata. You have a program you can run. It's not much tested, and there won't be a help file forthcoming, but you should see what it does.

            Incidentally, it allows 2 or more variables as input, so is not restricted to 2 x k tables.
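
            Since the program is not much tested, one way to see what it does is to compare it with tabi on a subset of the rows. The sketch below assumes that the data from #1 are in memory and that anotherchicmd from #4 has already been defined; it relies on the if qualifier that the program's syntax statement accepts. Both commands should then report the same statistic, on (3 - 1)(2 - 1) = 2 degrees of freedom.

            Code:
            * restrict the program to the first three schools
            anotherchicmd Males Females if School <= 3
            
            * the same three rows typed in directly, for comparison
            tabi 430 598 \ 87 203 \ 43 35, chi2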


            • #7
              Thank you Nick, and sorry for the delayed response!
