  • weights application on chi square test

    Hello,

    I have a large regional dataset with a weight variable ready. I am trying to conduct a chi-square test that would be weighted by the weight variable, but I can't seem to get it right.

    The command I normally use for chi-square is the following: tab fcg country, exp chi2 cchi2.

    When I tried adding [aweight = weight], it did not work. Any suggestions?

    Thanks,

  • #2
    Saying something "did not work" is not helpful. There are many ways in which a command might not work--you need to explain, or better still, show by pasting Stata's output and error messages into your post, exactly what went wrong.

    That said, I can tell you that the chi square calculation simply does not allow aweights. What is your weighting variable? What does it mean? If it is a country size variable, then probably you need to use it as an fweight. The chi square calculation does allow fweights, but no other weights.
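As a minimal sketch of the fweight route, assuming a hypothetical integer variable count that holds cell frequencies (the numbers below are made up for illustration and are not from the poster's data):

```stata
* Illustrative data only: one row per cell, with an integer frequency in count
clear
input fcg country count
1 1 30
1 2 20
2 1 25
2 2 45
end

* Frequency weights must be integers; tabulate then treats each row as count observations
tabulate fcg country [fweight = count], expected chi2 cchi2
```

This only makes sense if the weight really is a count of cases; a sampling weight is not a frequency.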



    • #3
      Thank you Clyde. The problem is that the weights have been calculated individually for each country based on different criteria and sampling designs, so I don't want to re-calculate a weight, but rather just apply that weight variable as is.

      The output I get is "weights not allowed" or "chi2 not allowed with aweights".

      I tried fweight, instead of aweight, but it still gave me an error message "weights not allowed" or "may not use noninteger frequency weights".

      So is there another command I can use to run a chi-square test with that weight variable applied as is, without any manipulation by Stata?



      • #4
        Let's approach this from the opposite direction. tabulate, chi2 expects as input integer observed frequencies, as otherwise the test concerned, covered in many introductory courses, makes no sense. That is why only frequency weights are supported. It's not a quirk or arbitrary limitation of tabulate; it's standard statistical logic.

        If you can calculate or at the very least approximate integer observed frequencies then other commands can be used to get a chi-square test. Here for example is tabchi from package tab_chi (SSC). Here we imagine a known population size and percent breakdown, so that we can approximate integer frequencies. The capture noisily shows the error message but lets the script continue.


        Code:
        clear
        input A B percent
        1  1   1.8
        1  2   2.2  
        2  1  47.8
        2  2  48.2
        end
        
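        * approximate cell frequencies for an imagined population of 9876
        gen observed = 9876 * percent/100
        * noninteger frequency weights are rejected; capture noisily shows the error
        capture noisily tabchi A B [fw=observed]
        
        * round to integers so that the frequency weights are accepted
        gen fudged = round(observed, 1)
        
        tabchi A B [fw=fudged]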
        
                  observed frequency
                  expected frequency
        
        ------------------------------
                  |         B        
                A |        1         2
        ----------+-------------------
                1 |      178       217
                  |  195.940   199.060
                  |
                2 |     4721      4760
                  | 4703.060  4777.940
        ------------------------------
        
                  Pearson chi2(1) =   3.3952   Pr = 0.065
         likelihood-ratio chi2(1) =   3.4013   Pr = 0.065

        That said, your summary

            "the weights have been calculated individually for each country based on different criteria and sampling designs"

        does not convey to me that a standard chi-square test makes any sense without a great deal of arm-waving and secondary argument.



        • #5
          Thank you Nick. So if I understand you correctly, applying chi-square tests on such a dataset does not make sense because weights have been calculated differently and there is no common sampling design between the different countries, correct?

          Does that also apply to ANOVA and regressions?

          Thanks,



          • #6
            Again, I would turn the question round. Fill in X and Y below:


            If weights are calculated differently in different parts of the dataset then it still makes sense to do X because Y.

            There may be a defence in which the word approximately figures. I don't think any reader can tell you what the defence is without more information.



            • #7
              I want to do a chi-square test on a large sample using sample weights, but Stata does not allow aweights or pweights. Is there any alternative?
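One route sometimes used for sampling (probability) weights is Stata's survey prefix, which reports a design-based test of independence instead of the classical Pearson test. The sketch below is not something proposed in this thread: the data are made up, the variable name weight is an assumption, and no strata or sampling units are declared, which a real design would almost certainly need.

```stata
* Illustrative data only: one row per respondent, with a made-up sampling weight
clear
input fcg country weight
1 1 1.2
1 1 0.8
1 1 1.5
1 2 2.3
1 2 0.7
1 2 0.9
2 1 1.1
2 1 1.9
2 2 0.6
2 2 1.4
2 2 2.0
2 2 1.3
end

* Declare the weights as probability weights (assumption: no strata or clusters)
svyset [pweight = weight]

* Weighted two-way table; the Pearson statistic is corrected for the design
svy: tabulate fcg country, pearson
```

Whether such a test is defensible when the weights come from different sampling designs in each country is exactly the question raised in #4 and #6.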

