Comparing 2 distributions: chi2fit, csgof, or something else?

Jen Zhen

Join Date: Sep 2014

Posts: 10
#1

01 Sep 2014, 02:34

Dear listers,

I'd like to test whether the distribution of observations in my sample across 30 different regions is representative of that of the population. So for both sample and population I have 30 percentage values (most non-integer, some 0). Now I'm unsure which test or command I should use.

My best guess was to use a Chi2 test, but I've had trouble implementing that in Stata:

When I use -chi2fit freq_population freq_sample-, I get the error "Are you sure you want to run this program with so few observations?". Thirty categories doesn't strike me as few. Or does chi2fit need to see all the individual observations rather than just one percentage per category?

When I try -csgof freq_sample, expperc(12.38 5.32 ... 0.26)-, I get an error message claiming that freq_sample has fewer observations than the 30 specified in expperc, although I've recounted and this is not true. In case the problem was Stata not counting zero frequencies, I also tried replacing all zeros with 0.000001, but I get the same error message.

Can anyone help here?

Thank you so much and kind regards,
JZ
Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

01 Sep 2014, 03:18

We can't judge between your comments and the program messages you get. Your posting is close to the form "I have been advised that I am wrong, but I think I am right." Perhaps so, but we need to see the complete story, especially because some parts of your story don't obviously make sense:

1. Neither chi2fit nor csgof is part of official Stata, so you are expected to make their provenance clear. Please see FAQ Advice Section 12.

2. With just 30 regions and one set of frequencies, if I understand you correctly, you should be able to post the relevant data, or a reduction of the data, here for others to see what you mean. Please use CODE mark-up, not a table, and not an attachment. See the same section.

3. You phrase the problem as one of comparing percentages, but chi-square tests for this problem compare observed and expected frequencies, not percentages. It seems that csgof allows input of expected percentages, but I don't accept the task you pose of finding where it comes from and what it does.

4. It's not clear where your zeros occur (see #2 again: we can't see your data!) but zeros for expected frequencies are fatal to chi-square tests. Changing observed or expected frequencies or even percents of 0 to 0.000001 would never be right.
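
To make points 3 and 4 concrete, here is a minimal sketch of the arithmetic behind a chi-square goodness-of-fit statistic. It is written in Python rather than Stata, and it is not a description of what chi2fit or csgof do internally. The function name chi_square_gof and the three-region counts are made up for illustration. The sketch assumes the sample is supplied as observed counts and the population as percentages.

```python
def chi_square_gof(observed_counts, expected_percents):
    """Pearson chi-square goodness-of-fit statistic.

    observed_counts: sample counts per category (frequencies, not percentages).
    expected_percents: population percentages per category.
    Returns (statistic, degrees_of_freedom).
    """
    if len(observed_counts) != len(expected_percents):
        raise ValueError("need one expected percentage per category")

    total = sum(observed_counts)
    pct_sum = sum(expected_percents)

    statistic = 0.0
    for obs, pct in zip(observed_counts, expected_percents):
        # Convert the population percentage into an expected frequency
        # for a sample of this size.
        expected = total * pct / pct_sum
        if expected <= 0:
            # Division by zero: the statistic is undefined, and nudging
            # the zero to a tiny positive number only inflates the result.
            raise ValueError("expected frequency of zero: statistic is undefined")
        statistic += (obs - expected) ** 2 / expected

    df = len(observed_counts) - 1
    return statistic, df


# Hypothetical counts for three regions, purely for illustration.
stat, df = chi_square_gof([30, 50, 20], [25.0, 50.0, 25.0])
print(stat, df)
```

The statistic depends on the sample size through the expected frequencies, which is why percentages alone are not enough input. A zero observed count is harmless here, but a zero expected frequency stops the calculation.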