Chi-square Crosstab Significant Group Differences with 3 or More Groups

Mark Shadden

Join Date: Jun 2015

Posts: 6
#1

05 Jun 2015, 09:17

I am using Stata 13. After running a crosstab chi-square test, is there a way in Stata to find out which specific groups differ significantly in their proportions, similar to the subscript letters that SPSS displays in its crosstab output?

Example Stata code:
tab edu bmi_cat3, chi V co

Corresponding example SPSS code that gives the subscripts for individual cell differences:
CROSSTABS
/TABLES=edu BY bmi_cat3
/FORMAT=AVALUE TABLES
/STATISTICS=CHISQ PHI ETA
/CELLS=COUNT COLUMN PROP
/COUNT ROUND CELL.

I am also attaching the corresponding outputs. As you can see, both outputs show a marginally significant overall relationship (p = .07) between the two variables, but only the SPSS output shows that the difference is confined to High School or Higher. Is there a way to get this from Stata?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#2

05 Jun 2015, 09:24

Mark may be interested in taking a look at -help tab2- and the related entry in the Stata 13.1 .pdf manual.

Kind regards,
Carlo
(Stata 19.0)
Mark Shadden

Join Date: Jun 2015

Posts: 6
#3

05 Jun 2015, 09:50

Carlo,

I have looked through the help page extensively, and the only option that comes close is cchi2, which reports each cell's contribution to the Pearson chi-square but still does not show which groups differ.
Mark Shadden

Join Date: Jun 2015

Posts: 6
#4

05 Jun 2015, 10:06

Perhaps this is a better example.

Stata code (output attached):
tab2 edu sedtime, chi V co cchi2

SPSS code (output attached):
CROSSTABS
/TABLES=edu BY sedtime
/FORMAT=AVALUE TABLES
/STATISTICS=CHISQ PHI ETA
/CELLS=COUNT COLUMN PROP
/COUNT ROUND CELL.

The SPSS output shows that the significant differences in proportions are spread across many categories of both variables. The Stata output hints at this, but the contribution of each cell to the chi-square value does not tell you exactly which categories differ significantly. For example, the SPSS output makes it clear that, for participants with Elementary or Less education, the proportion in every category of Sedentary Hours per Day (Leisure) differs significantly from every other, because each cell carries a different subscript letter (a, b, c, and d). How could I tell from the Stata output that the proportions for the 3 to 4 Hours and 5 Hours or More categories differ significantly for participants with Elementary or Less education when all I can see is their large (yet relatively similar) contributions to the chi-square value?
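
For reference, one way to see the direction as well as the size of each cell's departure from independence is to compute adjusted standardized residuals, rather than the unsigned contributions that cchi2 reports. The following is only a sketch under stated assumptions, and it is a cell-level diagnostic, not the pairwise comparison that SPSS prints. It uses the nlsw88 dataset shipped with Stata as a stand-in (collgrad and race take the place of edu and sedtime), and the variable names n, rowtot, coltot, tot, expected, and adjres are made up for the illustration. Because of the /// line continuation it should be run from a do-file.

```stata
* Sketch: adjusted standardized residuals for a two-way table.
* Stand-in data: collgrad and race replace edu and sedtime from the thread.
sysuse nlsw88, clear
tabulate collgrad race, chi2 column cchi2

* Collapse to one observation per cell, keeping empty cells as zeros
contract collgrad race, freq(n) zero

* Row, column, and grand totals
egen double rowtot = total(n), by(collgrad)
egen double coltot = total(n), by(race)
egen double tot    = total(n)

* Expected count under independence and the adjusted residual
generate double expected = rowtot * coltot / tot
generate double adjres = (n - expected) / ///
    sqrt(expected * (1 - rowtot/tot) * (1 - coltot/tot))

list collgrad race n expected adjres, sepby(collgrad) noobs
```

Under independence these residuals are approximately standard normal, so values beyond roughly ±2 flag cells that stand out, although nothing here adjusts for the number of cells examined.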
Nick Cox

Join Date: Mar 2014

Posts: 35698
#5

05 Jun 2015, 10:32

I think the short answer is simply that Stata doesn't support anything like this within tabulate or its kin. It's not clear to me exactly what extra tests SPSS is performing, with what assumptions, and how far there is any adjustment for the shotgun approach.

The implication, I think, is that to do something even loosely similar you would need to set up the equivalent Poisson regression or generalized linear model for frequencies of different category combinations.

In these examples, the response and the predictors are both ordered, so it's an interesting question whether the SPSS analysis takes account of that.
More importantly, how should you model the relationship here? I suspect that most researchers using Stata for this kind of data would prefer a model for an ordered response anyway.
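
The Poisson regression route mentioned above could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe for the poster's data: it again uses the nlsw88 dataset shipped with Stata as a stand-in (collgrad and race in place of edu and sedtime), treats both variables as unordered, and the names n, indep, and sat are chosen only for the example.

```stata
* Sketch: independence vs. saturated Poisson model for cell counts.
* Stand-in data: collgrad and race replace edu and sedtime from the thread.
sysuse nlsw88, clear

* One observation per cell of the table, with the cell frequency in n
contract collgrad race, freq(n) zero

* Main effects only: this is the independence model
poisson n i.collgrad i.race
estimates store indep

* Main effects plus interaction: the saturated model
poisson n i.collgrad##i.race
estimates store sat

* Likelihood-ratio test of the interaction terms
lrtest indep sat
```

The likelihood-ratio statistic should match the one reported by tabulate with the lrchi2 option. The interaction coefficients in the saturated model are log odds ratios relative to the base categories, each with its own z statistic, which is where cell-specific contrasts would come from. Empty cells can make the saturated model hard to fit.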
Mark Shadden

Join Date: Jun 2015

Posts: 6
#6

05 Jun 2015, 10:47

Nick,

You are correct that both variables are ordered and that this would matter for a substantively meaningful analysis, but I would like to set that aside, as these are just examples that I quickly pulled from some data I had available. I humbly request that future commenters graciously giving their time to help me on this issue treat these variables as strictly categorical and not ordered. As for the extra test that SPSS is performing, I believe it is a z-test.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#7

05 Jun 2015, 10:49

Understood. As said, I don't know that Stata supports anything similar. Presumably whatever SPSS does is programmable.
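
If the SPSS subscripts do come from pairwise z-tests of column proportions, as suggested in #6, a loop over prtest is one way this might be programmed. The sketch below is an assumption about what such a program could look like, not a reproduction of the SPSS algorithm. It uses the nlsw88 dataset shipped with Stata as a stand-in (collgrad as the row variable and race as the column variable, in place of edu and bmi_cat3), the indicator inrow and the local macros are invented for the example, and the Bonferroni step is an optional choice. It should be run from a do-file.

```stata
* Sketch: pairwise z-tests of column proportions within one table row.
* Stand-in data: collgrad and race replace edu and bmi_cat3 from the thread.
sysuse nlsw88, clear

* Row of interest as a 0/1 indicator (here: college graduates)
generate byte inrow = (collgrad == 1) if !missing(collgrad)

* Column categories and number of pairwise comparisons
levelsof race, local(cols)
local k : word count `cols'
local ncomp = `k' * (`k' - 1) / 2

* Two-sample test of proportions for every pair of columns
foreach a of local cols {
    foreach b of local cols {
        if `a' < `b' {
            quietly prtest inrow if inlist(race, `a', `b'), by(race)
            local p = 2 * normal(-abs(r(z)))
            display "race `a' vs `b': z = " %7.3f r(z) ///
                "  p = " %6.4f `p' ///
                "  Bonferroni p = " %6.4f min(1, `p' * `ncomp')
        }
    }
}
```

Repeating the loop for each row category would cover the whole table. Columns whose pairwise p-value falls below the chosen threshold would then receive different subscript letters in an SPSS-style display.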