max # of variables in Cochran's Q test?

Amaranta Losey

Join Date: Feb 2020

Posts: 12
#1


27 Aug 2022, 15:00

Hello- I am trying to evaluate multiple choice (non-exclusive) binary questions. Ex: on a survey, a participant can check a "yes" box for any of 6 medical conditions (diabetes, high cholesterol, stroke, etc.). I wanted to evaluate the relationships and thought I could use Cochran's Q test, but the command is telling me I am specifying too many variables. How many can you use at a time?
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#2

27 Aug 2022, 16:08

So far as I can tell, there is no built-in Cochran's Q implemented in Stata, so presumably you are trying to use some community-contributed program to calculate it, but you haven't told us what that is. Further, you didn't show us the syntax you actually used. Also, you didn't show us a sample of your data, which is another thing requested in the StataList FAQ for new users. (See the description of the -dataex- command in the FAQ.) All those things make it difficult to give you a useful answer. Nevertheless, I'll offer a guess that whatever command you are trying to use expects your data to be in the long layout, but that your data is in wide layout, and the syntax you are trying to use therefore doesn't fit with whatever is described in the help for whatever command you might be using.

All that being said: While Cochran's Q can be used with your data, I'm inclined to think that what it would tell you would not be very interesting to you. In your context, Cochran's Q would test the null hypothesis that, in the population, all of these conditions have the same prevalence, vs. the alternative that at least one of them differs in prevalence from some other one. Many null hypotheses are uninteresting, but that strikes me as less interesting than usual <grin>.
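
For reference, the statistic behind that null hypothesis is usually written as follows. The notation here is mine, not taken from any particular program's help file: k is the number of conditions, C_j is the number of subjects reporting condition j, R_i is the number of conditions reported by subject i, and T is the total number of "yes" answers. Under the null, Q is approximately chi-squared with k - 1 degrees of freedom.

```latex
Q = \frac{k\,(k-1)\,\sum_{j=1}^{k}\left(C_j - \frac{T}{k}\right)^{2}}
         {\sum_{i=1}^{n} R_i\,(k - R_i)},
\qquad T = \sum_{j=1}^{k} C_j = \sum_{i=1}^{n} R_i
```

Subjects who report none or all of the conditions contribute zero to the denominator, so they carry no information for this test.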

You say that you want to "evaluate the relationships." If by that you have in mind questions like "Is the population proportion of high cholesterol different between persons with vs. without diabetes," that's a different and I'd say more interesting question, and could be addressed in Stata by (among other commands)
-tabulate cholesterol diabetes, col chi2-
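
If you wanted that kind of table for every pair of conditions, a loop along these lines might do it. This is only a sketch: the variable names in the local macro are hypothetical stand-ins for your own numeric 0/1 indicators.

```stata
* Sketch only: replace these hypothetical names with your own 0/1 variables
local conds diabetes cholesterol stroke
local n : word count `conds'
local last = `n' - 1

forvalues i = 1/`last' {
    local a : word `i' of `conds'
    local start = `i' + 1
    forvalues j = `start'/`n' {
        local b : word `j' of `conds'
        * Column percentages of the row variable plus a Pearson chi-squared test
        tabulate `a' `b', col chi2
    }
}
```

With six conditions that is 15 tables, so some allowance for multiple comparisons would be worth considering.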
Amaranta Losey

Join Date: Feb 2020

Posts: 12
#3

29 Aug 2022, 12:23

Thank you for pointing out the many deficits.
I am using the community-contributed program recommended in Stata's help function, cochranq: Cochran's Q test for stochastic dominance in blocked binary data
Program by Alexis Dinno.
Support: [email protected]
Version 1.3.6 (Updated: August 31, 2021)

The sample of my data is listed below: each medical problem is checked/unchecked (0/1), but they are not mutually exclusive, so any one participant may answer yes to 3 items (e.g., diabetes, hypertension, cholesterol).
comorb1     comorb2     comorb3
Checked     Checked     Unchecked
Unchecked   Unchecked   Checked
Unchecked   Checked     Unchecked
Unchecked   Unchecked   Unchecked

When I run the cochran program- it gives me this output:
. cochranq comorb1 comorb2 comorb3 comorb4 comorb5 comorb6
too many variables specified

I don't want to perform individual pairwise comparisons because there is no baseline reference to compare each to relative to each other. I thought the Cochran Q would test the overall proportion- but perhaps a binary logistic regression is better.

Thank you for taking the time to answer.
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#4

29 Aug 2022, 20:43

Thanks for indicating which program you used and showing your syntax. I don't, however, see a -dataex- example of your data in your posting; perhaps it was accidentally omitted or you didn't understand what I meant. If the latter is true, please do take a look at the FAQ re the -dataex- command. It does appear you have the wide layout, and it appears you have string rather than numeric data, which might be wrong. If you had given a data example with -dataex-, I'd have been able to verify those things and would have given you precise code for what I think you want.
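
In case it helps, here is roughly how you could check the storage types, convert if needed, and produce a postable example. It is a sketch under an assumption I can't verify from your post: that comorb1 through comorb6 are string variables holding exactly "Checked" or "Unchecked". If -describe- shows numeric variables with value labels instead, skip the loop.

```stata
local vars comorb1 comorb2 comorb3 comorb4 comorb5 comorb6

* Storage types beginning with "str" are strings
describe `vars'

* Assumption: strings holding exactly "Checked" or "Unchecked"
foreach v of local vars {
    rename `v' `v'_str
    * 1 = Checked, 0 = Unchecked, missing stays missing
    generate byte `v' = (`v'_str == "Checked") if !missing(`v'_str)
}

* Paste the output of this command into your post
dataex `vars' in 1/10
```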

"I don't want to perform individual pairwise comparisons because there is no baseline reference to compare each to relative to each other."

I don't understand what you mean by that ("baseline reference"??), but regarding individual comparisons (e.g., diabetes vs stroke, diabetes vs high cholesterol, etc.), I believe that the -cochranq- program offers p-values for such comparisons, but I believe it also offers the omnibus test to detect if there is any departure from equal proportions across all comparisons.

Yes, you could use a binary conditional logit model here, if your data were put into long layout. You'd have something like:

Code:

clogit comorb i.condition, group(id)

(assuming for illustration that id is a subject identifier, comorb is a 0/1 indicator for absence/presence, and condition (1/6) is a variable indicating which condition that observation pertains to.)
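
Getting from the wide layout to that long layout might look like the following. It is a sketch that assumes comorb1 through comorb6 are already numeric 0/1 variables; the id variable is created here only for illustration and should be dropped from the code if your data already has a subject identifier.

```stata
* Hypothetical subject identifier; omit if you already have one
generate long id = _n

* One observation per subject per condition: comorb holds the 0/1 value,
* condition (1-6) records which of comorb1-comorb6 it came from
reshape long comorb, i(id) j(condition)

clogit comorb i.condition, group(id)

* Omnibus Wald test that all condition coefficients are zero
testparm i.condition
```

Note that -clogit- drops subjects whose answers are all 0 or all 1, since they have no within-subject variation. If -cochranq- does expect the long layout, these same three variables are presumably what it wants, but check -help cochranq- for the argument order.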

That would give you a test of the prevalence of each condition vs. the prevalence of the reference category, and would also give an omnibus test of "no differences." I still would wonder whether answers to questions like "Is stroke less prevalent in the population than diabetes?" would be interesting to you.