Validity of composite variable

Gabriel Fernandez

Join Date: Apr 2014

Posts: 12
#1


10 Apr 2014, 06:02

Hello everyone,

I'm trying to define a population with specific characteristics (e.g., eating well).
I have many dummy variables that I wish to use, positively or negatively (e.g., having every day: breakfast, fruits, vegetables, sweets).
I generated a composite variable "eatingwell = breakfast + fruits + veget + 1 - sweets".
Now I want to make sure that this variable is really valid, that is to say, that my dummy variables are correlated with one another.
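
For readers following along outside Stata, the composite described above can be sketched in a few lines of Python. The three records are invented for illustration, and the split into `POSITIVE` and `NEGATIVE` lists is an assumption read off the formula quoted above; the helper name `eatingwell` is hypothetical.

```python
# Invented records: 1 = "does this every day", 0 = otherwise.
people = [
    {"breakfast": 1, "fruits": 1, "veget": 1, "sweets": 0},
    {"breakfast": 1, "fruits": 0, "veget": 1, "sweets": 1},
    {"breakfast": 0, "fruits": 0, "veget": 0, "sweets": 1},
]

POSITIVE = ["breakfast", "fruits", "veget"]  # count towards eating well
NEGATIVE = ["sweets"]                        # count against it


def eatingwell(row):
    """Sum the positive dummies plus the reverse-coded (1 - x) negative ones."""
    return sum(row[v] for v in POSITIVE) + sum(1 - row[v] for v in NEGATIVE)


scores = [eatingwell(p) for p in people]
print(scores)
```

Reverse coding each negative dummy as `1 - x` keeps every component on the same 0/1 scale, so the score simply counts how many "good" habits a person reports.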

Now the questions:
1) I tried to use Cronbach's alpha, but I'm having a hard time interpreting it. I know that 0.7 is considered a good level, but I get results ranging from 0.4 to 0.6. Is that enough to justify using a composite variable?

2) I also tried the command "factor", but I don't understand the meaning of the results, and I don't know whether I should use the principal factor or the principal-component factor method.
Is the screeplot a good way to visualize the results of this factor analysis?

3) Finally, I tried "candisc list_of_dummy_var, group(composite_var)", but I get an error message if I include exactly the variables that were used to generate my composite variable (pooled within-group SSCP matrix has rank 4 instead of rank 5). If I drop one of the dummy variables, I can run candisc and the screeplot, but I'm not sure what the screeplot means. I initially thought it would show the validity of the dummy variables, but now I wonder whether it instead reflects the validity of the functions used by the discriminant analysis.

4) Last, is manually generating my composite variable a good way of doing it, or should I use another command such as "egen compvar"? And then how can I make Stata understand that some variables have to be used positively and others negatively?
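
On question 1, it can help to compute Cronbach's alpha by hand to see what the number responds to. The following is a minimal Python sketch, not a reproduction of Stata's alpha command: the data are made up, the helper `cronbach_alpha` is hypothetical, and it assumes the usual formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score).

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, all of the same length."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # composite per person
    item_variances = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variances / variance(totals))


# Made-up dummies for six people.
breakfast = [1, 1, 1, 0, 0, 0]
fruits = [1, 1, 0, 1, 0, 0]
veget = [1, 1, 1, 0, 0, 0]
sweets = [0, 0, 0, 1, 1, 1]

sweets_reversed = [1 - s for s in sweets]  # so that "high" means eating well

alpha_reversed = cronbach_alpha([breakfast, fruits, veget, sweets_reversed])
alpha_raw = cronbach_alpha([breakfast, fruits, veget, sweets])
print(round(alpha_reversed, 3), round(alpha_raw, 3))
```

In this toy data set, forgetting to reverse the negatively scored item is enough to push alpha below zero, so the direction of each dummy needs to be settled before alpha is interpreted at all.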

To sum up: what is the best way to generate a composite variable, and what is the best way to assess its validity?

Thank you for your help.

Gabriel FERNANDEZ
Nick Cox

Join Date: Mar 2014

Posts: 35210
#2

10 Apr 2014, 06:09

I would just use the indicators as they come in a model. If one is redundant, your results will show that.

PCA or factor analysis on discrete variables divides the experts, but many have reservations about that, to put it politely. In any talk on this, someone is all too likely to spring up and say "But that's known to be wrong!"

The important thing is, unless your data are exceptional, that no composite variable can capture all the information in the indicators. For that and other reasons, a composite variable is just all too likely to confuse the issue and to complicate interpretations.
Maarten Buis

Join Date: Mar 2014

Posts: 3405
#3

10 Apr 2014, 06:57

Originally posted by Gabriel Fernandez

I have many dummy variables that I wish to use, positively or negatively (e.g., having every day: breakfast, fruits, vegetables, sweets).
I generated a composite variable "eatingwell = breakfast + fruits + veget + 1 - sweets".
Now I want to make sure that this variable is really valid, that is to say, that my dummy variables are correlated with one another.

I don't think that correlation is required for this construct. I would consider the observed variables as components that together add up to "eatingwell", rather than assuming there is a latent variable "eatingwell" that makes people eat breakfast, fruit, etc. See for example:

Bollen, K. A. 1984. Multiple indicators: Internal consistency or no necessary relationship? Quality & Quantity 18(4): 377–385.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Maarten Buis

Join Date: Mar 2014

Posts: 3405
#4

10 Apr 2014, 08:12

Also, adding the sum of a set of variables is equivalent to adding all these variables and constraining their effects to be equal, which is a testable constraint. See for example:
M.L. Buis (2012) "Stata tip 108: On adding and constraining", The Stata Journal, 12(2), pp. 342-344.
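
The equivalence can be checked with plain arithmetic. The Python sketch below uses made-up values for the constant and the slope and two hypothetical helper functions; it only illustrates the algebra (the sum enters with one slope, which is the same as equal slopes on the positive dummies, the opposite slope on sweets, and a shifted constant). It does not perform the statistical test of the constraint.

```python
from itertools import product

NAMES = ["breakfast", "fruits", "veget", "sweets"]


def predictor_composite(row, const, b):
    """Linear predictor when only the composite enters the model."""
    eatingwell = row["breakfast"] + row["fruits"] + row["veget"] + (1 - row["sweets"])
    return const + b * eatingwell


def predictor_separate(row, const, coefs):
    """Linear predictor when every dummy enters with its own coefficient."""
    return const + sum(coefs[name] * row[name] for name in NAMES)


const, b = -1.0, 0.5  # made-up values, for illustration only

# Equal effects for the positive dummies, the opposite effect for sweets;
# the "+1" in the composite moves b into the constant.
constrained = {"breakfast": b, "fruits": b, "veget": b, "sweets": -b}

rows = [dict(zip(NAMES, combo)) for combo in product([0, 1], repeat=4)]
same = all(
    abs(predictor_composite(r, const, b) - predictor_separate(r, const + b, constrained)) < 1e-12
    for r in rows
)
print(same)
```

Because the two predictors agree for every possible combination of the dummies, a model with the composite is a special case of the model with separate dummies, which is what makes the constraint testable.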

Joe Canner

Join Date: Mar 2014

Posts: 580
#5

10 Apr 2014, 09:19

To some extent, the answer depends on how you are using the variable eatingwell. If it is going to be an independent variable in a model (e.g., do people who eat well have fewer health problems?), then I would start with Nick's advice and keep the indicators separate. If it is the outcome of an intervention (e.g., do people who take a nutrition class in high school eat better than those who don't?), then it may be useful to construct a composite and investigate whether the composite is valid. This, however, should not preclude looking at each of the components separately to see whether the intervention affects one component more than the others. (Traditionally, in biomedical research, for the purposes of sample size calculation and maintaining a low type I error rate, one might consider having the composite as the primary outcome of the study and the components as secondary outcomes.)
Gabriel Fernandez

Join Date: Apr 2014

Posts: 12
#6

10 Apr 2014, 09:34

Just so you know, my final goal is to describe a specific population with bad toothbrushing habits, and I thought that using a few composite variables (not eating well, having addictive habits, low socioeconomic status, etc.) could be a clearer (if not better) way to show this than a simple logistic regression. I changed my mind after reading your messages.

To quote your message:
"If it is going to be an independent variable in a model (e.g., do people who eat well have fewer health problems?) then I would start with Nick's advice and keep them separate."

My answer is, yes, exactly.

EDIT: uh oh, editing deleted the previous message, so I'll say it again:

I thank you all for your help again, and I will try to use the variables as they come, as you advised. The Bollen article was really useful.

Last edited by Gabriel Fernandez; 10 Apr 2014, 09:53.
Maarten Buis

Join Date: Mar 2014

Posts: 3405
#7

10 Apr 2014, 10:26

In that case, here is another reference: Alan S. Blinder (1974) “The Economics of Brushing Teeth” Journal of Political Economy, 82(4): 887-891. (it is not serious, but really funny)

You could also consider using sheaf coefficients as a compromise: you still add all the variables and get the summary effect in one coefficient. You can read more here: http://www.maartenbuis.nl/software/sheafcoef.html
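
A rough sketch of the idea in Python, under stated assumptions: the coefficients are invented rather than estimated, the latent variable is taken to be the weighted sum of the dummies rescaled to a standard deviation of one, and the sheaf coefficient is taken to be the effect of that rescaled variable. How the actual sheafcoef command computes and reports these quantities should be checked against its own documentation.

```python
from statistics import stdev

NAMES = ["breakfast", "fruits", "veget", "sweets"]

# Made-up dummies for six people.
rows = [
    {"breakfast": 1, "fruits": 1, "veget": 1, "sweets": 0},
    {"breakfast": 1, "fruits": 1, "veget": 1, "sweets": 0},
    {"breakfast": 1, "fruits": 0, "veget": 1, "sweets": 0},
    {"breakfast": 0, "fruits": 1, "veget": 0, "sweets": 1},
    {"breakfast": 0, "fruits": 0, "veget": 0, "sweets": 1},
    {"breakfast": 0, "fruits": 0, "veget": 0, "sweets": 1},
]

# Hypothetical coefficients, standing in for estimates from a model
# in which each dummy entered separately.
coefs = {"breakfast": 0.4, "fruits": 0.2, "veget": 0.3, "sweets": -0.5}

# Combined contribution of the block of dummies for each person.
combo = [sum(coefs[n] * r[n] for n in NAMES) for r in rows]

# Assumed definition: the summary effect is the standard deviation of that
# contribution, and the latent variable is the contribution rescaled to sd 1.
sheaf = stdev(combo)
weights = {n: coefs[n] / sheaf for n in NAMES}
latent = [sum(weights[n] * r[n] for n in NAMES) for r in rows]

print(round(sheaf, 3), round(stdev(latent), 3))
```

Unlike a hand-built sum, the weights here are not forced to be equal; the block still contributes exactly what the separate dummies contribute, only summarized in one number.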

