Originally posted by Timo Leise
It's the threshold of the five most numerous combinations that appears to be semi-arbitrary. It's possible that at least some of the combinations you've excluded are more strongly associated with profit.
Also, the particular combinations of characteristics displayed by firms in these five most numerous combinations might be for underlying causal reasons . . . and they might not, i.e., the presence of at least some of the characteristics among the five most numerous combinations might be coincidental.
My research question is which combinations of criteria are actually used by companies (that's why I chose only the most common ones and not combinations that are used by only one company) . . .
. . . and which combinations work best (that's why I look at the clusters).
I'm no economist, but I could see that in the long run, as a result of the expected association between profitability and firm survival, you could conjecture that particular combinations of characteristics would be naturally selected for survivability. And you might observe that these are more or less associated with the value of profit that you measure. But, selection / endogeneity considerations aside, when you choose the five most survived (prevalent) combinations, you've truncated your range of predictors and thereby attenuated the ability of your regression model to reveal the association overall.
In addition, by looking only at combinations, you risk that at least some of the characteristics you observe among those five most numerous combinations might be analogous to evolutionarily neutral traits, unrelated to survival advantage (long-term profitability).
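To make the truncation concern concrete, here is a minimal sketch in Python on made-up data. The three characteristics, the profit figures, and the cutoff of three (standing in for five) are all hypothetical and chosen only for illustration. It shows how ranking combinations by prevalence can leave out the combination with the highest mean profit.

```python
from collections import defaultdict

# Hypothetical toy data: each firm is (characteristics A, B, C coded 0/1, profit).
firms = [
    ((1, 0, 0), 5.0), ((1, 0, 0), 6.0), ((1, 0, 0), 7.0),
    ((0, 1, 0), 4.0), ((0, 1, 0), 5.0),
    ((1, 1, 0), 8.0), ((1, 1, 0), 9.0),
    ((0, 0, 1), 3.0),
    ((1, 1, 1), 15.0),
]

def summarize(firms):
    """Return {combination: (number of firms, mean profit)}."""
    groups = defaultdict(list)
    for combo, profit in firms:
        groups[combo].append(profit)
    return {c: (len(p), sum(p) / len(p)) for c, p in groups.items()}

def top_k_by_count(summary, k):
    """Return the k most prevalent combinations."""
    ranked = sorted(summary, key=lambda c: summary[c][0], reverse=True)
    return ranked[:k]

summary = summarize(firms)
kept = top_k_by_count(summary, 3)          # analogue of "five most numerous"
excluded = [c for c in summary if c not in kept]
best_overall = max(summary, key=lambda c: summary[c][1])

print("kept:", kept)
print("excluded:", excluded)
print("highest mean profit:", best_overall, "kept?", best_overall in kept)
```

In this toy example the rare combination (1, 1, 1) has the highest mean profit but is dropped by the prevalence cutoff, so a regression fitted only to the kept combinations never sees it.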
Could you maybe add some more information on how exactly you would test all possible combinations of characteristics? I thought that to test this I would need to use interaction terms.
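For reference, a rough sketch of what "all possible combinations" means in interaction-term form, assuming the characteristics are binary (0/1). The names A, B, C are hypothetical, and this is only one way to set it up, not necessarily what was being suggested above. With k binary characteristics, a fully saturated model has 2^k - 1 terms besides the intercept: every main effect plus every two-way and higher-order interaction.

```python
from itertools import combinations
from math import prod

# Hypothetical binary characteristics; replace with the real variable names.
names = ["A", "B", "C"]

def interaction_terms(names):
    """List every main effect and every interaction of the given variables."""
    terms = []
    for order in range(1, len(names) + 1):
        terms.extend(combinations(names, order))
    return terms

def design_row(values, terms):
    """Regressor values for one firm: the product of the 0/1 variables in each term."""
    return [prod(values[n] for n in term) for term in terms]

terms = interaction_terms(names)
row = design_row({"A": 1, "B": 1, "C": 0}, terms)

for term, x in zip(terms, row):
    print("*".join(term), x)
```

The number of terms doubles with each added characteristic, so with many characteristics and few firms per combination the higher-order interactions will be poorly estimated or not estimable at all.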