  • analysis of multiple rank variables

    Dear Statalister,

    I have cross-sectional data in which respondents are asked to rank the importance of 4 features (price, quality, customisation, innovativeness). The data contain 4 variables (price, quality, customisation, innovativeness), and each variable holds a number from 1 to 4 marking that feature's position: 1 if ranked first, 2 if ranked second, and so on. All respondents rank the 4 characteristics of the same item.
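    A toy version of that layout (hypothetical values; the id variable and the coding of age, sex and education are only placeholders for illustration) looks like this:

    Code:
    clear
    input id price quality customisation innovativeness age sex education
    1 1 2 3 4 35 1 3
    2 2 1 4 3 52 0 2
    3 1 1 1 1 41 1 1
    end
    Respondent 3 shows one of the tied patterns I describe below.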

    The purpose of the analysis is to "explain" the ranks using a set of respondent characteristics X (age, sex, education, ......).

    I could use a multinomial logit if I limited the analysis to the MOST important feature, i.e. only the feature ranked 1.
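    A minimal sketch of that reduced analysis (variable names as above; treating sex and education as factor variables is an assumption about their coding):

    Code:
    * keep only the top-ranked feature; ties would need a rule of their own,
    * since with ties the last feature in the loop wins
    gen str14 top = ""
    foreach f in price quality customisation innovativeness {
        replace top = "`f'" if `f' == 1
    }
    encode top, gen(topfeature)
    mlogit topfeature age i.sex i.education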

    However, I would like to understand how the rank of each feature covaries with observed respondent characteristics (would women give more importance to price?), taking into consideration the correlation between the 4 variables.

    It is an instance of an ipsative measure: the sum of the 4 variables is always 10, for all respondents. That is, once 3 of the 4 features are ranked, the rank of the remaining feature is automatically determined. The four variables are linearly dependent, which introduces a negative correlation between them. Therefore, I was looking for a way to model the 4 variables jointly, as ordered choices. Ideally, I would have liked to be able to run something like:

    Code:
    cmp (price = age sex education) (quality = age sex education) (customisation = age sex education) (innovativeness = age sex education), ind(5 5 5 5) cov(unstructured)
    Of course, the code does not work on fully ranked data (respondents who actually ranked the 4 features from 1 to 4).

    Does anyone have suggestions on how to go about this particular set of variables?

    Since about 20% of the respondents do not rank the features perfectly (there are ties), there is some variation in the sum of the ranks of the 4 features. For example, some respondents say that all 4 features are very important (1,1,1,1) and some say that all 4 features have little importance (4,4,4,4). When I include these observations I can actually run the cmp model above. However, this is possible only because of the non-perfectly ranked observations: if I were to break the ties, the model would not run.
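    A quick way to see this in the data (variable names as above, ranks assumed to lie in 1-4): tabulate the rank sums and flag the strictly ranked respondents; four ranks in 1-4 that are pairwise distinct are necessarily a permutation of 1 to 4.

    Code:
    gen sumrank = price + quality + customisation + innovativeness
    tab sumrank

    * strictly ranked = the four ranks are all different (hence a permutation of 1-4)
    gen byte fullyranked = price != quality & price != customisation &    ///
        price != innovativeness & quality != customisation &              ///
        quality != innovativeness & customisation != innovativeness
    tab fullyranked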

    I would like to find a way to correctly model the fully ranked observations that can also handle the partially ranked ones.

    My respondents rank the features of a single object only, so I do not have multiple rankings of the same 4 features across various objects/items/vignettes per respondent, as in a conjoint analysis.

    I do not want to use the ranks given to the 4 features to derive an underlying latent variable, as would be the case using IRT.

    Does anyone have an idea of how to go about these data? Any references I should be looking into?

    Would it make sense to take the difference in rank from a baseline feature? For example, with the importance of price as the baseline, I would take the difference between the rank of each other feature and that of price (quality - price, customisation - price, innovativeness - price). The new variables, containing the differences in rank of 3 features from the baseline feature, would then be used in the cmp model, as in the sketch below. Would this work?
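    A minimal sketch of that idea, assuming the variable names above and treating each difference as an ordered outcome in cmp (5 being the same ordered-choice indicator value used in my attempt above):

    Code:
    gen d_quality        = quality        - price
    gen d_customisation  = customisation  - price
    gen d_innovativeness = innovativeness - price

    cmp (d_quality = age sex education)          ///
        (d_customisation = age sex education)    ///
        (d_innovativeness = age sex education),  ///
        ind(5 5 5)
    With three equations the indicators() option lists three codes, and cmp should then estimate the cross-equation error correlations along with the coefficients.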

    Thanks in advance for your input.

  • #2
    The -cmp- command that you show above, is it a user-written contribution?

    I take it that you are aware of the CM volume of the set of user manuals and have looked through the entries for the commands that are in it. There is nothing contained in that volume that you could use?



    • #3
      Dear Joseph,

      Indeed, the cmp command is user-written; it was developed by David Roodman, and it is documented here.

      Thank you for sending me back to the CM manual. When I first checked it I somehow focused on the ordered logit model for ranked data, which does not allow for case-specific variables (my Xs: sex, education.....). Going through the material again, I realised that its cousin, the rank-ordered probit (cmroprobit), admits case-specific variables and is indeed the model I am looking for.
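      In sketch form (the id variable name and the factor coding of sex and education are placeholders), the reshape and model call look something like this:

      Code:
      * from wide rank variables to one row per respondent-feature pair
      rename (price quality customisation innovativeness) (rank1 rank2 rank3 rank4)
      reshape long rank, i(id) j(feature)
      label define feat 1 "price" 2 "quality" 3 "customisation" 4 "innovativeness"
      label values feature feat

      cmset id feature
      * -reverse- because 1 marks the MOST important feature
      cmroprobit rank, casevars(age i.sex i.education) reverse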

      However, with north of 18K observations, reshaping the dataset into long form gave me a very large dataset, and the full model includes about 150 case-specific regressors, so it takes incredibly long to run. In four hours it did not complete the first step of the algorithm that maximises the likelihood. I will try to run it on a server so that it can go for as long as it needs. Brute force.

      In the meantime, are there simpler or more ingenious alternatives to do the same.....maybe?

      Ideas are very much welcome.




      • #4
        Originally posted by Giovanni Russo View Post
        . . . are there simpler or more ingenious alternatives to do the same.....maybe?

        Ideas are very much welcome.
        Perhaps you could take a random 10% sample from your 18 000-observation dataset and see whether that allows convergence in a reasonable period of time. If you're worried about its representativeness, then take another random 10% sample and compare the conclusion drawn from it to that drawn from the first.
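        In sketch form, drawing the sample at the case (respondent) level so that each sampled respondent keeps all four alternatives in the long-form data (id as the case identifier is an assumption carried over from above):

        Code:
        set seed 12345
        by id, sort: gen double u = runiform() if _n == 1
        by id: replace u = u[1]
        keep if u < 0.10

        cmroprobit rank, casevars(age i.sex i.education) reverse
        On the wide data, -sample 10- before reshaping would achieve the same thing.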

