  • cmxtmixlogit for best-worst-scaling method - Case I

    Hi,

    I am trying to start using the cm commands in Stata/MP 17.0 instead of clogit or mixlogit.

    I have a data set obtained from a fully randomized Case I (object case) best-worst scaling experiment. The experiment contains 18 objects in total; each individual faced 6 different choice sets, and each choice set contained 3 objects from which the respondent was asked to choose the best and the worst object. The models I am trying to estimate are similar to those estimated by Lusk and Briggeman (2009) or Bazzani et al. (2018), only with different numbers of items, set sizes, and experimental design. This is what the data looks like:

    HTML Code:
    list id set alt gid B W choice $items in 1/12, table ab(6) sep(6) noobs nol
    
      +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | id   set   alt   gid   B    W   choice   att1   att2   att3   att4   att5   att6   att7   att8   att9   att10   att11   att12   att13   att14   att15   att16   att17   att18 |
      |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
      |  1     1     1     1   2   13        1      0      1      0      0      0      0      0      0      0       0       0       0      -1       0       0       0       0       0 |
      |  1     1     2     1   2   13        0      0      1      0      0      0      0      0      0      0       0       0       0       0       0       0      -1       0       0 |
      |  1     1     3     1   2   13        0      0      0      0      0      0      0      0      0      0       0       0       0       1       0       0      -1       0       0 |
      |  1     1     4     1   2   13        0      0     -1      0      0      0      0      0      0      0       0       0       0       1       0       0       0       0       0 |
      |  1     1     5     1   2   13        0      0     -1      0      0      0      0      0      0      0       0       0       0       0       0       0       1       0       0 |
      |  1     1     6     1   2   13        0      0      0      0      0      0      0      0      0      0       0       0       0      -1       0       0       1       0       0 |
      |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
      |  1     2     1     2   3   17        0      0      0      1      0      0      0      0      0      0       0      -1       0       0       0       0       0       0       0 |
      |  1     2     2     2   3   17        1      0      0      1      0      0      0      0      0      0       0       0       0       0       0       0       0      -1       0 |
      |  1     2     3     2   3   17        0      0      0      0      0      0      0      0      0      0       0       1       0       0       0       0       0      -1       0 |
      |  1     2     4     2   3   17        0      0      0     -1      0      0      0      0      0      0       0       1       0       0       0       0       0       0       0 |
      |  1     2     5     2   3   17        0      0      0     -1      0      0      0      0      0      0       0       0       0       0       0       0       0       1       0 |
      |  1     2     6     2   3   17        0      0      0      0      0      0      0      0      0      0       0      -1       0       0       0       0       0       1       0 |
      +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    
    
    
    
    The group id is obtained as egen gid = group(id set), and the objects att1-att18 are effect-coded so that an object takes the value 1 if chosen as best, -1 if chosen as worst, and 0 if not chosen.
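
    For reference, the effect coding described above can be generated along these lines (a sketch only, not my actual code; best_item and worst_item are placeholders for whatever variables record which object a given alternative's best-worst pair designates as best and as worst):
    Code:
    forvalues j = 1/18 {
        generate att`j' = cond(best_item == `j', 1, cond(worst_item == `j', -1, 0))
    }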

    I have no trouble replicating the results from clogit using cmclogit:

    Code:
    global var "att1 att2 att3 att4 att5 att6 att7 att8 att9 att10 att11 att12 att14 att15 att16 att17 att18"
    
    cmset id set alt
    clogit choice $var, group(gid)
    cmclogit choice $var, noconstant // same results as clogit but with cluster-robust s.e.
    Although I get the note that
    HTML Code:
     variable att1 has # cases that are not alternative-specific; there is no within-case variability.
    for all variables in $var, but I assume that is because these objects are effect-coded and equal zero for many alternatives. I do not get such a note when using clogit.

    The problem starts with estimating a random parameter logit (RPL) model. I can easily estimate an RPL model with the following code:

    Code:
    mixlogit choice, rand($var) id(id) group(gid) nrep(500) burn(50)
    or with any other number of Halton draws (nrep); it just takes longer as the number of Halton draws increases. However, I cannot get the same model to converge using cmxtmixlogit. Below is a set of failed attempts in which I tried to incorporate advice from @Hong Il Yoo and others. These models all get stuck in numerous iterations that are either (backed up) or (not concave):

    Code:
    cmset id set alt
    cmxtmixlogit choice, random($var) intpoints(50) intmethod(halton) intburn(15) noconstant
    cmxtmixlogit choice, random($var) intpoints(50) intmethod(halton) intburn(15) technique(bfgs) noconstant
    cmxtmixlogit choice, random($var) intpoints(50) intmethod(halton, antithetics) intburn(15) technique(bfgs) noconstant
    cmxtmixlogit choice, random($var) intpoints(10) noconstant
    cmxtmixlogit choice, random($var) intpoints(10) intmethod(halton) technique(bhhh) noconstant collinear
    cmxtmixlogit choice, random($var) intpoints(10) intmethod(halton) technique(bfgs) noconstant
    cmxtmixlogit choice, random($var) intpoints(10) technique(bhhh) noconstant  
    cmxtmixlogit choice, random($var) intpoints(10) technique(bfgs) noconstant
    cmxtmixlogit choice, random($var) intpoints(10) intmethod(halton, antithetics) technique(bfgs) noconstant  
    cmxtmixlogit choice, random($var) intpoints(10) intmethod(halton, antithetics) technique(bhhh) noconstant
    I tried to feed the model the estimates from mixlogit:
    Code:
    cmset id set alt
    egen gid = group(id set)
    
    mixlogit choice, rand($var) id(id) group(gid)
    mat b = e(b)
    
    cmxtmixlogit choice, random($var) from(b, skip) noconstant
    cmxtmixlogit choice, random($var) from(b, skip) intmethod(halton) intpoints(50) intburn(15) noconstant
    cmxtmixlogit choice, random($var) from(b, skip) intmethod(halton, antithetics) intpoints(50) intburn(15) noconstant
    cmxtmixlogit choice, random($var) from(b, skip) intmethod(halton, antithetics) noconstant
    * Tried with from(b, copy) as well
    and I get the following error for all of them:

    HTML Code:
    Fitting fixed parameter model:
    
    Fitting full model:
    
    initial values not feasible
    r(1400);
    So, I wonder:
    1. Is there anything wrong with the data setup or the code that I am using, or is mixlogit simply better at estimating such a model?
    2. If there is nothing wrong with my setup/code, is there any benefit in trying to fit the model with cmxtmixlogit, given that in this particular case the main postestimation task I am interested in is predicting the individual-specific parameters, which can be obtained with mixlbeta after mixlogit - which I assume is the equivalent of predict scores stub* after cmxtmixlogit?
    P.S. I doubt (1) is the case here because, using a different but similar data set with fewer objects (11 instead of 17), I managed to replicate the results of mixlogit with cmxtmixlogit, and also to feed cmxtmixlogit the mixlogit estimates using the exact same code above. Therefore, I suspect it is the increased number of variables and model complexity that gets cmxtmixlogit into trouble.

    Thanks in advance for your help.
    Last edited by Aida Ardebili; 09 Mar 2022, 08:19.

  • #2
    Hi Aida,

    In theory, cmxtmixlogit should be able to fit any model you can fit with mixlogit. One difference, however, is that cmxtmixlogit constrains the variance of the random parameters to be positive, which can lead to convergence issues in borderline cases where the variance of one or more parameters is close to zero. Regarding your model specifications that do not use starting values from mixlogit, what happens if you add the option scalemetric(unconstrained)? This (undocumented) option bypasses the constraint on the variance.

    As for the models with starting values, since mixlogit uses different names (i.e., stripes) in e(b), cmxtmixlogit won't recognize any of those. If e(b) from mixlogit has the same dimension as the one from cmxtmixlogit, and if the parameters are in the same order, you could try using from(b, copy).

    That all being said, note that cmxtmixlogit currently has no postestimation command for predicting individual-specific parameters (predict, scores computes first derivatives of the log likelihood with respect to the model parameters).
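
    In terms of your syntax, the two suggestions above would look something like this (illustrative only; adjust the integration options as needed):
    Code:
    * bypass the positivity constraint on the random-coefficient variances
    cmxtmixlogit choice, random($var) intpoints(50) intmethod(halton) intburn(15) ///
        scalemetric(unconstrained) noconstant
    
    * starting values copied by position rather than matched by name; this assumes
    * e(b) from mixlogit has the same dimension and parameter order as cmxtmixlogit's
    mixlogit choice, rand($var) id(id) group(gid)
    matrix b = e(b)
    cmxtmixlogit choice, random($var) from(b, copy) scalemetric(unconstrained) noconstant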

    I hope this helps,
    Joerg



    • #3
      Hi Joerg,

      Thanks a lot for your response.

      It was spot on. I was aware that my standard deviations (SDs) are close to zero (and that mixlogit returns some of them with negative signs), but I did not know about the cmxtmixlogit constraint. I used scalemetric(unconstrained) and the model finally converged, although so far only with a low number of integration points and with the help of the options intmethod(halton, antithetics) and technique(bfgs).
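
      For the record, the specification that converged is along these lines (roughly; the exact number of integration points is illustrative):
      Code:
      cmxtmixlogit choice, random($var) scalemetric(unconstrained) intpoints(10) ///
          intmethod(halton, antithetics) technique(bfgs) noconstant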

      I have some new questions:
      1. Should I be worried about the use of options such as intmethod(halton, antithetics), technique(bfgs), etc.? In other words, could these options alter the accuracy of the results or lead to unstable results/local maxima?
      2. Assuming that my model converged with high integration points and the use of scalemetric(unconstrained), is there any advantage in using one of cmxtmixlogit and mixlogit over the other, apart from the specific postestimation commands or options that each command offers? For example, is cmxtmixlogit supposed to be faster when the random parameters are specified to be correlated?
      3. This question might be a bit out of scope, but regarding the SDs that are close to zero and insignificant, I encountered some strange behavior earlier. An object has an insignificant and almost-zero SD when I fit mixlogit, which indicates no preference heterogeneity over that object in the sample. But when I fit my model as a latent class logit model with lclogit2, a two-segment model emerges in which that very object turns out to be very different across the segments! Is there any reason for this that you can think of?
      Also, regarding:
      As for the models with starting values, since mixlogit uses different names (i.e., stripes) in e(b), cmxtmixlogit won't recognize any of those. If e(b) from mixlogit has the same dimension as the one from cmxtmixlogit, and if the parameters are in the same order, you could try using from(b, copy).
      I do not think the names in e(b) are the issue, because I tried from(b, copy) and got the same error. However, now that I know about the cmxtmixlogit constraint on the variance of the parameters, I suspect the problem is that some of the values in e(b) are negative, and that is why the initial values are not feasible. I have not yet tried changing these negative values to positive before feeding them to cmxtmixlogit. However, when I used my other data set, which does not have negative SDs, feeding initial values from mixlogit to cmxtmixlogit worked fine, even with from(b, skip).
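
      For what it is worth, the sign flip I mention above (which, again, I have not actually tried) would be something along these lines, assuming the second half of mixlogit's e(b) holds the SD parameters in the same order as rand():
      Code:
      mixlogit choice, rand($var) id(id) group(gid)
      matrix b = e(b)
      local k = colsof(b)/2
      forvalues j = 1/`k' {
          local c = `k' + `j'
          matrix b[1, `c'] = abs(el(b, 1, `c'))   // make the SD starting values non-negative
      }
      cmxtmixlogit choice, random($var) from(b, copy) noconstant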

      Finally, thanks a lot for clarifying what the postestimation command predict, scores does after cmxtmixlogit.



      • #4
        1. Should I be worried about the use of options such as intmethod(halton, antithetics), technique(bfgs), etc.? In other words, could these options alter the accuracy of the results or lead to unstable results/local maxima?
        2. Assuming that my model converged with high integration points and the use of scalemetric(unconstrained), is there any advantage in using one of cmxtmixlogit and mixlogit over the other, apart from the specific postestimation commands or options that each command offers? For example, is cmxtmixlogit supposed to be faster when the random parameters are specified to be correlated?
        3. This question might be a bit out of scope, but regarding the SDs that are close to zero and insignificant, I encountered some strange behavior earlier. An object has an insignificant and almost-zero SD when I fit mixlogit, which indicates no preference heterogeneity over that object in the sample. But when I fit my model as a latent class logit model with lclogit2, a two-segment model emerges in which that very object turns out to be very different across the segments! Is there any reason for this that you can think of?
        Re (1): There is, generally speaking, no reason to worry per se. In part, different integration methods are there for models with convergence difficulties. That said, it is always a good idea to check and see how convergence and solutions behave under different methods. The mixed logit model with a lot of random parameters is prone to having a rather "flat" likelihood surface where small differences in the likelihood can correspond to relatively large differences in the parameters. As a consequence, even local maxima that are close to the "true" maximum can yield parameter estimates that are quite off, and simulated likelihood can sometimes cover up such identification issues by converging to a solution when it should not. In your case, I would consider treating the parameters with close-to-zero variance as fixed, rather than random, which should make empirical identification a bit easier (18 random coefficients is a lot, especially if the number of observations is not huge).
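
        Concretely, fixing the near-zero-variance coefficients could look something like this (the split below is purely illustrative; put whichever atts actually have sizeable SDs in the random() list):
        Code:
        * hypothetical split of $var into random and fixed coefficients
        global randvar "att1 att4 att12 att14 att17"
        global fixvar  "att2 att3 att5 att6 att7 att8 att9 att10 att11 att15 att16 att18"
        cmxtmixlogit choice $fixvar, random($randvar) intpoints(50) intmethod(halton) noconstant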

        Re (2): cmxtmixlogit, when used with the same number of integration points, should generally be a bit faster than mixlogit.

        Re (3): There could be many reasons for this behavior, I suppose, some meaningful and some simply artefacts. How big is your sample, and how big are the differences between coefficients in the latent classes relative to their standard errors? And are the differences between latent groups of a magnitude that would be of substantial interest? If the sample size is not small, the difference is of substantial interest, and the model appears to be properly identified, then it might be that the discrete mixture model is able to pick up some nonlinearities that the continuous mixture (i.e., mixed logit) cannot. But this is all hard to gauge from a distance...

        I hope this helps,
        Joerg



        • #5
          Thanks a lot. This is very helpful.

          Just a little update on my case:

          When I specify the random coefficients in -cmxtmixlogit- to be correlated, as below:
          Code:
          cmxtmixlogit choice, random($var, correlated) corrmetric(correlation) intpoints(50) intmethod(halton) intburn(15) scalemetric(unconstrained) noconstant
          I get a similar error after some 200-300 iterations:
          HTML Code:
          Refining estimates:
          
          initial values not feasible
          r(1400);
          This happens even with the scalemetric(unconstrained) option. I have tried different techniques and the result is the same. I realized, however, that if corrmetric() is specified as corrmetric(cholesky), which reports the elements of the Cholesky factorization of the variance-covariance matrix instead of the SDs and correlations, the problem is solved. It seems that the option scalemetric(unconstrained) does not work, for some reason, with corrmetric(correlation). The problem then is that I do not see any postestimation command that would calculate the SDs and correlations with their standard errors.
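
          In the meantime, I suppose the SDs and correlations (with standard errors) could be recovered from the Cholesky elements by the delta method with nlcom, along these lines. The parameter labels /l11, /l21, and /l22 below are made up; the actual names have to be read off matrix list e(b), and I assume l11 > 0:
          Code:
          matrix list e(b)   // check the actual names of the Cholesky parameters first
          * SDs and correlation of the first two random coefficients, with Sigma = L*L'
          nlcom (sd_1:    _b[/l11])                                  ///
                (sd_2:    sqrt(_b[/l21]^2 + _b[/l22]^2))             ///
                (corr_12: _b[/l21]/sqrt(_b[/l21]^2 + _b[/l22]^2))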
