Hi everyone, here is a more substantial update. I have been running models for more than two weeks now and have tried several alternative options. Although none of the models achieved convergence, I believe I made significant progress across the various attempts. Here are some of the things I found:
- The default maximization -technique- (nr) doesn't work well with my models. Even when setting a low number of integration points (e.g., 50), it iterates extremely slowly: in 120 hours (the maximum run time on the cluster I am using) it completes just one iteration. Conversely, as suggested by Hong Il Yoo, -bhhh- and -bfgs- proceed much faster: even with the default number of integration points, they complete tens of iterations. They have not converged yet, but more on this below. (A sketch of the calls I have been comparing appears after this list.)
- As expected, using the default number of integration points leads to much more effective maximization than restricting it to 50 or 100 points. Therefore, it might be better to choose a faster -technique- and avoid setting a low number of points. (Please let me know if you disagree with this statement.)
- Regardless of the -technique- used, after about 80 iterations, improvements in the likelihood function are extremely small.
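To make this concrete, here is a rough sketch of the kind of calls I have been comparing. The choice variable and covariates below are placeholders rather than my actual specification, and the data are assumed to be already -cmset-:

* placeholder outcome (choice) and covariates (x1, x2); not my real model
cmmixlogit choice, random(x1 x2) technique(nr) intpoints(50)   // roughly one iteration in 120 hours
cmmixlogit choice, random(x1 x2) technique(bhhh)               // default integration points; tens of iterations
cmmixlogit choice, random(x1 x2) technique(bfgs)               // behaves similarly to bhhh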
Based on these findings, there are a couple of additional questions I would like to ask:
1. As mentioned by Joerg Luedicke (StataCorp), it is possible to feed a matrix of starting coefficients to -cmmixlogit- to facilitate the maximization. This thread suggests that -cmmixlogit- accepts such a starting matrix even from a different estimator such as -mixlogit-, when the model includes only random coefficients. If true, this would be really helpful, as our last models before switching to -cmmixlogit- were estimated successfully with -mixlogit-. Can anyone confirm whether it is indeed possible to use a -mixlogit- coefficient matrix (random coefficients only) as starting values in -cmmixlogit- models? (A sketch of what I have in mind appears below, after this list.)
2. Since our models didn't achieve convergence, I thought of loosening the convergence -tolerance- a bit. In particular, I set a slightly larger -ltolerance(0.0001)- for the likelihood function. However, the option appears to have been ignored. Do you have any idea why? Could this be because the last iterations were flagged as -not concave- or -backed up-?
3. Only once, using -technique(bhhh)-, did Stata produce some results after 300 iterations (the default maximum). Since the model had not effectively converged, Stata issued error -r(430)- and the subsequent -estimates save- command was not executed (which is of course normal behavior after an error). Do you think that by -capture-ing a model that reaches the maximum number of iterations without converging I would still be able to save the estimates and use them later (e.g., by -esttab-ing them)? Also, since after about 80-100 iterations the likelihood function no longer increases substantially, do you find the idea of setting a lower -iterate(#)- acceptable? (See the second sketch below.)
4. Until now I have assumed that changing -intmethod- to Halton would not make much of a difference. Based on your experience, do you think this assumption might be incorrect?
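For question 1, this is a rough sketch of what I have in mind (placeholder variables again; I am assuming the starting values can be passed through the standard -from()- maximize option and that coefficient-name mismatches can be handled with the -skip- suboption, which is exactly the part I would like to have confirmed):

* run the old specification with -mixlogit- and keep its coefficient vector
mixlogit choice, rand(x1 x2) group(caseid) id(personid)
matrix b0 = e(b)
* feed it to -cmmixlogit- as starting values
cmmixlogit choice, random(x1 x2) technique(bhhh) from(b0, skip)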
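For question 3, this is the kind of pattern I am considering; I am assuming that when the iteration limit is reached Stata still leaves the (non-converged) results in e(), which seems consistent with what I saw after the 300 -bhhh- iterations:

* cap the iterations, trap the r(430) error, and save whatever is in e()
capture noisily cmmixlogit choice, random(x1 x2) technique(bhhh) iterate(100)
if inlist(_rc, 0, 430) {
    display "converged = " e(converged)   // 0 flags a non-converged model
    estimates save mymodel, replace
}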
Thanks so much again for any idea you are able to share.