I am working on a model of the determinants of the probability that a group of companies locates a new plant at a given site, chosen from a list of possible locations, as a function of a set of location-specific characteristics. The sample contains 354 possible locations and 1,270 investments in new plants, for a total of 449,580 observations (one per investment-location pair).
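Every investment faces the full list of 354 candidate sites, so the data are in long format with one row per investment-location pair, and 1,270 x 354 gives the 449,580 observations. The Stata sketch below builds that skeleton from scratch to make the layout concrete. It reuses the names investment_id and location_id from the cmset call in this post, but it is only an illustration: it omits the chosen-site indicator and the location and firm characteristics that the real data contain.

```stata
// Illustrative skeleton of the long-format choice set (no real data involved)
clear
set obs 1270                               // one row per investment
generate long investment_id = _n
expand 354                                 // replicate each investment for the 354 candidate sites
bysort investment_id: generate int location_id = _n
count                                      // 1270 x 354 = 449580 rows
assert r(N) == 1270 * 354
bysort investment_id: assert _N == 354     // every investment faces the full choice set
```

On the actual dataset, the last line can be run as-is to confirm that every choice set is complete.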
So far, we have estimated this model with a mixed logit in which all variables have random coefficients. The call looked something like this:
Code:
mixlogit location, group(investment_id) cluster(firm_id) id(firm_id) nrep(50) rand(loc_charact_1 loc_charact_2...loc_charact_15)
On a supercomputer, this model was estimated successfully in about an hour.
Things got complicated when we decided to also add case-specific variables, i.e., firm characteristics. At that point we switched to an alternative-specific mixed logit model (the command used to be -asmixlogit-, but the Stata help file says it has been replaced by -cmmixlogit-, so I used the latter). Having realized how hard it is to get these models to complete even a single estimation, and building on the valuable insights in this post, I ran a trial model with the following commands:
Code:
cmset investment_id location_id   // case ID variable, then alternatives variable
cmmixlogit location, random(loc_charact_1 loc_charact_2...loc_charact_15) casevars(firm_charact_1...firm_charact_3) favor(speed) intmethod(halton, antithetics)
However, I have not yet seen a single estimation run to completion. The supercomputer we are using caps run time at 120 hours, and within that limit the model has not converged. My questions are therefore the following:
1. Is it correct that -asmixlogit- is equivalent to -cmmixlogit-?
2. The linked post above suggests that the computation can be sped up by choosing a different integration method (-intmethod-), using fewer integration points (-intpoints-), and switching to a different maximization algorithm (-technique-). What would be the best combination of these options? For instance, what would be a lower, yet still acceptable, number of integration points? And what would be a sensible combination of maximization algorithms, with the number of iterations for each?
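One way to compare these options before committing to another 120-hour run is to time short fits of -cmmixlogit- on a small simulated choice problem and check how much the estimates move as the settings change. The do-file sketch below does that. Every variable name in it (case_id, alt_id, x1, x2, firm_x, chosen) is hypothetical, the point counts and the bfgs/nr switching schedule are arbitrary illustrations rather than recommendations, and it assumes -cmmixlogit- accepts the standard technique() maximize option.

```stata
// Hypothetical benchmark on simulated data; nothing here comes from the real dataset
clear
set seed 20240101
set obs 500                                    // 500 simulated cases
generate long case_id = _n
generate double firm_x = rnormal()             // case-specific (firm) variable
generate double b1 = 1 + 0.5*rnormal()         // case-level draw of the random coefficient
expand 6                                       // 6 alternatives per case
bysort case_id: generate byte alt_id = _n
generate double x1 = rnormal()                 // alternative-specific, random coefficient
generate double x2 = rnormal()                 // alternative-specific, fixed coefficient
// Utility with type-I extreme-value errors; the highest-utility alternative is chosen
generate double u = b1*x1 - 0.5*x2 + 0.4*firm_x*(alt_id == 2) - ln(-ln(runiform()))
bysort case_id (u): generate byte chosen = (_n == _N)
drop u b1

cmset case_id alt_id

timer clear
local k = 0
foreach m in hammersley halton {
    foreach p in 50 200 {
        local ++k
        timer on `k'
        quietly cmmixlogit chosen x2, random(x1) casevars(firm_x) ///
            intmethod(`m') intpoints(`p') technique(bfgs 10 nr 5)
        timer off `k'
        estimates store `m'_`p'
        display "`m', `p' points: ll = " %12.4f e(ll) ", converged = " e(converged)
    }
}
timer list                                     // elapsed seconds for each setting
estimates table hammersley_50 hammersley_200 halton_50 halton_200, b(%9.3f) se
```

If the coefficients and log likelihoods barely change between 50 and 200 points, the smaller count may be adequate for exploratory runs; the same loop could also be pointed at a random subsample of investments from the real data. Because of the /// continuation, the sketch has to be run as a do-file rather than typed interactively.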
Any help is extremely appreciated.