  • Normalize one coefficient in choice model (mixlogit)

    Hi there,

    I am running a discrete choice model and I would like to normalize the coefficient of one of the explanatory variables (numeraire) to be equal to 1. This seems trivial as in most econometric packages one would simply omit to specify the associated parameter of the numeraire and estimate the remaining coefficients (possibly including the scale of the logit error). Does anybody know a sensible way of doing this in Stata?

    Thanks,

    Matteo

  • #2
    If it's a parametric constraint per se that you're interested in, you can use the option -constraint()- as you do with other Stata commands. But am I right in guessing that what you're really interested in is estimating the mixed logit model in the WTP space? If yes, you may use Arne Risa Hole's -mixlogitwtp- command directly, without worrying about reparametrisation. You can download the command by typing -ssc install mixlogitwtp-.

    • #3
      Thanks a lot, Hong, this is very helpful. In fact, I am not interested in estimating the distribution of the random coefficient for the monetary/numeraire regressor (I do specify random coefficients for other regressors, though). So I guess that I can go with the constraint option using standard mixlogit. Thanks again,

      Matteo

      • #4
        By the way, the option constraints() in mixlogit requires one to specify the initial values for the random coefficients via the from() option. Does anybody know how this can be done? Here is my code:
        mixlogit choice z, random(x y) constraints(z=-1) from(b0)

        I have tried a vector b0 whose initial values for the means are the estimated coefficients from the corresponding clogit and whose standard deviations equal 0.1, as well as several other variants, but none of them seems to work. I usually get this error message:

        initial vector: extra parameter r1 found

        Thanks,

        Matteo

        • #5
          Matteo Bobba: Just out of curiosity, is there any theoretical or practical reason why you'd like to constrain the coefficient on the cost attribute to -1? You're not normalising the coefficient, since the maximised log-likelihood value may change depending on whether you impose the constraint or not. I am aware that specifying the cost coefficient as a non-random coefficient makes it easier to derive the joint distribution of the implied WTP measures and calculate their moments. But constraining the non-random coefficient further to -1 without re-parametrising the index function is something new to me, and I would appreciate your advice.

          P.S. If you search -mixlogit constraints- in this forum, you'll find a few code examples that you may find useful.
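
          For reference, a minimal sketch of the kind of code those examples contain. The variable names (choice, z, x, y, gid) are hypothetical, the equation name Mean is assumed from -mixlogit-'s usual output and should be checked against your own, and the 0.1 starting values for the standard deviations are arbitrary guesses:

          ```stata
          * Hypothetical names: choice (0/1 outcome), z (cost or distance),
          * x and y (attributes with random coefficients), gid (choice-situation id).
          clogit choice z x y, group(gid)
          matrix b = e(b)                  // clogit estimates as starting means
          matrix b0 = b, J(1, 2, 0.1)      // append guesses for the two SD parameters
          matrix b0[1,1] = -1              // make the start value agree with the constraint

          constraint 1 [Mean]z = -1        // constraints are defined first, then passed by number
          mixlogit choice z, group(gid) rand(x y) constraints(1) from(b0, copy)
          ```

          The copy suboption of from() matches starting values by position rather than by column name, so the order in b0 has to follow the order of the parameters in the output. Matching by position may also avoid errors that come from a mismatch in column names.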

          • #6
            I am estimating a model of school choice and in that literature it is common to specify a distance-metric utility function, which is quasi-linear in distance with a coefficient of -1 (see, e.g., Abdulkadiroglu et al., AER 2017). The additive separable form in distance with a normalized coefficient embeds a scale normalization, which allows the researcher to measure utility in distance units, expressed as a “willingness-to-travel.”

            Can mixlogit in Stata (or variants thereof) accommodate such a model? If so, on top of the constraint() option, should I include additional options in order to re-parametrize the likelihood according to the scale normalization?

            • #7
              Matteo Bobba: I assume that you are referring to

              Abdulkadiroğlu, Atila, Nikhil Agarwal, and Parag A. Pathak. 2017. "The Welfare Effects of Coordinated Assignment: Evidence from the New York City High School Match." American Economic Review, 107 (12): 3635-89. DOI: 10.1257/aer.20151425

              Note that they have estimated the scale of the error term (sigma_{epsilon} in their notation) as a model parameter, instead of normalizing it to 1. Put simply, what they have estimated is a probit version of a special case of the model that -mixlogitwtp- fits: by default -mixlogitwtp- specifies the scale parameter as a random coefficient, but their model has it as a non-random coefficient.

              The -mixlogit- command normalizes the scale to 1 and estimates other parameters. To estimate a logit analogue to their model using -mixlogit-, one must estimate the cost (distance in their application) coefficient as a non-random coefficient, instead of constraining it to -1. The estimated cost coefficient can be interpreted as -1/sigma_{epsilon} and you can use this information to construct -nlcom- statements to recover WTP distributions (willingness to travel distributions in their application).
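
              A short derivation may help to show where -1/sigma_{epsilon} comes from. It is a sketch in simplified notation that is assumed here rather than taken from the paper: d_{ij} is distance, x_{ij} collects the other attributes, beta_i are the coefficients measured in distance units, and epsilon_{ij} is an error term with unit scale.

              ```latex
              \begin{align*}
              U_{ij} &= x_{ij}'\beta_i - d_{ij} + \sigma_{\epsilon}\,\epsilon_{ij} \\
              \frac{U_{ij}}{\sigma_{\epsilon}} &= x_{ij}'\,\frac{\beta_i}{\sigma_{\epsilon}} - \frac{1}{\sigma_{\epsilon}}\,d_{ij} + \epsilon_{ij}
              \end{align*}
              ```

              Dividing utility by a positive constant leaves choices unchanged, so a model with the scale normalized to 1 estimates beta_i/sigma_{epsilon} on the attributes and -1/sigma_{epsilon} on distance. Minus the ratio of an attribute coefficient to the distance coefficient then gives back beta_i.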

              For more structured explanations please see the Train and Weeks reference in -mixlogitwtp-'s help file.
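
              The -nlcom- step might look like the sketch below. It assumes hypothetical variable names (z for distance, x for an attribute with a normally distributed random coefficient, gid for the choice situation) and that -mixlogit- labels its equations Mean and SD; please check the names in your own output before relying on it.

              ```stata
              * Hypothetical names: choice (0/1), z (distance, non-random), x (random), gid (choice situation).
              mixlogit choice z, group(gid) rand(x)

              * Mean willingness to travel for x: minus the mean coefficient over the distance coefficient
              nlcom (mean_wtt_x: -[Mean]x / [Mean]z)

              * Its standard deviation: the estimated SD parameter can take either sign, hence abs()
              nlcom (sd_wtt_x: abs([SD]x / [Mean]z))
              ```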
              Last edited by Hong Il Yoo; 02 Apr 2020, 02:36.

              • #8
                Thanks again. One related issue I am experiencing with mixlogit is that once I specify more than two or three random coefficients, the likelihood becomes non-concave and hence more difficult to maximize -- sometimes it takes 30-40 iterations, if it converges at all. I have tried different options within -maximize-, such as "difficult" or changing the maximization algorithm every 3-5 iterations, but the problem seems to persist. Any suggestions on how to improve the optimization performance?

                • #9
                  Matteo Bobba: In my experience, one or more of the following often turn out to be helpful.

                  (1) Change the number of replications (e.g. -nrep(500)-)

                  (2) Change the number of pseudo-random number draws to burn (e.g. -burn(100)-)

                  (3) My pet "kludge" is -technique(bfgs 5 nr 5)-, inspired by Stephen Jenkins's -technique(dfp 5 nr 5)- that I saw in this thread some time ago: https://www.statalist.org/forums/for...ion-techniques

                  (4) When you choose starting values without any prior information, experiment with "wild" guesses. For example, suppose that your best guess is that the population standard deviation of parameter beta_{1} is 0.5. Instead of using 0.5 as a starting value, you can try 2.5 or 5 or even 10.

                  (5) Once again, when you choose starting values without any prior information, try to alternate signs of your guesses. For example, when you choose starting values for Cholesky factors for the population standard deviations of three independent random coefficients, you may try something like 1, -1, 1 or -1, 1, -1 (by default, -mixlogit- goes for 0.25, 0.25, 0.25).

                  (6) Arne Risa Hole and I have a paper on using optimisation heuristics to find "good" starting values for -gmnl- and -mixlogit- (https://doi.org/10.1111/rssc.12209). We implemented our estimation strategy in Stata, and you can download the software component from [here].
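
                  As a rough illustration, suggestions (1) to (5) could be combined along the following lines. The variable names are hypothetical, the starting values are arbitrary guesses rather than recommendations, and the order of the values assumes that -mixlogit- lists the means first (non-random, then random) followed by the standard deviation parameters.

                  ```stata
                  * Hypothetical names: choice (0/1), z (non-random), x1 x2 x3 (independent random
                  * coefficients), gid (choice-situation id).
                  clogit choice z x1 x2 x3, group(gid)
                  matrix b = e(b)                  // clogit estimates as starting means
                  matrix sd0 = (1, -1, 1)          // sign-alternated "wild" guesses for the SD parameters
                  matrix b0 = b, sd0

                  mixlogit choice z, group(gid) rand(x1 x2 x3) nrep(500) burn(100) ///
                      technique(bfgs 5 nr 5) from(b0, copy)
                  ```

                  Re-running with different values in sd0, nrep() and burn() and comparing the maximised log-likelihoods is one way to check whether the optimiser has settled on the same solution.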


                  Last edited by Hong Il Yoo; 06 Apr 2020, 14:35.

                  • #10
                    In the AER paper that I have mentioned above, the authors use Gibbs sampling rather than simulated maximum likelihood to estimate a rank-ordered probit model with random coefficients. Are you aware of any Stata implementation of this estimator?

                    • #11
                      I am not aware of any. But you can use Matthew Baker's -bayesmixedlogit- to carry out Bayesian estimation of the random coefficient rank-ordered logit model. You can download the command from the SSC archives as usual. You will have to "explode" your rank-ordered responses into "pseudo-choice" responses prior to estimation; please see a minimal example that I provide in section 6 of the background paper for -lclogit2- (http://dro.dur.ac.uk/29867/1/29867.pdf) and the Train reference therein.
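
                      For orientation, the "explosion" step can be sketched as follows. This is not the example from the paper, only an illustration under stated assumptions: the data are in long format with hypothetical variables pid (person), alt (alternative, unique within person) and rank (1 = most preferred), and every person ranks all of their alternatives without ties.

                      ```stata
                      * Hypothetical long-format data: one row per person (pid) and alternative (alt).
                      bysort pid: egen maxrank = max(rank)
                      expand maxrank - 1                   // one copy of each row per pseudo-choice stage
                      bysort pid alt: gen stage = _n       // stage s = 1, ..., maxrank - 1
                      drop if rank < stage                 // alternatives already chosen leave the choice set
                      gen byte choice = (rank == stage)    // the rank-s alternative is the stage-s choice
                      egen gid = group(pid stage)          // identifier for each pseudo-choice situation
                      ```

                      The resulting choice and gid variables play the role of the dependent variable and the choice-situation identifier, while pid continues to identify the decision maker.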

                      • #12
                        Thanks so much, Hong Il Yoo. Your feedback is very useful and very much appreciated. One alternative for my model may be fitting a multinomial probit and/or its rank-ordered version, which allows one to relax the IIA through the covariance of the normal error terms without having to estimate random coefficients. Unfortunately, the Stata 14 commands -asmprobit- and -asroprobit- only allow for a maximum of 20 alternatives/choices, whereas in my data individuals choose among 200-300 schools. Do you know if the updated Stata 16 versions of these commands, -cmmprobit- and -cmroprobit-, allow for such large choice sets?

                        • #13
                          Thanks for your kind words!

                          I'm afraid that I have not used those commands. But I don't think they will help, even if they allow you to specify a large number of alternatives. Maximum (simulated) likelihood estimation of an unrestricted variance-covariance matrix of some 300 error components is a scary task, and you may need to wait for several months, if not years, before seeing a single coefficient estimate.

                          The -mixlogit- model may be interpreted as a normal error component (NEC) model, as well as a random coefficient model. Under the NEC interpretation, one may use -mixlogit- to approximate -asmprobit- and -asroprobit- by choosing an appropriate specification of alternative-specific constants. See for example, Walker, Ben-Akiva and Bolduc (2007, https://doi.org/10.1002/jae.971). Perhaps you can think about a set of structural restrictions that you would like to place on your multivariate normal error covariance matrix, and use -mixlogit- to estimate a NEC logit mixture model that allows for the desired patterns of correlation and heteroskedasticity.
                          Last edited by Hong Il Yoo; 11 Apr 2020, 11:00.

                          • #14
                            Matteo Bobba, have you solved your problem? I'm grappling with a similar issue, trying to estimate a mixed logit model with a large number of parameters. I believe the issue is the number of alternative-specific constants (200+ in my case) and the corresponding size of the variance-covariance matrix, rather than the number of random parameters. I was just curious to see if you had found a solution.

                            • #15
                              Originally posted by Hong Il Yoo in #9 (the six suggestions for improving convergence, listed above).

                              Hi Hong Il Yoo, thanks a lot for your advice. I am wondering how to incorporate the code of mat b(v), mat b(e) in the mixlogitwtp code that needs convergence, since from (b,v) does not work in that specific code.
