Postestimation mixed logit with unlabeled alternatives

Andrea Baldin

Join Date: Feb 2016

Posts: 35
#1

25 Jan 2022, 09:30

Dear Stata users,
I have a panel data set with unlabeled alternatives. I know that for labeled alternatives the -margins- command can estimate the probability of choosing a particular alternative. With unlabeled alternatives, is there a command that gives the probability of choosing an alternative with a given level of an attribute, conditional on the values of the other variables? More generally, which postestimation commands can be used after a multinomial or mixed logit model when the alternatives are unlabeled?
Thank you!
Tags: choice models, mixed logit, postestimation

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17709

25 Jan 2022, 10:12

Andrea:
you can use -predict-:

Code:

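clear
set obs 10
generate id = _n
* Three-category outcome: 0 (obs 1-2), 1 (obs 3-4), 2 (remaining obs)
generate categorical = 0 in 1/2
replace categorical = 1 in 3/4
replace categorical = 2 if missing(categorical)
* Two continuous regressors (no seed is set, so the draws differ across runs)
generate y = runiform()*1000
generate x = runiform()*10
mlogit categorical y x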

Iteration 0:   log likelihood = -9.5027054 
Iteration 1:   log likelihood = -7.1965965 
Iteration 2:   log likelihood = -5.7726835 
Iteration 3:   log likelihood = -4.5912685 
Iteration 4:   log likelihood = -4.0989536 
Iteration 5:   log likelihood = -3.9666641 
Iteration 6:   log likelihood = -3.9420437 
Iteration 7:   log likelihood = -3.9380981 
Iteration 8:   log likelihood =  -3.937661 
Iteration 9:   log likelihood = -3.9375726 
Iteration 10:  log likelihood = -3.9375536 
Iteration 11:  log likelihood = -3.9375493 
Iteration 12:  log likelihood = -3.9375483 
Iteration 13:  log likelihood =  -3.937548 

Multinomial logistic regression                         Number of obs =     10
                                                        LR chi2(4)    =  11.13
                                                        Prob > chi2   = 0.0251
Log likelihood = -3.937548                              Pseudo R2     = 0.5856

------------------------------------------------------------------------------
 categorical | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
0            |
           y |  -.0044691   .0049188    -0.91   0.364    -.0141097    .0051716
           x |   .1412349   .3308003     0.43   0.669    -.5071216    .7895915
       _cons |  -.1256983   1.986501    -0.06   0.950    -4.019168    3.767772
-------------+----------------------------------------------------------------
1            |
           y |  -.7364417   335.5885    -0.00   0.998    -658.4777    657.0048
           x |   18.80708    9983.02     0.00   0.998    -19547.55    19585.17
       _cons |    10.2523   23940.81     0.00   1.000    -46912.87    46933.37
-------------+----------------------------------------------------------------
2            |  (base outcome)
------------------------------------------------------------------------------
Note: 2 observations completely determined. Standard errors questionable.


. predict probability, probability

. predict p1 if e(sample), outcome(1)
(option pr assumed; predicted probability)

. predict p2 if e(sample), outcome(2)
(option pr assumed; predicted probability)

. predict p0 if e(sample), outcome(0)
(option pr assumed; predicted probability)

. list

     +---------------------------------------------------------------------------------+
     | id          y          x   catego~l  probability        p1        p2         p0 |
     |---------------------------------------------------------------------------------|
  1. |  1   348.8717   2.047095          0   .1984976          0   .8015024   .1984976 |
  2. |  2   266.8857   8.927587          0   .4856181   5.30e-09   .5143819   .4856181 |
  3. |  3   136.6463   5.844658          1   3.56e-09          1   3.26e-09   3.56e-09 |
  4. |  4   28.55687   3.697791          1   3.93e-26          1   3.01e-26   3.93e-26 |
  5. |  5   868.9333    8.50631          2   .0569128          0   .9430872   .0569128 |
     |---------------------------------------------------------------------------------|
  6. |  6   350.8549   3.913819          2   .2421501          0   .7578499   .2421501 |
  7. |  7   71.10509   1.196613          2   .4318103   1.73e-09   .5681896   .4318103 |
  8. |  8    323.368   7.542434          2   .3762258   2.69e-38   .6237742   .3762258 |
  9. |  9   555.1031   6.950233          2   .1645329          0   .8354671   .1645329 |
 10. | 10    875.991   6.866152          2   .0443267          0   .9556733   .0443267 |
     +---------------------------------------------------------------------------------+

.

Kind regards,
Carlo
(Stata 19.0)


Hong Il Yoo

Join Date: Jan 2015

Posts: 292
#3

26 Jan 2022, 03:55

If you're working with data from a discrete choice experiment which randomises attributes across unlabelled alternatives, applying -margins- or -predict- in the usual fashion doesn't generate interesting statistics because the alternatives do not have any meaning. Prior to applying either post-estimation command, you must first think about a choice scenario where each alternative represents some package of attributes which makes sense in the context of your analysis; input that information into Stata either as part of a post-estimation command or through the -replace- procedure; and obtain the results pertaining to that scenario that you have made up. For example, your experiment may be about unlabelled job choices by nurses and once you have fitted your model to the actual data where the job characteristics vary randomly across three alternatives, you may use the estimates to answer the question of how choice probabilities vary across the three alternatives when the first job has typical characteristics of a rural job, the second has typical characteristics of an urban job at a public hospital, and the third has typical characteristics of an urban job at a private hospital.

More often than not, when working with discrete choice experiment data, one is interested in willingness-to-pay (WTP) or other measures of marginal rates of substitution rather than marginal effects and predicted probabilities. For conditional logit models, there's a post-estimation command, -wtp-, by Arne Risa Hole. For mixed logit models, there is no one-stop package that you can apply. Perhaps the simplest approach is to specify your model in WTP space and apply Arne Risa Hole's -mixlogitwtp- command, which directly estimates the WTP distribution rather than requiring you to derive it from your preference space estimates.
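To make the WTP idea concrete, here is a minimal sketch for the conditional logit case. It assumes simulated data with made-up attributes price and quality (neither appears in this thread), and it uses the official -clogit- and -nlcom- commands rather than -wtp-, so the syntax of the user-written command is not assumed. The WTP for quality is the marginal rate of substitution -b_quality/b_price.

```stata
* Simulated choice data: all variable names and parameter values are hypothetical
clear
set seed 12345
set obs 500                               // 500 choice tasks
generate case = _n
expand 3                                  // three unlabelled alternatives per task
bysort case: generate alt = _n
generate price   = runiform()*10
generate quality = runiform()*5

* Random utility with Gumbel errors; the true WTP for quality is 1.0/0.5 = 2
generate u = -0.5*price + 1.0*quality - ln(-ln(runiform()))
bysort case (u): generate byte choice = (_n == _N)   // highest utility is chosen

clogit choice price quality, group(case)

* WTP as a ratio of coefficients, with a delta-method standard error
nlcom (wtp_quality: -_b[quality]/_b[price])
```

The estimate should land near 2 in this simulation, but the exact value depends on the random draws.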

Andrea Baldin

Join Date: Feb 2016
Posts: 35

26 Jan 2022, 05:16

Thank you Carlo and Hong for your input.
Let me clarify the context of my work: I observe tennis players who each choose a tournament every week. The tournaments are the unlabeled alternatives and are described by their attributes. Below is a simplified version of the model. The tournament characteristic is the variable "Level" (from 1 to 9), and the player characteristics interacted with Level are Age and Ranking.

Code:

cmxtmixlogit Choice Level c.Level#c.AGE c.Level#c.RANKING if wctot==0, noconstant vce(cluster id)
note: 3284 cases (8959 obs) dropped due to no positive outcome, multiple positive outcomes, or a single observation per case.
note: alternatives are unbalanced.

Fitting fixed parameter model:

Fitting full model:

Iteration 0:   log pseudolikelihood = -42409.585  
Iteration 1:   log pseudolikelihood = -42409.585  

Mixed logit choice model                     Number of obs        =    172,584
                                             Number of cases      =     34,421
Panel variable: id                           Number of panels     =        548

Time variable: TOURNPL~D                     Cases per panel: min =          1
                                                              avg =      100.3
                                                              max =        177

                                             Alts per case:   min =          2
                                                              avg =        5.0
                                                              max =         10

Integration points:              0                Wald chi2(3)    =    1872.62
Log pseudolikelihood = -42409.585                 Prob > chi2     =     0.0000

                                        (Std. err. adjusted for 548 clusters in id)
-----------------------------------------------------------------------------------
                  |               Robust
           Choice | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
------------------+----------------------------------------------------------------
Choice            |
            Level |   .6386856   .0535036    11.94   0.000     .5338205    .7435507
                  |
    c.Level#c.AGE |   .0040039   .0018027     2.22   0.026     .0004708    .0075371
                  |
c.Level#c.RANKING |  -.0036298   .0000961   -37.79   0.000    -.0038181   -.0034416
-----------------------------------------------------------------------------------

My question is: how can I calculate how the choice probability changes when the tournament level increases (from 1 to 2), as a function of the ranking?

Thank you!


Hong Il Yoo

Join Date: Jan 2015

Posts: 292
#5

27 Jan 2022, 01:53

To evaluate the effects of an increase in the tournament level on the predicted probability, you must first think about (1) how many tournaments are in your choice set; (2) what characteristics each tournament has; and (3) whether you're interested in own effects or cross effects. Before proceeding to post-estimation analyses, you'll need to sit back and think through each of these issues: directly applying the -predict- or -margins- command to your raw data is unlikely to generate interesting estimates because the dimension and composition of the choice set vary from case to case.

For example, you may be interested in evaluating a choice scenario where (1) each player faces 3 tournaments; (2) one tournament has a level of 1; one has a level of 3; and one has a level of 5; and (3) the outcome of interest is the own (cross) effect of increasing the level of the first tournament on the probability of participating in the first (second or third) tournament.

Once you have developed your question, perhaps the easiest way to proceed will be to (a) construct a new postestimation data set which includes the actual AGE and RANKING of each player; (b) add three rows per player, and set Level to 1 in row 1, 3 in row 2 and 5 in row 3, where the row number is counted within a player; (c) load your estimates based on the actual data, using the -estimates use- or -estimates restore- command as appropriate; and (d) apply a post-estimation command of interest in the usual fashion.
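As a rough illustration of steps (a) and (b), the sketch below builds a small scenario data set and computes the logit choice probabilities by hand from the coefficients reported in the output earlier in the thread. The three players and their AGE and RANKING values are made up, and the variable alt is a hypothetical row counter. The closed-form formula is only valid because the posted model has no random coefficients; with random coefficients the probabilities would have to be averaged over draws, or obtained from -predict-.

```stata
* Hypothetical scenario data: three players (AGE and RANKING values are made up)
clear
input id AGE RANKING
1 22  10
2 27  80
3 31 150
end
expand 3                                  // three tournaments per player
bysort id: generate alt = _n              // hypothetical row counter within player
generate Level = cond(alt == 1, 1, cond(alt == 2, 3, 5))

* Utility index using the coefficients from the fitted model shown in the thread
generate double xb = Level * (.6386856 + .0040039*AGE - .0036298*RANKING)
generate double expxb = exp(xb)
bysort id: egen double denom = total(expxb)
generate double p_base = expxb / denom    // baseline choice probabilities

* Raise the first tournament from level 1 to level 2 and recompute
replace Level = 2 if alt == 1
generate double xb2 = Level * (.6386856 + .0040039*AGE - .0036298*RANKING)
generate double expxb2 = exp(xb2)
bysort id: egen double denom2 = total(expxb2)
generate double p_new = expxb2 / denom2

* Own effect (alt 1) and cross effects (alt 2 and 3), by player ranking
generate double dp = p_new - p_base
list id RANKING alt Level p_base p_new dp, sepby(id)
```

Replacing the three made-up players with the actual AGE and RANKING of each player gives the change in probability as a function of ranking for this particular three-tournament scenario.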
Andrea Baldin

Join Date: Feb 2016

Posts: 35
#7

29 Jan 2022, 03:50

Thank you very much, Hong Il Yoo, for your reply. The choice set varies across players, and for each player it also varies from week to week (i.e., I have a panel data structure). There are around 300 tournaments spread across the weeks, which is why the alternatives are unlabeled.
I am not sure I fully understand your suggestion. As I understand it, I should: 1) estimate the model as I did; 2) save the results using

Code:

estimates save model

3) build a new dataset using the same player's characteristics and adding 3 labeled alternatives that are 3 different tournaments with 3 different levels for all players.
4) Reload my previous estimation using

Code:

estimates use model

5) apply a post estimation analysis, for instance

Code:

margins, at(RANKING=(1(20)200)) pr(out(1))

This is what I did; however, Stata returns the error "alternative variable not found; the current estimation results do not identify the alternative variable in a way that margins understands".

Where is my mistake?

Thank you!
Andrea

Last edited by Andrea Baldin; 29 Jan 2022, 04:21.