  • Choice model margins problem: margins after cmclogit producing _outcome#alt

    Dear all,

    I am using data from a choice experiment with an unbalanced panel: 2-3 alternatives per choice set and either 6 or 9 choice sets per respondent. I am using Stata 16, primarily the commands -cmclogit- and -margins-. The application is transport mode choice.

    I am trying to obtain the marginal effect on the probability of a given alternative being chosen. My primary approach has been to use -cmclogit- without the -casevars()- option, instead interacting the case-specific variables directly with an alternative dummy to avoid overloading the model. For a number of variables I only want to estimate the effect on one alternative, e.g., the impact of car type on car use (relative to the base alternative), and not also on the other alternative.

    After running the model, -margins- produces output that I cannot understand, and I have not been able to find an explanation of it in the manuals or on Statalist. Specifically, instead of the marginal effect on the probability of each mode/alternative, it reports interactions of the alternatives, _outcome#alt. What is this? Why can I not get just _outcome? The interactions make no sense to me: e.g., what does the estimate for PT#CR mean? A further issue may be that it only gives margins estimates for some of these combinations, not all (see below). Adding the -at()- option doesn't help in this case.

    Can anyone help me understand what is going on here, or what I can do to obtain margins estimates for the alternatives?


    A brief data extract is shown here for one respondent:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int id byte alt float(rep chosen) byte type float(totcost tottime) byte region float(hh_size romand commute ln_comm_dist biosph_import car_inhh)
    7 1 1 0 1   23  46 3 1 0 1 3.3322046 1 1
    7 2 1 1 1  2.6  18 3 1 0 1 3.3322046 1 1
    7 1 2 0 1   23  23 3 1 0 1 3.3322046 1 1
    7 2 2 1 1  2.6  53 3 1 0 1 3.3322046 1 1
    7 1 3 0 1   23  69 3 1 0 1 3.3322046 1 1
    7 2 3 1 1  2.6  18 3 1 0 1 3.3322046 1 1
    7 1 1 0 2  6.8  10 3 1 0 0 3.3322046 1 1
    7 2 1 1 2 .312  21 3 1 0 0 3.3322046 1 1
    7 3 1 0 2    0  60 3 1 0 0 3.3322046 1 1
    7 1 2 0 2  6.8  30 3 1 0 0 3.3322046 1 1
    7 2 2 1 2 .312  21 3 1 0 0 3.3322046 1 1
    7 3 2 0 2    0  20 3 1 0 0 3.3322046 1 1
    7 1 3 0 2  6.8  20 3 1 0 0 3.3322046 1 1
    7 2 3 1 2 .312  21 3 1 0 0 3.3322046 1 1
    7 3 3 0 2    0  60 3 1 0 0 3.3322046 1 1
    7 1 1 0 3   31  81 3 1 0 0 3.3322046 1 1
    7 2 1 1 3 6.24  35 3 1 0 0 3.3322046 1 1
    7 1 2 0 3   31  41 3 1 0 0 3.3322046 1 1
    7 2 2 1 3 6.24 105 3 1 0 0 3.3322046 1 1
    7 1 3 0 3   31 122 3 1 0 0 3.3322046 1 1
    7 2 3 1 3 6.24  35 3 1 0 0 3.3322046 1 1
    end
    label values alt choice_lab
    label def choice_lab 1 "PT", modify
    label def choice_lab 2 "CR", modify
    label def choice_lab 3 "BF", modify
    label values type type_lab
    label def type_lab 1 "Commute", modify
    label def type_lab 2 "Leisure", modify
    label def type_lab 3 "Weekend", modify
    label values region region_lab
    label def region_lab 3 "Countryside", modify
    A modified (shortened) example of my code is below, where CR and BF are dummies for those two alternatives (the base alternative being PT):
    Code:
    cmclogit chosen totcost tottime CR BF c.CR#i.type c.BF#i.type c.CR#ib1.region c.BF#ib1.region ///
             c.CR#ib2.hh_size c.BF#ib2.hh_size c.CR#i.romand c.BF#i.romand c.CR#c.commute#c.ln_comm_dist  ///
             c.BF#c.commute#c.ln_comm_dist c.CR#c.biosph_import c.BF#c.biosph_import c.CR#car_inhh c.BF#car_inhh ///
             , vce(cl id2) nocons allbaselevels
    Using the simplest version of margins I can as an example, I write:
    Code:
    margins , dydx( region) vce(unconditional) outcome(, altsubpop) 
    
    Average marginal effects                        Number of obs     =     19,275
    
    Expression   : Pr(alt|1 selected), predict()
    dy/dx w.r.t. : 2.region 3.region
    
                                          (Std. Err. adjusted for 2,604 clusters in id2)
    ------------------------------------------------------------------------------------
                       |            Unconditional
                       |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------------+----------------------------------------------------------------
    1.region           |  (base outcome)
    -------------------+----------------------------------------------------------------
    2.region           |
          _outcome#alt |
                PT#PT  |          0  (empty)
                PT#CR  |   -.068618   .0130747    -5.25   0.000     -.094244    -.042992
                PT#BF  |  -.0123124   .0077902    -1.58   0.114    -.0275809     .002956
                CR#PT  |          0  (empty)
                CR#CR  |          0   .0006932     0.00   1.000    -.0013586    .0013586
                CR#BF  |          0   .0002497     0.00   1.000    -.0004893    .0004893
                BF#PT  |          0  (empty)
                BF#CR  |          0     .00075     0.00   1.000    -.0014699    .0014699
                BF#BF  |          0   .0002425     0.00   1.000    -.0004752    .0004752
    -------------------+----------------------------------------------------------------
    3.region           |
          _outcome#alt |
                PT#PT  |          0  (empty)
                PT#CR  |  -.0611379   .0145159    -4.21   0.000    -.0895886   -.0326872
                PT#BF  |  -.0249792   .0092992    -2.69   0.007    -.0432053   -.0067531
                CR#PT  |          0  (empty)
                CR#CR  |          0   .0006148     0.00   1.000     -.001205     .001205
                CR#BF  |          0   .0005097     0.00   1.000     -.000999     .000999
                BF#PT  |          0  (empty)
                BF#CR  |          0   .0006668     0.00   1.000    -.0013069    .0013069
                BF#BF  |          0   .0004572     0.00   1.000     -.000896     .000896
    ------------------------------------------------------------------------------------
    Note: dy/dx for factor levels is the discrete change from the base level.

  • #2
    Hi Jeremy,

    First, there should be no need to include all these alternative dummies and interactions manually, as cmclogit will do this for you. For example, say we have the following repeated-choice dataset and cmclogit specification (with the data cmset as panel data):
    Code:
    . webuse transport
    (Transportation choice data)
    
    . cmset id t alt
    panel data: panels id and time t
    note: case identifier _caseid generated from id t
    note: panel by alternatives identifier _panelaltid generated from id alt
    
                         caseid variable:  _caseid
                   alternatives variable:  alt
          panel by alternatives variable:  _panelaltid (strongly balanced)
                           time variable:  t, 1 to 3
                                   delta:  1 unit
    
    note: data have been xtset
    
    . tab alt, gen(alt__)
    
    Alternative |
              s |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            Car |      1,500       25.00       25.00
         Public |      1,500       25.00       50.00
        Bicycle |      1,500       25.00       75.00
           Walk |      1,500       25.00      100.00
    ------------+-----------------------------------
          Total |      6,000      100.00
    
    . cmclogit choice alt__2 alt__3 alt__4 c.alt__2#i.alt c.alt__3#i.alt ///
    >          c.alt__4#i.alt trcost, nocons
    note: data were cmset as panel data, and the default vcetype for panel data
          is vce(cluster id); see cmclogit
    note: 2.alt#c.alt__2 omitted because of collinearity
    note: 3.alt#c.alt__2 omitted because of collinearity
    note: 4.alt#c.alt__2 omitted because of collinearity
    note: 2.alt#c.alt__3 omitted because of collinearity
    note: 3.alt#c.alt__3 omitted because of collinearity
    note: 4.alt#c.alt__3 omitted because of collinearity
    note: 2.alt#c.alt__4 omitted because of collinearity
    note: 3.alt#c.alt__4 omitted because of collinearity
    note: 4.alt#c.alt__4 omitted because of collinearity
    
    Iteration 0:   log pseudolikelihood = -1352.3329  
    Iteration 1:   log pseudolikelihood = -1253.3514  
    Iteration 2:   log pseudolikelihood = -1246.1403  
    Iteration 3:   log pseudolikelihood = -1246.1335  
    Iteration 4:   log pseudolikelihood = -1246.1335  
    
    Conditional logit choice model                 Number of obs      =      6,000
    Case ID variable: _caseid                      Number of cases    =       1500
    
    Alternatives variable: alt                     Alts per case: min =          4
                                                                  avg =        4.0
                                                                  max =          4
    
                                                      Wald chi2(4)    =     822.66
    Log pseudolikelihood = -1246.1335                 Prob > chi2     =     0.0000
    
                                       (Std. Err. adjusted for 500 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
          choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    alt          |
          alt__2 |  -2.585322   .1108936   -23.31   0.000    -2.802669   -2.367974
          alt__3 |  -3.770022   .1402095   -26.89   0.000    -4.044828   -3.495217
          alt__4 |  -4.478327   .1878525   -23.84   0.000    -4.846511   -4.110142
                 |
    alt#c.alt__2 |
         Public  |          0  (omitted)
        Bicycle  |          0  (omitted)
           Walk  |          0  (omitted)
                 |
    alt#c.alt__3 |
         Public  |          0  (omitted)
        Bicycle  |          0  (omitted)
           Walk  |          0  (omitted)
                 |
    alt#c.alt__4 |
         Public  |          0  (omitted)
        Bicycle  |          0  (omitted)
           Walk  |          0  (omitted)
                 |
          trcost |  -.6123677   .0354651   -17.27   0.000     -.681878   -.5428574
    ------------------------------------------------------------------------------
    We would get the same results with the following, much simpler specification:
    Code:
    . cmclogit choice trcost
    note: data were cmset as panel data, and the default vcetype for panel data
          is vce(cluster id); see cmclogit
    
    Iteration 0:   log pseudolikelihood = -1352.3329  
    Iteration 1:   log pseudolikelihood = -1253.3514  
    Iteration 2:   log pseudolikelihood = -1246.1403  
    Iteration 3:   log pseudolikelihood = -1246.1335  
    Iteration 4:   log pseudolikelihood = -1246.1335  
    
    Conditional logit choice model                 Number of obs      =      6,000
    Case ID variable: _caseid                      Number of cases    =       1500
    
    Alternatives variable: alt                     Alts per case: min =          4
                                                                  avg =        4.0
                                                                  max =          4
    
                                                      Wald chi2(1)    =     298.14
    Log pseudolikelihood = -1246.1335                 Prob > chi2     =     0.0000
    
                                       (Std. Err. adjusted for 500 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
          choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    alt          |
          trcost |  -.6123677   .0354651   -17.27   0.000     -.681878   -.5428574
    -------------+----------------------------------------------------------------
    Car          |  (base alternative)
    -------------+----------------------------------------------------------------
    Public       |
           _cons |  -2.585322   .1108936   -23.31   0.000    -2.802669   -2.367974
    -------------+----------------------------------------------------------------
    Bicycle      |
           _cons |  -3.770022   .1402095   -26.89   0.000    -4.044828   -3.495217
    -------------+----------------------------------------------------------------
    Walk         |
           _cons |  -4.478327   .1878525   -23.84   0.000    -4.846511   -4.110142
    ------------------------------------------------------------------------------
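    Applied to the data in the first post, the same simplification might look like the sketch below. It is only a sketch under assumptions: t is a hypothetical new variable, each combination of id, type and rep is assumed to identify one choice set, and only two of the case-specific variables are shown. Note that casevars() estimates a coefficient for every non-base alternative, so if an effect really must be restricted to a single alternative, the manual interaction from the first post remains an option.

```stata
* Sketch only: assumes each id-type-rep combination is one choice set.
* t is a hypothetical time variable that indexes the choice sets.
egen t = group(type rep)
cmset id t alt

* cmclogit adds the alternative-specific constants itself; casevars()
* interacts each case-specific variable with the non-base alternatives.
* PT is kept as the base alternative, as in the original model.
* Add the remaining case-specific variables to casevars() as needed.
* With panel cmset, the default vcetype is vce(cluster id).
cmclogit chosen totcost tottime, casevars(i.type ib1.region) ///
    basealternative(PT)

* For a case-specific variable, margins is expected to report one
* effect per outcome rather than an _outcome#alt grid.
margins, dydx(region)
```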
    Now, regarding marginal effects: for an alternative-specific continuous covariate, a marginal effect is the derivative of the predicted probability of some alternative with respect to that covariate for the same or for another alternative. This means that we can compute both direct and cross marginal effects. In our example above, the alternatives are car, public transportation, bicycle, and walking. To estimate marginal effects for the price variable trcost, we can take the derivative of the probability of choosing car with respect to a change in car travel cost, with respect to a change in public transportation cost, or with respect to the cost of either of the remaining two alternatives. We can do the same for each of the remaining three alternatives. That is, with four alternatives there are 4x4=16 marginal effects for a single alternative-specific continuous covariate. For discrete covariates, the marginal effects are differences between levels rather than derivatives, but the same logic applies.

    In the margins output, _outcome is the alternative whose probability is predicted, and alt is the alternative whose covariate is changed. So if we simply type margins, dydx(trcost) after the above cmclogit fit, we get all 16 marginal effects:
    Code:
    . margins, dydx(trcost)
    
    Average marginal effects                        Number of obs     =      6,000
    Model VCE    : Robust
    
    Expression   : Pr(alt|1 selected), predict()
    dy/dx w.r.t. : trcost
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    trcost       |
    _outcome#alt |
        Car#Car  |   -.101222   .0036761   -27.54   0.000     -.108427    -.094017
     Car#Public  |   .0480825   .0024267    19.81   0.000     .0433263    .0528388
    Car#Bicycle  |   .0291269    .002192    13.29   0.000     .0248305    .0334232
       Car#Walk  |   .0240126   .0021986    10.92   0.000     .0197035    .0283217
     Public#Car  |   .0480825   .0024267    19.81   0.000     .0433263    .0528388
         Public #|
         Public  |  -.0717259   .0043081   -16.65   0.000    -.0801695   -.0632823
         Public #|
        Bicycle  |   .0128744   .0014888     8.65   0.000     .0099564    .0157924
    Public#Walk  |    .010769   .0013537     7.96   0.000     .0081158    .0134223
    Bicycle#Car  |   .0291269    .002192    13.29   0.000     .0248305    .0334232
        Bicycle #|
         Public  |   .0128744   .0014888     8.65   0.000     .0099564    .0157924
        Bicycle #|
        Bicycle  |  -.0492566   .0042985   -11.46   0.000    -.0576814   -.0408318
        Bicycle #|
           Walk  |   .0072553   .0010918     6.65   0.000     .0051155    .0093952
       Walk#Car  |   .0240126   .0021986    10.92   0.000     .0197035    .0283217
    Walk#Public  |    .010769   .0013537     7.96   0.000     .0081158    .0134223
           Walk #|
        Bicycle  |   .0072553   .0010918     6.65   0.000     .0051155    .0093952
      Walk#Walk  |  -.0420369   .0043062    -9.76   0.000    -.0504769   -.0335969
    ------------------------------------------------------------------------------
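    For a generic coefficient such as trcost, the 16 entries in the table above follow the standard closed form for the conditional logit model. A sketch, with notation introduced here for illustration: beta is the trcost coefficient, P_ij the predicted probability that case i chooses alternative j, and x_ik the trcost of alternative k in case i. Each dy/dx entry is this derivative averaged over the estimation sample.

```latex
P_{ij} = \frac{\exp(V_{ij})}{\sum_{l} \exp(V_{il})}, \qquad V_{ij} = \beta\, x_{ij} + \cdots

\frac{\partial P_{ij}}{\partial x_{ik}} = \beta\, P_{ij}\,\bigl(\mathbf{1}[j = k] - P_{ik}\bigr)

% Summing over outcomes j for a fixed alternative k gives zero, since \sum_j P_{ij} = 1:
\sum_{j} \frac{\partial P_{ij}}{\partial x_{ik}} = \beta\, P_{ik} - \beta\, P_{ik} \sum_{j} P_{ij} = 0

% Check against the outcome#Car rows of the table:
-0.101222 + 0.0480825 + 0.0291269 + 0.0240126 = 0
```

    For j different from k the derivative is -beta P_ij P_ik, which is symmetric in j and k; averaging preserves this, which is why Car#Public equals Public#Car. Because beta is negative here (-.6123677), own-cost effects are negative and cross effects positive, as in the table.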
    However, we can also target alternatives for specific effects of interest. For example, if our question was how the probability of choosing public transportation is affected by the cost of car travel, then we could use the outcome() and alternative() options of margins like so:
    Code:
    . margins, dydx(trcost) outcome(Public) alternative(Car)
    
    Average marginal effects                        Number of obs     =      6,000
    Model VCE    : Robust
    
    Expression   : Pr(alt|1 selected), predict()
    Alternative  : Car
    Outcome      : Public
    dy/dx w.r.t. : trcost
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    trcost       |
           _cons |   .0480825   .0024267    19.81   0.000     .0433263    .0528388
    ------------------------------------------------------------------------------
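    In the setting of the first post (alternatives labelled PT, CR and BF, with totcost as an alternative-specific variable), the same options could be used as sketched below. This assumes the cmclogit model from the first post has just been fit and uses the value labels from its dataex example. Read the same way, a row such as PT#CR in the region output of the first post would refer to Pr(PT) when region is changed for the CR alternative only.

```stata
* Cross effect: Pr(PT chosen) with respect to the cost of the CR alternative
margins, dydx(totcost) outcome(PT) alternative(CR)

* Direct effect: Pr(CR chosen) with respect to the cost of CR itself
margins, dydx(totcost) outcome(CR) alternative(CR)
```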
    For further information about marginal predictions and effects with discrete choice models, including introductory notes and examples, I suggest having a look at Stata's [CM] Choice Models manual, especially Intro 1:

    https://www.stata.com/bookstore/choi...erence-manual/

    Another example can be found here:

    https://www.stata.com/stata-news/news35-2/spotlight/

    As for your particular results, I can't say much about them, because they cannot be reproduced from the information you provided. Please feel free to send your dataset and do-file to [email protected] and we will take a look.

    Best,
    Joerg
