  • Multinomial logit model with random effects

    Dear all,

    I am working with data on traffic violations recorded over one year. As shown in the table below, every row in the database corresponds to a unique traffic violation. There are 267,000 traffic violations recorded, so the id runs from 1 to 267,000. The first column indicates the type of traffic violation that occurred (there are 3 types in this database). The other three variables describe the driver who committed the violation. For example, the first row indicates a seat belt violation committed by a driver who is 37 years old, male, and has education level 4. I fitted a multinomial logit model to identify the impact of age, sex, and education level on the type of violation. For example, I want to be able to report that as driver age increases, the probability of committing a speeding violation decreases. I have 3 questions:
    1: Did I follow an appropriate process, and is the multinomial logit suitable for this purpose?
    2: I want to account for unobserved variables that may affect the response variable (the choice among the 3 violation types). For this purpose, should I fit a mixed multinomial logit model, or some other model? What would the Stata code for that model be?
    3: Is it possible to fit a random-effects multinomial logit model to data that are not panel data (for example, my data cover just one year of traffic violations)? If the answer is yes, what is the code in Stata?
    id   violation type       driver's age   driver's sex (dummy)   driver's education level (1 to 8)
    1    seat belt            37             1                      4
    2    speeding             19             1                      7
    3    seat belt            24             1                      2
    4    using mobile phone   30             0                      5
    5    speeding             28             1                      5
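
    For reference, the plain multinomial logit described above, and the kind of statement about age and speeding, could be sketched roughly as follows. This is only an illustration: it assumes violation_type is a numeric variable, and the coding 1 = seat belt, 2 = speeding, 3 = using mobile phone is an assumption, not something taken from the actual data.

    ```stata
    * Sketch only. Assumed coding (not from the real data):
    * 1 = seat belt, 2 = speeding, 3 = using mobile phone
    mlogit violation_type age i.sex education, baseoutcome(1)

    * Average marginal effect of age on the predicted probability
    * of each violation type
    margins, dydx(age) predict(outcome(1))
    margins, dydx(age) predict(outcome(2))
    margins, dydx(age) predict(outcome(3))
    ```

    A negative estimate in the outcome(2) line would correspond to the statement that the probability of a speeding violation falls as age increases, conditional on a violation having been recorded.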

  • #2
    If you have a substantial number of drivers who have more than one entry in this data base, then it is possible to use a random effects model. There is no official Stata command for random effects multinomial logistic regression. However, you can emulate it with -gsem-. You don't show real example data, just a suggestive tableau, so I can't give you specifics. But the code will look something like this:

    Code:
    gsem (violation_type <- age i.sex education L1[id])
    (Not clear if you want to treat education as discrete or continuous. If discrete, use i.education.) If you are not familiar with factor-variable notation, read -help fvvarlist-.
    But be warned that these models are very finicky and require strong data to reach convergence. If you don't have enough id's that have more than one observation in the data, it will be very difficult or impossible to estimate the random effects. And if there are any violation types that occur only rarely, that can cause convergence difficulties as well. Good luck!



    • #3
      Stata 17 did add xtmlogit, which can handle both fixed-effects and random-effects mlogit.

      If you have an older version of Stata, you can use the user-written femlogit (fixed effects mlogit), which is described at

      https://journals.sagepub.com/doi/pdf...867X1401400409

      Also, like Clyde says, you can use gsem. For more detail, see

      https://www.stata.com/stata-news/news29-2/xtmlogit/
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 18.5 MP (2 processor)

      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Richard Williams Thanks for calling that to my attention. Somehow I missed that when I read the whatsnew after I installed Stata 17.



        • #5
          @Clyde Schechter Thank you so much

          Sorry for not showing a real example of the data. Because I have reshaped the data in many ways for different models, it would have been difficult and time-consuming to show and explain all the tables. In general, though, the raw data look exactly like the table I showed. I checked and found that 69 percent of drivers had just one observation (they committed just one traffic violation in that year), and 31 percent of drivers had more than one observation (for example, a driver who committed one speeding violation and one seat belt violation in that year). Is it possible to use a random-effects multinomial logit?
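
          The share of drivers with one versus several recorded violations can be tabulated along these lines. This is a sketch that assumes a driver identifier exists in the data; driver_id, n_violations, and one_per_driver are hypothetical names.

          ```stata
          * driver_id is a hypothetical variable identifying the driver
          * (not the violation id shown in the table)
          bysort driver_id: generate n_violations = _N

          * Flag exactly one row per driver so each driver is counted once
          egen byte one_per_driver = tag(driver_id)

          * Distribution of the number of violations per driver
          tabulate n_violations if one_per_driver
          ```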

          I used the gsem code that you mentioned, but I received this error:
          invalid path specification;
          Violation type may not be the destination of a path from L1[id]
          What should I do to fix this error?

          Before asking you for better code for a random-effects multinomial logit, I used cmxtmixlogit, and the AIC and BIC for the cmxtmixlogit model were much smaller than those for the multinomial logit fitted with mlogit. Does this mean that the cmxtmixlogit model fits my data better? The commands I used for these two models are:

          cmxtmixlogit violation_type , casevars(age i.sex education)

          mlogit violation_type age i.sex education

          Are these models, as I fitted them to my data, correct?



          • #6
            @Richard Williams Thank you so much



            • #7
              Violation type may not be the destination of a path from L1[id]
              I cannot imagine any reason why you would get this message unless violation type is a string variable. (Since you don't use -dataex-, I have no way to know what type your variables are: that's why it's so critical to always use -dataex- to show example data.) The use of -gsem- requires all the variables to be numeric. So if violation type is not numeric, then you need to -encode- it. But if violation type is not numeric, I cannot see how you could have gotten anywhere with any of the other commands you have used.
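
              A minimal sketch of checking the storage type and encoding a string outcome, assuming the variable is named violation_type. The name violation_num is hypothetical.

              ```stata
              * A storage type beginning with "str" means the variable is a string
              describe violation_type

              * Create a numeric, value-labeled copy of the string variable
              * (violation_num is a hypothetical name)
              encode violation_type, generate(violation_num)

              * Show which integer code was assigned to each violation type
              label list violation_num
              ```

              The numeric copy, rather than the original string, would then be used as the outcome in the estimation commands.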

              There is, however, an error in the code: it should have been -gsem (violation_type <- age i.sex education L1[id], mlogit)- to do a multinomial regression. Without that, it fits a linear model and treats the outcome as continuous.
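
              Written out as a block, a multinomial -gsem- call with a driver-level random intercept might look like the sketch below. Two things here are assumptions rather than part of the thread: the grouping variable driver_id is hypothetical (the id in the example table numbers violations, not drivers), and the i. prefix on the outcome is the form that, as far as I know, the SEM manual uses in its multinomial examples.

              ```stata
              * Sketch only. driver_id is a hypothetical driver identifier.
              * M1[driver_id] is a latent random intercept shared by all
              * violations of the same driver (latent names must be capitalized).
              gsem (i.violation_type <- age i.sex education M1[driver_id]), mlogit
              ```

              As warned earlier in the thread, models like this can be slow to converge or fail to converge when few drivers have repeated observations.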

              In any case, now that Richard Williams has reminded us that Stata now contains an -xtmlogit- command, you should use that in preference to -gsem- if you are running version 17 or later.
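
              A sketch of the -xtmlogit- route, assuming Stata 17 or later and a numeric outcome. The panel identifier driver_id is a hypothetical name, since the actual variable names are not shown in the thread.

              ```stata
              * Declare the panel (grouping) variable; driver_id is hypothetical
              xtset driver_id

              * Random-effects multinomial logit, reported as relative-risk ratios
              xtmlogit violation_type age i.sex education, re rrr
              ```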

              I used cmxtmixlogit, and the AIC and BIC for the cmxtmixlogit model were much smaller than those for the multinomial logit fitted with mlogit. Does this mean that the cmxtmixlogit model fits my data better?
              In theory, yes. In reality, perhaps not. The problem is that sometimes the people who code maximum likelihood estimation will remove constant scale factors or drop constant terms from the likelihood in order to speed up computation. When we are comparing the AIC or BIC of different models estimated with the same command, we can be sure that was done the same way. But when we are comparing models across commands, the AICs and BICs may not be comparable if the log likelihood of either has been rescaled. Even among the Stata official estimation commands, this is sometimes an issue between different estimation commands. I am completely unfamiliar with -cmxtmixlogit-, so I don't know how to advise you on this.
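
              For the safe case, where both models come from the same command, the information criteria can be put side by side as in this sketch. The two specifications and the stored names m_small and m_full are purely illustrative.

              ```stata
              * Two nested specifications fitted with the same command
              mlogit violation_type age i.sex
              estimates store m_small

              mlogit violation_type age i.sex education
              estimates store m_full

              * Log likelihood, AIC, and BIC for both models in one table
              estimates stats m_small m_full
              ```

              Both models must be fitted on the same estimation sample for the comparison to be meaningful.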



              • #8
                Hi @Richard Williams @Clyde Schechter, I have panel data with N = 40 and T = 30 and an unordered categorical dependent variable. Do I need to test the parallel regression assumption? If so, how do we test it after xtmlogit?
