  • Multinomial logistic model mlogit for panel data

    Hi all,


    I am running a multinomial logistic regression for 402 regions over the period 2000 to 2014, using Stata 15.1.
    A variable I am using has missing data, so multiple imputation (mi) is applied before the mlogit command.
    After the regression, Stata's number-of-observations output is over 20,000, although I expected this to be 402. Does anyone know how this is possible?


    I ran the mlogit command in Stata in combination with mi: mi estimate: mlogit etc.
    (I set the data to mlong style in order to impute the missing data points.)

    So what I think might be wrong is:
    - the set up of the panel data
    - maybe more missing data
    - the calculation of the missing data
    - the command to perform the mlogit regression

    I hope this is clear. Please let me know if more details are necessary.

    Happy New Year to everyone!


  • #2
    So what I think might be wrong is:
    - the set up of the panel data
    - maybe more missing data
    - the calculation of the missing data
    - the command to perform the mlogit regression
    Indeed. And there could be other possibilities as well.

    But you give only a general description of these. In programming there are no small details. Without seeing an example of the actual data, the actual full and exact code and the actual exact output you got from Stata, it is anybody's guess what's going on.

    Please read the entire FAQ for excellent advice about how to post effectively on this Forum. Pay particular attention to #12 which gives instructions on the best way to show example data, code, and output that makes them readable and usable by those who would like to try to help.



    • #3
      Thanks Clyde.

      I am currently performing the analysis again and will provide details later.

      The number of observations in my case is 402 regions × 15 years = 6,030.
      I needed to exclude some observations, so the total number of observations is 5,970.
      Last edited by Jantje Beton; 31 Dec 2017, 11:26.



      • #4
        Jantje:
        Clyde gave excellent advice.
        As far as your expected number of observations is concerned, I suspect that you mistook observations for groups, as you can see from the following toy example (with -xtreg-, though):
        Code:
        . xtreg ln_wage i.collgrad tenure
        
        Random-effects GLS regression                   Number of obs     =     28,101
        Group variable: idcode                          Number of groups  =      4,699
        
        R-sq:                                           Obs per group:
             within  = 0.0972                                         min =          1
             between = 0.3206                                         avg =        6.0
             overall = 0.2334                                         max =         15
        
                                                        Wald chi2(2)      =    4720.39
        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
          1.collgrad |   .4203783   .0124391    33.79   0.000      .395998    .4447586
              tenure |   .0367065   .0006384    57.50   0.000     .0354553    .0379577
               _cons |   1.474926   .0057781   255.26   0.000     1.463601    1.486251
        -------------+----------------------------------------------------------------
             sigma_u |  .30379522
             sigma_e |  .30357621
                 rho |  .50036059   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
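        As an aside on the row count itself: in mi's mlong style the dataset physically stores the original observations plus, for each imputation, a copy of every incomplete observation. So if -mlogit- were run directly on the data (rather than through -mi estimate:-), all of those stacked rows would be counted. A back-of-the-envelope of that arithmetic, in Python purely for illustration (the number of imputations and the number of incomplete rows are made-up values):

```python
# Hypothetical mlong-style row-count arithmetic.
# Only the formula (original rows + M copies of incomplete rows)
# reflects how mlong storage works; the inputs are invented.
n_obs = 5970         # 402 regions x 15 years, minus exclusions
n_incomplete = 3000  # made-up count of rows with a missing value
n_imputations = 5    # made-up number of imputations M

stacked_rows = n_obs + n_imputations * n_incomplete
print(stacked_rows)  # -> 20970, i.e. well over 20,000
```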
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          I discovered my mistake. It appeared that another variable also had missing data, though only a very small number of values. I fixed that by using multiple imputation for both variables, after which the number of observations was correct again, i.e. covering all the included regions over the entire period.

          Another issue I ran into is that with mi estimate, the Pseudo R² is not reported. I read that this is deliberate, because goodness of fit has less meaning for a model with imputed data. Is there another way to determine the goodness of fit of a multinomial logit model with imputed data?



          • #6
            Post-estimation following multiple-imputation estimation is not all that fully developed. I'm not sure whether this is because the theoretical foundations are not there, or because of technical problems of implementation. In addition, putting aside any questions relating to multiple imputation, I am not a fan of goodness-of-fit statistics of any kind, because I think that model fit is too rich and complicated to be reduced to a single number.

            My general approach to model fit is to graphically explore the agreement between model predictions and the data. The one standard goodness-of-fit approach that I like, because it fits this general approach, is the Hosmer-Lemeshow procedure. I tend to disregard the summary chi-square statistic, but I love seeing the table of observed and expected event counts by decile of predicted risk. (In large data sets I use more than 10 groups, and in small data sets I sometimes use fewer.)

            While this is only implemented directly in Stata following logistic regression, it is not hard to "roll your own" following multinomial logistic regression. You just calculate the predicted probability of each outcome level for each observation using -predict- (or, when MI has been involved, -mi predict-*). Then group the data into interesting subsets defined by range of predicted outcome probabilities--deciles are often quite convenient for this purpose. Then use -collapse- to tally up the total number of observed and predicted outcomes in each such group, and graph those results.

            The advantage of this approach is that not only can you get a sense of how well calibrated the model is overall, you can also see greater detail. You can, for example, note that perhaps your model works rather well when it predicts medium levels of risk, but not so well at the extremes. Or perhaps it is undershooting at one end of the data and overshooting at the other. These observations not only enable you to appraise the usefulness of your model for present purposes, but can also stimulate your thinking about ways to improve the model.

            *-mi predict- does not, so far as I know, calculate predicted probabilities following a logistic or multinomial logistic MI estimated model. It only calculates -xb-. So you have to calculate invlogit(xb) to get the predicted probabilities.
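            To make the grouping-and-tallying logic concrete, here is an illustrative mock-up in Python (not working Stata code) with simulated data; the sample size, the data-generating model, and the group count are all made up:

```python
import math
import random

def invlogit(xb):
    # The footnote's transformation: predicted probability from the linear predictor xb.
    return 1.0 / (1.0 + math.exp(-xb))

random.seed(1)

# Simulated stand-in for -predict-/-mi predict- output: one predicted
# probability per observation, plus the observed 0/1 outcome.
data = []
for _ in range(2000):
    xb = random.gauss(0.0, 1.5)          # made-up linear predictor
    p = invlogit(xb)
    y = 1 if random.random() < p else 0  # outcome drawn from the same model
    data.append((p, y))

# Sort by predicted probability and cut into 10 equal-sized groups, as in
# the Hosmer-Lemeshow table; then tally observed vs. expected event counts
# per group (the part -collapse- would do in Stata).
data.sort(key=lambda t: t[0])
groups = 10
size = len(data) // groups
for g in range(groups):
    chunk = data[g * size:(g + 1) * size]
    observed = sum(y for _, y in chunk)
    expected = sum(p for p, _ in chunk)
    print(f"decile {g + 1:2d}: observed = {observed:4d}   expected = {expected:7.1f}")
```

Graphing observed against expected across the groups then shows where the model is well calibrated and where it under- or overshoots.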



            • #7
              Jantje Beton

              Could you please show your code for the multinomial logistic model (mlogit) for panel data? Is it something like this (without mi)?

              Code:
              mlogit depVar indepVars, b(0)
              --------------------
              (Stata 15.1 MP)
