Help needed to estimate a Panel (Longitudinal) Multinomial Logit Model

Tiaga Falcao

Join Date: Oct 2020

Posts: 16
#1

17 Oct 2020, 13:53

Dear Stata Users,

Before explaining my problem, I should say that I have already tried Stata's femlogit and cmxtmixlogit commands, as well as the gsem workaround for the missing xtmlogit command that is explained in this post: https://www.stata.com/stata-news/news29-2/xtmlogit/. I also tried SAS procedures such as GENMOD, GEE, and GLIMMIX. Unfortunately, each of these has limitations: they either do not converge or do not produce "goodness-of-fit" measures. I would therefore appreciate your insights and help on how to approach the following problem:

My data are monthly agent-level observations, which I use to study peer influence on agents' decision outcomes (for example, three unordered categorical outcomes such as "A", "B", and "C"). A decision is made every month, so this is not the same as the time-to-event observations of survival analysis. The panel is unbalanced because agents are not observed in every month, owing to their different employment start and end dates. Also, I only have values of the independent variables for the observed outcome (in other words, this differs from choice-model analysis, where the independent variables are available for both the selected and the unselected alternatives). You may find a sample of my data in the attachment. The generic form of the model is as follows:

Y_it= alpha + a_itY_it-1 + b_itX_it + c_itG^X_{it,outcome_n} + d_itG^Y_{it,outcome_n} + f_t

where,

Y_it is agent i's decision outcome at time t (multinomial variable, more than two outcomes)

Y_it-1 is agent i's previous decision outcome (Granger causality)

X_it is an agent-specific exogenous or contextual variable at time t (this variable is continuous but can be binary or categorical too)

G^X_{it,outcome_n} is a decision-specific variable equal to the average of X over the peers who made the same decision as agent i in Y_it [for example, if agent i's decision at time t is "outcome2", we use the members of the reference group who also chose "outcome2"]. It is calculated as (sum of X of peers with the same decision as in Y_it)/(agent i's group size at time t)

G^Y_{it,outcome_n} is a decision-specific variable equal to the share of decisions in the reference group that match the one observed in Y_it. It is calculated as (number of matching outcomes in agent i's reference group at time t)/(agent i's group size at time t)
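The two peer variables defined above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the field names (agent, group, month, y, x) and the function name peer_variables are hypothetical, the reference group is taken to be the other agents in the same group and month, and the group size in the denominator excludes agent i. The post does not settle these points, so adjust them to the actual definition.

```python
from collections import defaultdict

def peer_variables(rows):
    """Return {(agent, month): (gx, gy)} for a list of observation dicts.

    Each dict needs the (hypothetical) keys: agent, group, month, y, x.
    """
    # Bucket observations by (group, month) so peers are matched within a month.
    cells = defaultdict(list)
    for r in rows:
        cells[(r["group"], r["month"])].append(r)

    out = {}
    for members in cells.values():
        for r in members:
            # Assumption: the reference group excludes agent i itself.
            peers = [p for p in members if p["agent"] != r["agent"]]
            if not peers:
                # No peers observed this month: leave both variables missing.
                out[(r["agent"], r["month"])] = (None, None)
                continue
            same = [p for p in peers if p["y"] == r["y"]]
            gx = sum(p["x"] for p in same) / len(peers)  # G^X
            gy = len(same) / len(peers)                  # G^Y
            out[(r["agent"], r["month"])] = (gx, gy)
    return out

rows = [
    {"agent": 1, "group": "g1", "month": 1, "y": "A", "x": 2.0},
    {"agent": 2, "group": "g1", "month": 1, "y": "A", "x": 4.0},
    {"agent": 3, "group": "g1", "month": 1, "y": "B", "x": 6.0},
    {"agent": 4, "group": "g1", "month": 1, "y": "A", "x": 0.0},
]
result = peer_variables(rows)
```

Because both variables are built from agent i's own observed outcome, they are only defined for the chosen alternative, which is the data limitation described earlier in this post.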

This is basically a "linear-in-means model + Granger causality". Assuming that we are allowed to ignore the identification and reflection issues inherent in linear-in-means models (Manski 1993), is there an efficient and usable command in Stata that estimates the following parameters and also produces "goodness-of-fit" measures:

alpha: constant factor
a: coefficient for Granger causality
b: coefficient vector for the agent's exogenous variable (X)
c: coefficient vector for the group's average contextual variable (G^X)
d: coefficient for group's decision effect (G^Y)
f: time (or month) fixed effect
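Since Y_it is an unordered categorical outcome, the linear equation above is best read as an index rather than a literal regression. One way to write a pooled multinomial logit version is sketched here. The symbols W_it, beta_j, and J are assumptions introduced only for this illustration: W_it collects a constant, indicators for Y_it-1, X_it, G^X, G^Y, and month dummies; beta_j stacks the outcome-j versions of alpha, a, b, c, d, and f; and J is the number of outcomes, with the first outcome taken as the base category.

```latex
\Pr(Y_{it} = j \mid W_{it})
  = \frac{\exp(W_{it}\beta_j)}{\sum_{k=1}^{J} \exp(W_{it}\beta_k)},
  \qquad j = 1, \dots, J, \qquad \beta_1 = 0 .
```

Under this form each parameter in the list has one value per non-base outcome.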

Thanking you in advance.
Attached Files

sample_unbalaced_xt_mlogit.csv (72.4 KB, 1 view)
Tags: categorical, fixed effects, logit, panel data, unbalanced
Tiaga Falcao

Join Date: Oct 2020

Posts: 16
#2

18 Oct 2020, 00:36

@Joro Kolev Dear Professor Kolev, I would appreciate it if you could share your insight on, or a solution to, this problem of mine.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#3

18 Oct 2020, 11:56

Tiaga: Putting in a lagged dependent variable is not "Granger causality." It's when you add lags of other variables along with the lagged DV that allows you to test for Granger causality. But that's semantics.

You need to read about dynamic panel data models with heterogeneity. You can't just put in a lagged DV and use something like xtprobit or xtoprobit or, in your case, a multinomial version of those. You need to deal with the so-called initial conditions problem. I discuss one way to do this in my 2005 Journal of Applied Econometrics paper.
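As a rough sketch of the kind of auxiliary regressors used when the heterogeneity is modeled conditional on the initial outcome and the covariates, the snippet below builds, for each agent, the first observed outcome and the time average of x. The field names and the function name initial_condition_regressors are hypothetical, and using the time average of x rather than its full history is a simplifying assumption. With an unbalanced panel, where agents enter in different months, further care is needed beyond what is shown.

```python
def initial_condition_regressors(rows):
    """Return {agent: {"y0": first observed outcome, "xbar": mean of x}}.

    Each row is a dict with the (hypothetical) keys: agent, month, y, x.
    """
    by_agent = {}
    # Sort so that each agent's observations are in calendar order.
    for r in sorted(rows, key=lambda r: (r["agent"], r["month"])):
        by_agent.setdefault(r["agent"], []).append(r)

    out = {}
    for agent, obs in by_agent.items():
        y0 = obs[0]["y"]                            # outcome in the first observed month
        xbar = sum(o["x"] for o in obs) / len(obs)  # time average of x
        out[agent] = {"y0": y0, "xbar": xbar}
    return out

rows = [
    {"agent": 1, "month": 3, "y": "B", "x": 1.0},
    {"agent": 1, "month": 2, "y": "A", "x": 3.0},
    {"agent": 2, "month": 5, "y": "C", "x": 2.0},
]
aux = initial_condition_regressors(rows)
```

These agent-level values would then be merged back onto every monthly row and entered as additional regressors alongside the lagged outcome.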

More generally, just because a model allows for unobserved heterogeneity does not mean it does so in a useful way. There is much confusion about this point. Going from probit to xtprobit, for example, is not allowing for the unobserved heterogeneity to be correlated with x(i,t). xtprobit simply produces estimators more efficient than probit if all of the ideal assumptions for xtprobit hold.

In your case, there must be correlation between y(i,t-1) and the heterogeneity, but you're looking at methods that assume there is none. As I said, you need to read up on methods that allow a lagged y and heterogeneity. Chapter 13 of my MIT Press book contains a general discussion, and later chapters cover special cases.

If you just want a dynamic model without accounting for heterogeneity, which is something one would do for prediction, just use mlogit. You don't even have to cluster the standard errors because, presumably, you'd want to assume you have the dynamics correctly specified.

I hope this helps a little.

JW
1 like
Tiaga Falcao

Join Date: Oct 2020

Posts: 16
#4

18 Oct 2020, 14:48

Thank you, Professor Wooldridge, for the explanation. I will correct my wording where I misused "Granger causality". If I may describe the problem in more detail, it may explain why I used the term.

Without Y(i,t-1), the model is what social interaction studies call the "linear-in-means model", whose parameters, according to Manski (1993), cannot be point estimated because of identification and reflection issues. Most of the literature applies this model to a continuous DV in a single panel data set and estimates it by IV/2SLS. However, this paper uses it with Y(i,t-1) included: https://onlinelibrary.wiley.com/doi/...111/jofi.12094 . My model is similar to the one in the paper, except that my DV is categorical and I include Y(i,t-1) for a different reason.

The reason why I include Y(i,t-1), and why I called it Granger causality, is the following:

Agent (i) makes a decision at the end of each month, after observing his/her peers' decisions during that month. Because of this time difference between decisions, I could ignore the reflection issue of peer-effects studies for now. However, although the agent's peers do not know the agent's decision at (t), they may be influenced by the agent's previous decision Y(i,t-1). So I included Y(i,t-1) in the model so that I can estimate it and simultaneously measure the correlation between Y(i,t-1) and GY(i,t) in a random-effects fashion. In other words, taking care of the reflection issue is a bigger concern here than heterogeneity.

The closest Stata command for estimating this model is cmxtmixlogit, but it requires values for both the selected and the unselected options every time a choice is made. Unfortunately, my data contain values for the selected option only, so I could not use it.

Last edited by Tiaga Falcao; 18 Oct 2020, 15:18.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#5

18 Oct 2020, 15:19

My suggestion is to ignore heterogeneity and use mlogit. You’ll control for lagged Y and estimate the peer effects. It’s very hard to allow all three features.
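Controlling for lagged Y in an unbalanced panel means building the lag with gaps in mind: the previous row for an agent is not necessarily the previous month. A minimal sketch in plain Python follows, assuming month is a consecutive integer index and using hypothetical field names and a hypothetical function name add_lagged_outcome. The lag is left missing whenever the agent was not observed in month t-1.

```python
def add_lagged_outcome(rows):
    """Return copies of rows with an extra key y_lag (None when unavailable).

    Each row is a dict with the (hypothetical) keys: agent, month, y.
    Assumption: month is an integer index that increases by 1 each month.
    """
    # Look up each agent's outcome by exact month, not by row order.
    lookup = {(r["agent"], r["month"]): r["y"] for r in rows}
    out = []
    for r in rows:
        lagged = dict(r)
        lagged["y_lag"] = lookup.get((r["agent"], r["month"] - 1))
        out.append(lagged)
    return out

rows = [
    {"agent": 1, "month": 1, "y": "A"},
    {"agent": 1, "month": 2, "y": "B"},
    {"agent": 1, "month": 4, "y": "A"},  # month 3 is missing for agent 1
    {"agent": 2, "month": 2, "y": "C"},
]
with_lag = add_lagged_outcome(rows)
# Rows without a lag cannot enter a model that conditions on lagged Y.
usable = [r for r in with_lag if r["y_lag"] is not None]
```

Since the lagged outcome is categorical, it would enter the model as a set of indicators rather than as a single numeric regressor.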
1 like
Tiaga Falcao

Join Date: Oct 2020

Posts: 16
#6

18 Oct 2020, 16:23

Jeff Wooldridge Thank you for the comments, really appreciated. I will try mlogit and at the same time read your paper and book chapters to get a better understanding of these types of models.