
  • Multinomial Logistic Regression Taking hour+

    Hello, I'm running a fairly simple multinomial logistic regression. The outcome is a mixed categorical/ordinal variable (a composite of several categorical variables), and the predictors are a number of dummy variables representing risk factors, comorbidities, and other factors. There are about 350 observations for each variable, and the regressions are taking quite a while to complete. Is this normal in Stata? I'm using Stata 16 SE on a reasonably powerful computer.

  • #2
    Welcome to Statalist. You will increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

    This seems like an extremely long time to estimate such a model. It could be that you have an extremely large number of right-hand-side variables or that your dependent variable takes on an extremely large number of values.

    I have found that fixed-effects multinomial logits take a long time, particularly if the dependent variable has a great number of categories, but I have not found this with conventional multinomial logit. Is it possible that the model is having trouble converging? Even a relatively straightforward model can run forever if the maximum likelihood estimation can't converge.
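
    One way to check the convergence question is to cap the number of iterations and inspect the stored results afterwards. This is a minimal sketch, not the original poster's model: it assumes Stata's documentation dataset sysdsn1 and its variables (insure, age, male, nonwhite, site) purely as stand-ins, and the cap of 50 iterations is an arbitrary choice.

    ```stata
    * Stand-in data (assumption): Stata's documentation dataset, not the poster's data
    webuse sysdsn1, clear

    * Cap the iterations so a model that cannot converge stops instead of running on
    mlogit insure age male nonwhite i.site, iterate(50)

    * e(converged) is 1 if the optimizer converged, 0 otherwise; e(ic) is the iteration count
    display "converged = " e(converged) "   iterations = " e(ic)
    ```

    Repeated "not concave" or "backed up" messages in the iteration log are another sign that the likelihood is hard to maximize.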

    By the way, note that the issue is not how many observations you have for each variable; it is how many observations have data on all of the variables. Like many statistical packages, Stata generally drops an observation if there is a missing value on any of the variables being used.
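
    The number of complete cases can be checked before and after fitting. The sketch below again uses the sysdsn1 documentation dataset as a stand-in (an assumption, since the original data were not posted); substitute the variables from your own model.

    ```stata
    * Stand-in data (assumption): Stata's documentation dataset
    webuse sysdsn1, clear

    * Missing-value counts for each variable in the model
    misstable summarize insure age male nonwhite site

    * Observations with nonmissing values on every model variable
    count if !missing(insure, age, male, nonwhite, site)

    * After estimation, e(sample) marks the observations actually used
    mlogit insure age male nonwhite i.site
    count if e(sample)
    display "estimation sample size = " e(N)
    ```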



    • #3
      I agree with Phil Bromiley. It's hard to advise concretely, but here is some classic broad advice:

      1. Try a much simpler model, especially with predictors that you know or think are obviously important. It's likely to fit very fast. You may then build up and try adding various predictors.

      2. The ratio of observations that are usable to parameters should be pretty high.

      3. Look more closely at the data. With categorical data (in a wide sense) it can be hard to spot outliers, and graphs may be hard to interpret. tab1 can show you variables that will be hard to use because only a few observations are available for one or another category. Similarly, summarize may show, e.g., means near 0 or near 1 for binary indicators.
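
      The three suggestions above can be sketched as follows. This is illustrative only: it assumes the sysdsn1 documentation dataset as a stand-in for the poster's data, and it uses e(rank) as a rough count of estimated parameters, which is my assumption about a convenient measure rather than anything prescribed in this thread.

      ```stata
      * Stand-in data (assumption): Stata's documentation dataset
      webuse sysdsn1, clear

      * Point 3: one-way tables reveal categories with very few observations
      tab1 insure male nonwhite site

      * Point 3: means near 0 or near 1 flag nearly constant binary indicators
      summarize male nonwhite

      * Point 1: start with a much simpler model, which should fit quickly
      mlogit insure age male

      * Point 2: compare usable observations with estimated parameters
      * e(rank) is the rank of e(V), used here as the parameter count (assumption)
      display "N = " e(N) "   parameters = " e(rank) "   ratio = " e(N)/e(rank)

      * Point 1, continued: build up by adding predictors, then check the ratio again
      mlogit insure age male nonwhite i.site
      display "N = " e(N) "   parameters = " e(rank) "   ratio = " e(N)/e(rank)
      ```

      In a multinomial logit each added predictor costs one parameter per non-base outcome category, so the ratio falls quickly when the outcome has many categories.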
      Last edited by Nick Cox; 28 Apr 2020, 11:22.



      • #4
        Thank you both, Nick Cox and Phil Bromiley. I ended up realizing that I had included an incorrect column, which added far too many right-hand-side variables, and that is why it was taking forever.
