  • Marginal Effects after Mlogit with big data

    Hello Stata Forum,

    I'm running a multinomial logit model with over 16,000,000 observations and 14 different variables. The multinomial logit runs without issue, even though it still takes 30-60 minutes.

    However, the margins commands take hours to run. I have already resampled my data down to just over 3,000,000 observations, but Stata still takes hours to produce the results. If anyone has any insight on big data sets, nonlinear probability models, and margins/mfx compute, it would be much appreciated.

    Also, is there any way to run margins for each outcome without running the mlogit command first?

    Thanks

  • #2
    If you can live without the standard errors, adding the -nose- option to margins will probably speed things up considerably.

    I'm not sure what you mean by running margins for each outcome without running mlogit first. If you mean predicting outcome 1, then outcome 2, and so on, you only need to run mlogit once. And if you are running Stata 14.2 or later, margins will do all the outcomes for you.
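
    As a rough sketch of both points, assuming placeholder variables y, x1, x2 and x3 (not the poster's actual data) and an outcome coded 1, the calls might look like this:

    ```stata
    * Sketch only: y, x1, x2, x3 are placeholder variable names
    mlogit y i.x1 x2 x3

    * Average marginal effects on Pr(y == 1), skipping the standard errors
    margins, dydx(*) predict(outcome(1)) nose

    * Stata 14.2 or later: leave out predict() to get every outcome in one call
    margins, dydx(*) nose
    ```

    With -nose- the output has point estimates only, so no confidence intervals or tests are available from that run.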

    Incidentally, I like to use the spost13 mtable command, especially for things like ologit and mlogit. The output looks tidier, for one thing. For more details, see

    https://www3.nd.edu/~rwilliam/stats3/Margins05.pdf
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam


    • #3
      Tiffany,

      In addition to what Richard said, -margins- is inherently pretty slow in this application. One obvious way around this is to throw money at the problem - buy Stata MP or add more cores to your existing Stata MP license, get faster processors, etc. But obviously that doesn't help you right now!

      Pardon me if you know this already, but you can use -estimates store- to store models in memory, or -estimates save- to save them to disk. This isn't needed to run a bunch of consecutive -margins- commands on one model. However, if you think of some other way to present margins later on, you can restore the original model estimates later and run -margins-, e.g.

      Code:
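      * x1 and x2 must enter the model as factor variables (i.) to be used in a margins list
      mlogit y i.x1 i.x2 x3
      estimates store model1
      margins x1, predict(outcome(1))
      margins x1, predict(outcome(2))
      
      /* You run a bunch more models, then you have a eureka moment */
      
      * Bring the stored mlogit results back as the active estimates
      estimates restore model1
      margins x1#x2, predict(outcome(1))
      margins x1#x2, predict(outcome(2))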
      For commands where I know things will take a long time, I often set them up to run overnight. Again, pardon me if you already knew this.
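
      The -estimates save- route mentioned above could look roughly like the sketch below. The file name mlogit_model1 and the variable names are hypothetical. It assumes the same dataset is in memory when the results are reloaded, and that the model was fit without if/in restrictions. The -estimates esample- step is there because -estimates use- does not bring back e(sample), which -margins- relies on.

      ```stata
      * Sketch only: "mlogit_model1" is a hypothetical file name
      mlogit y i.x1 i.x2 x3
      estimates save mlogit_model1, replace

      * ... later, in a new session with the same dataset loaded ...
      estimates use mlogit_model1

      * Re-mark the estimation sample from the variables in the original model
      estimates esample: y x1 x2 x3

      margins x1, predict(outcome(1))
      ```

      If the original model used an if or in qualifier, the same qualifier would need to be repeated on the -estimates esample- line.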
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


      • #4
        Tiffany: Are you able to post the summary statistics from your estimation sample (sum, d) ? I have an idea that will work for some data structures but not others and I'll know from your summary stats if it would work for your case.
