  • Make margins postestimation run faster? Manual calculation for margins and 95% CI?

    I am running mixed effects/multilevel analyses on a very complex, very large dataset and I've noticed that margins is taking a very, very long time to run and is keeping me from getting timely results.

    My question for you all: do you know of a way of storing estimates and calculating margins later? Do you know of a faster way of calculating predicted probabilities and their standard errors?

    Thank you!

  • #2
    Read -help estimates store- and -help estimates save- for ways to store regression results in memory or in a file.

    If you just want the margins themselves and don't need their standard errors, the -nose- option in the -margins- command will save you a lot of time. If you need the standard errors, then you're pretty much stuck. On a large data set with a complicated model, this is an extremely computationally intensive calculation. A faster computer, one with more RAM, or more cores if you are running Stata MP might help. But basically, short of an investment in new computing infrastructure, it is what it is. I recommend patience, and finding something else to do while this is cranking away.
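    A minimal sketch of the store-now, compute-later workflow and of -nose-. It borrows the nhanes2f data and logit model from the example later in this thread as a stand-in for a large multilevel model; the stored-results name m1 (written to disk as m1.ster) is a hypothetical choice. -margins- needs the estimation data in memory, so the same dataset must be loaded before it runs.

```stata
* Illustrative only: nhanes2f and this logit model stand in for a much
* larger (e.g. multilevel) model.
webuse nhanes2f, clear
keep if !missing(diabetes, black, female, age)
logit diabetes i.black i.female##c.age, nolog

* Keep the results in memory for this session ...
estimates store m1
* ... and/or write them to disk (creates m1.ster in the working directory)
estimates save m1, replace

* Later: bring the results back and run margins on the same data.
* If the results are reloaded in a new session with -estimates use-,
* check how e(sample) is handled; see -help estimates esample-.
estimates restore m1
margins female, at(age=(20(10)70)) nose   // point estimates only, no SEs
```

    Dropping -nose- from the last line gives the standard errors and confidence intervals, at the cost of the extra computing time discussed above.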



    • #3
      Shouting at Stata -- or in an open office or other shared space muttering furiously -- to encourage its efforts is also a possible solution. It doesn't ever detectably work but it can make you feel better.

      A geographical computing text of my youth advised repeatedly "Go for a coffee" in the face of time-consuming calculations. Knowing the author quite well, I can say that at conferences and seminars it was evident that he followed his own advice, as he often seemed over-excited.



      • #4
        I'm big on the nose option, especially if this isn't necessarily your final model. I'm primarily interested in the significance levels from the original model, not the marginal effects.

        I don't like it as much, but I wonder whether the atmeans option would make margins run more quickly.
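        One way to find out whether -atmeans- (or -nose-) actually helps on a given dataset is to time the alternatives with Stata's -timer- command. This is a sketch on the illustrative nhanes2f logit model used elsewhere in this thread, not on the OP's model. Note that -atmeans- changes what is estimated (a prediction at the covariate means rather than an average of predictions over the sample), so it is not a drop-in replacement.

```stata
* Rough timing comparison on an illustrative model (not the OP's)
webuse nhanes2f, clear
keep if !missing(diabetes, black, female, age)
logit diabetes i.black i.female##c.age, nolog

timer clear
timer on 1
margins female              // averages predictions over every observation
timer off 1

timer on 2
margins female, atmeans     // one prediction per level, other covariates at their means
timer off 2

timer list                  // elapsed seconds for timers 1 and 2
```

        On a dataset this small both run almost instantly; any difference would only be worth measuring on data the size described in the question.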
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 18.5 MP (2 processor)

        WWW: https://www3.nd.edu/~rwilliam



        • #5
          @Nick Cox: Ah, the days when you had to deposit your IBM punched cards at the computer centre, or have them picked up by a van, and wait until the next day for the results, only to discover that you had mistyped an instruction. It made you go through your commands several times before punching the cards. This habit of double- and triple-checking before executing a set of commands has disappeared.



          • #6
            Recently, instead of getting a coffee or crying out loud, I relied on a different strategy to perform a Bayesian analysis of a 3-level "mixed" design and get the 95% CRIs. Each night before going to sleep, I typed the command for one model. The next morning, waking up just 15 minutes earlier than usual, I grabbed the computer, checked the output, saved the results and ran the postestimation commands. It took 5 days to reach the best model, but I let the computer do the hard work while I was sleeping. No complaints.
            Best regards,

            Marcos



            • #7
              I have Unix machines off in the distance I can send jobs to. Also I have more than one computer, and can just let another one do it -- even if it is slower it keeps my main machine free.

              People have different definitions of what a very very long time is. For some it means a week. For others it is a half hour.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology



              • #8
                One slightly different variation on the theme of "have a coffee" and "do something else while waiting" is to use a computing cluster from Amazon or somewhere similar. Compute time these days is quite cheap, and you can access metal that you wouldn't otherwise be able to. This of course works if you only need it for occasional tasks; otherwise it may be worth investing in a better computer if waiting is unbearable.



                • #9
                  If you have many cores in your machine, you can also run more than one instance of Stata at a time. You have to work at keeping the editors and output windows straight.



                  • #10
                    Thanks, everyone! This is very helpful advice. I do need the 95% CIs, so I won't use nose, but I appreciate the suggestions.

                    Running margins after all the models have run seems like a helpful way to go about this:

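                    * after fitting model 1, store its results under the name model1
                    estimates store model1
                    * ... later, after the other models have been fit
                    estimates restore model1
                    margins i.variable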

                    Leonardo Guizzetti: I am already using a computing cluster, but I have 1.5 million observations and many iterative models to run. I can't imagine how much more annoying this process would be without it.
                    Phil Bromiley: yes - keeping the editors and output windows straight is an endeavor!
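                    A sketch of that fit-first, margins-later pattern with several models, using the illustrative nhanes2f data rather than the OP's data; the model names model1 to model3 and their specifications are hypothetical. Results kept with -estimates store- last only for the current Stata session; -estimates save- writes them to a file instead.

```stata
* Hypothetical workflow: fit all models first, store each one
webuse nhanes2f, clear
keep if !missing(diabetes, black, female, age)

logit diabetes i.female c.age, nolog
estimates store model1
logit diabetes i.black i.female c.age, nolog
estimates store model2
logit diabetes i.black i.female##c.age, nolog
estimates store model3

* ... later in the same session: restore each model and run margins
foreach m in model1 model2 model3 {
    estimates restore `m'
    margins i.female
}
```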



                    • #11
                      Richard Williams in #7, you say that you are interested in std errors from the original model, not margins. I am using eststo after margins. But if I run nose option, then I don't get the significance stars in -esttab-. Can I get the p-values from the original model and the marginal effects from -margins-?



                      • #12
                        Originally posted by Parul Gupta
                        Richard Williams in #7, you say that you are interested in std errors from the original model, not margins. I am using eststo after margins. But if I run nose option, then I don't get the significance stars in -esttab-. Can I get the p-values from the original model and the marginal effects from -margins-?
                        As far as I know, you need to NOT use the nose option. Let it run overnight (or a week or a month!) if absolutely necessary. But also consider whether you really need such information after margins. I never report it, but I'm sure others do.
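                        A sketch of one common way to put both the original model and the -margins- results into a single -esttab- table. It assumes the user-written estout package (-ssc install estout-), which provides -eststo- and -esttab-, and again uses the illustrative nhanes2f logit model rather than the OP's. The -post- option is what lets -eststo- store the margins results; leaving out -nose- is what gives -esttab- standard errors (and hence p-values and stars) for them.

```stata
* Assumes estout is installed: ssc install estout
webuse nhanes2f, clear
keep if !missing(diabetes, black, female, age)

eststo clear
* Original model: coefficients, SEs and p-values
eststo m1: logit diabetes i.black i.female age, nolog

* Average marginal effects; -post- moves them into e() so eststo can
* store them. No -nose-, so their SEs are computed and stored too.
margins, dydx(*) post
eststo me1

esttab m1 me1, se
```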
                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology



                        • #13
                          I'm going to pile on to what Richard Williams has said, and say it more emphatically.

                          First we need to be clear on what you are estimating with -margins-. If you are using it to estimate marginal effects (that is, you are using the -dydx()- option) then you should definitely get the standard errors so that you can also get confidence intervals and test statistics. While I think confidence intervals are more useful than p-values, and certainly more informative than significance stars, you definitely want estimates of uncertainty for these, and if you have to wait a long time for them, so be it. Avoid -nose- in this situation.

                          But if you are estimating predicted margins, p-values are almost invariably pointless in this situation. Remember what a predicted margin is: it's the expected value of the outcome (conditional on whatever you specify in the -margins- varlist and -at()- and -over()- options). There are extremely few circumstances in real world research where there is any interest at all in the null hypothesis that the expected value of the outcome is zero. One might have interest in testing a null hypothesis that the difference between outcomes is zero--but that puts you in the marginal effects context (or with -pwcompare- or -contrast-). For predictive margins, no, the null hypothesis that it is zero is almost never of any interest. So, if you are calculating predictive margins, unless you can tell a convincing story about why anybody would want to test whether an expected outcome is zero, there is no need for p-values and you can happily use the -nose- option to avoid the extra time required to compute them.
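                          The distinction can be sketched with the illustrative nhanes2f logit model used in this thread: predictive margins, where -nose- is defensible if only the point estimates are wanted, versus marginal effects and contrasts, where the standard errors should be kept. The age grid is an arbitrary choice for illustration.

```stata
webuse nhanes2f, clear
keep if !missing(diabetes, black, female, age)
logit diabetes i.black i.female##c.age, nolog

* Predictive margins: expected Pr(diabetes) for each sex at several ages.
* A test that these are zero is rarely of interest; -nose- skips the SEs.
margins female, at(age=(20(10)70)) nose

* Marginal effect of female at the same ages: keep SEs and CIs
margins, dydx(female) at(age=(20(10)70))

* The same female-male differences expressed as a contrast
margins r.female, at(age=(20(10)70))
```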



                          • #14
                            Interesting points, Clyde. I agree that a test of whether a predicted margin is 0 is not very meaningful. The thing that worries me about not producing SEs for predicted margins is when someone goes on to plot those margins. Generally speaking, the standard error bars give you a sense of where your data are sparse, because they tend to get wider in those regions of the data. With the nose option, you lose that and might feel equally confident in the divergence of the predicted margins no matter how much data you have. Ideally, people know the strengths and weaknesses of their sample vis-a-vis the variables in their model and margins call, but we all know this isn't an ideal world. This is not a statistical argument for keeping the SEs but rather a pragmatic one. I'm curious what you think about that.

                            Some example code based on Rich Williams' excellent Stata Journal article on margins gives a sense of what I mean if you look at the marginsplot.
                            Code:
                            webuse nhanes2f, clear
                            keep if !missing(diabetes, black, female, age2, agegrp)
                            logit diabetes i.black i.female##c.age, nolog
                            *Predicted margins across a range of ages
                            margins female, at(age=(20(10)70))
                            marginsplot
                            *Marginal effect across the same age range
                            margins, dydx(female) at(age=(20(10)70))
                            marginsplot, ylab(0(.05).15)



                            • #15
                              Actually, I agree with Erik Ruzek. I cannot think of a single instance in my own work over several decades where I have presented predictive margins without their standard errors or confidence intervals. I think any presentation of results subject to random variation should be accompanied by some measure of their uncertainty.

                              Rather, I was responding to
                              But if I run nose option, then I don't get the significance stars in -esttab-. Can I get the p-values from the original model and the marginal effects from -margins-?
                              From this, I inferred that O.P. is not interested in the uncertainty of the marginal effects but just wants to do some statistical tests so she can decorate her results with significance stars. (I really do think the use of significance stars should be a felony, but I'll spare that rant for now.) So I wanted to emphasize the point that testing the null hypothesis that some predictive margin is zero is seldom, if ever, a sensible thing to do, and if that is all she cares about, then certainly she should go ahead and use -nose-. I didn't intend to appear to endorse the idea that the standard errors are not important in their own right, although I suspect that is what O.P. thinks. For that matter, probably so do many others, whose atrocious statistical training has mis-taught them that statistical significance is the end-all and be-all of data analysis.
