  • gsem is potentially a powerful tool-- it can handle a wide variety of problems very easily. But it is terribly slow, and it often fails to find initial values even for models that can be fit with alternative commands.



    • Might v17 consider adding some additional –margins– features for –gmm–?

      Specifically, given a gmm command
      Code:
      gmm (residual equation),...
      might it be possible to allow post-estimation options like
      Code:
      margins, dydx(*)
      where the derivatives are taken with respect to the x's in the residual equation identified by the variables(…) option (i.e. what is returned by e(rhs))?

      In some cases this may result in nonsense, but in others it can be informative. E.g. consider
      Code:
       gmm (y-exp({xb: x1 x2 _cons})), var(x1 x2) instr(x1 x2)
      The derivatives of r=y-exp({xb: x1 x2 _cons}) with respect to (x1,x2), where the x's appear explicitly in the residual function, i.e. (d/dx)(-exp(xb)), may conceivably be of interest.

      Note: This is beyond simply dydx(*) with respect to the linear predictor.
      Last edited by John Mullahy; 01 Jan 2021, 11:06. Reason: Edited for clarity.
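For the exponential example in this post, the requested derivative can be written out explicitly. This is only a sketch of the algebra; the coefficient symbols \beta_0, \beta_1, \beta_2 are notation introduced here for illustration and do not appear in the original post.

```latex
\[
r = y - \exp(xb), \qquad xb = \beta_1 x_1 + \beta_2 x_2 + \beta_0
\]
\[
\frac{\partial r}{\partial x_j} = -\beta_j \exp(xb), \qquad j = 1, 2
\]
```

The factor \beta_j is what separates this from the derivative with respect to the linear predictor, which is just -\exp(xb).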



      • Make stable the default for -sort-.

        Not using
        Code:
        sort x y z, stable
        can be a source of major headaches.

        The manual says the following:

        [Image: screenshot of the manual passage, not reproduced here]


        This is a bit perplexing. It made sense when general processor speed was a major constraint. For those with massive datasets, could there be a "fast" option that would carry all the caveats of an m:m merge?
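A minimal sketch of the behavior under discussion, using a small made-up dataset (the variable names id and x are hypothetical). Without stable, the order of observations that tie on the sort key is not guaranteed; with stable, tied observations keep their previous relative order.

```stata
clear
set obs 6
generate id = _n
generate x = mod(_n, 2)        // only two distinct values, so x has ties

* Without -stable-, the order of id within each value of x is not guaranteed
sort x
list, sepby(x)

* With -stable-, ties keep the order they had before the sort (here: by id)
sort id
sort x, stable
list, sepby(x)

* Within each value of x, id should now be strictly increasing
by x: assert id > id[_n-1] if _n > 1
```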
        __________________________________________________ __
        Assistant Professor, Department of Biostatistics and Epidemiology
        School of Public Health and Health Sciences
        University of Massachusetts- Amherst



        • Originally posted by Andrew Lover
          Make stable the default for -sort-.
          I strongly disagree. The variables you list in the sort command should be both sufficient and necessary to ensure a unique sort order. If the sort order depends on something that is not recorded in your dataset, the dataset is flawed, not the sort command.
          Last edited by daniel klein; 04 Jan 2021, 01:16.



          • Or, further to my earlier comment: if StataCorp will not add a suite of functions for efficient operations on matrices (which really should be easy, yes?), then give us a way to do it ourselves by allowing and documenting the writing of plugins for Mata.



            • Originally posted by daniel klein

              I strongly disagree. The variables you list in the sort command should be both sufficient and necessary to ensure a unique sort order. If the sort order depends on something that is not recorded in your dataset, the dataset is flawed, not the sort command.
              Absolutely agree, but does that imply that best practice is:

              Code:
              sort x y z _all
              ?

              (Genuinely curious here).



              • Originally posted by Andrew Lover

                Absolutely agree, but does that imply that best practice is:

                Code:
                sort x y z _all
                ?

                No. I guess best practice is

                Code:
                sort all_variables_that_uniquely_identify_the_sort_order_that_I_want_for_whatever_reason
                The point is that if a certain result (not necessarily restricted to estimation) depends on the sort order of the dataset, then all variables that define, i.e., carry substantive meaning for that particular sort order should be spelled out explicitly. We should know why a result depends on a particular sort order and the variables that we list should convey that information. Thus, I would extend my initial suggestion to

                Code:
                sort all_variables_that_uniquely_identify_the_sort_order_that_I_want_for_whatever_reason_and_no_variables_that_are_irrelevant_for_what_I_want

                Edit:

                Of course, others have written about this topic some time ago (e.g., Schumm 2006).


                Schumm, L. P. 2006. Stata tip 28: Precise control of dataset sort order. The Stata Journal, 6(1), pp. 144--146.
                Last edited by daniel klein; 04 Jan 2021, 14:45.
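One way to check that a variable list fully determines the sort order is -isid-, which exits with an error when the listed variables do not uniquely identify the observations. The sketch below uses the auto.dta example dataset purely for illustration; that choice of data and variables is an assumption, not part of the post above.

```stata
sysuse auto, clear

* foreign alone has many ties, so -isid- reports an error here
capture noisily isid foreign

* make is unique in auto.dta, so (foreign, make) pins down a single order
isid foreign make
sort foreign make
```

If -isid- fails, the sort key is incomplete and any result that depends on row order may change from run to run.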



                • I think the -mixed- command needs to be optimized to work with random slopes. Although it nominally supports random slopes, in my experience it takes a very long time, or fails to converge, with all but the simplest models. I have found -mixed- to be practically useless for random slope models. -meglm- is no better.

                  The problem of fitting random slope models is computationally tractable. For example, HLM software runs them very quickly. So I'm sure Stata can do better if it makes these models a priority.
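For readers unfamiliar with the syntax, a random slope model in -mixed- looks roughly like the sketch below. The pig dataset from the Stata manuals is used only as an illustration (an assumption, not the poster's data); it is small and says nothing about the speed problems described above.

```stata
* Illustrative dataset from the Stata manuals (not from this thread)
webuse pig, clear

* Random intercept and random slope on week for each pig (id),
* with an unrestricted covariance between the two random effects
mixed weight week || id: week, covariance(unstructured)
```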



                  • Re #413. As one who frequently fits random slopes models, I agree with the importance of this. But I have to say that my experience with -mixed- has been that it runs well and estimates these models quickly, even with large data sets. (That's a huge contrast with, for example, -melogit-, which can take very long times to fit even simple models in large data sets.) I wonder what accounts for the difference in our experiences with this same command.



                    • Re #413 and #414: there is evidence that Stata is slower than its competitors and, to some extent, less likely to converge. See McCoach, DB, et al. (2018), "Does the package matter? A comparison of five common multilevel modeling software packages", Journal of Educational and Behavioral Statistics, 43(5): 594-627. On speed, p. 620 says, "In terms of computational speed, Stata was by far the slowest of the software programs, and the difference was not trivial". The situation is not as clear regarding convergence, but there is clearly a problem. I sent a pre-print of this to people at StataCorp and have, intermittently, been in touch. While I am assured that work on these issues is ongoing, it is still the case that Stata is slow for many of these models (the same appears to be true for SEM/GSEM, though I know of no actual comparative data for these). There is some evidence of convergence problems at "boundaries" (e.g., random effects near zero).



                      • re: 415. Rich, thanks for that clarification. It may be that my perception that -mixed- is not slow is because I actually end up having to use -melogit- more often than -mixed-, and by comparison, -mixed- is greased lightning and really easy to get to converge.



                        • I think almost all commands should have the option of computing cluster-robust variance matrix estimators -- and these allow for various kinds of heteroskedasticity as well. There are two kinds of commands where Stata does not allow either vce(robust) or vce(cluster id).

                          1. Commands where they clearly should, because the estimators are consistent with general forms of cluster correlation (including serial correlation) and heteroskedasticity. The commands -sureg- and -reg3- fit into this category.

                          2. Commands where the need for vce(robust) or vce(cluster id) is an admission that the estimators are inconsistent, because the need for a robust variance matrix violates the underlying assumptions needed for consistency. xtlogit with the fe option and xttobit (which does RE tobit) are two examples. Somewhat puzzling is that xtlogit with the re option does allow a full sandwich variance estimator but xttobit with the RE option does not. Technically, both estimators are inconsistent if anything about the model is misspecified, including serial correlation. (Contrast xtreg and xtpoisson, which are fully robust to serial correlation and any form of heteroskedasticity.) I still prefer allowing computation of robust standard errors even when the parameter estimators are inconsistent. After all, all models are approximations to the truth. We should compute standard errors that properly account for the sampling uncertainty.

                          3. Related to point (1) is that I think the -gmm- command should allow a weighting matrix that leads to the GMM version of three stage least squares. This estimator can have better small-sample properties than GMM with an unrestricted weighting matrix. Of course, one would allow vce(robust) and vce(cluster id) options.
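As a point of comparison for the list above, this is what the requested option looks like on a command that already supports it. The nlswork dataset and the particular regressors are illustrative assumptions, not taken from the post.

```stata
* Illustrative panel dataset from the Stata manuals (not from this thread)
webuse nlswork, clear
xtset idcode year

* Fixed-effects regression with standard errors clustered on the panel id
xtreg ln_wage tenure age, fe vce(cluster idcode)
```

The wish in points 1 and 2 is for the same vce() syntax to be accepted by commands such as -sureg-, -reg3-, and -xttobit-.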



                          • The do-file editor, at least in Windows, allows you to position the cursor somewhere in the file, and then type ctrl+D, and Stata will then execute the file from the line in which the cursor is located on down to the end. That's often convenient.

                            What I find I need to do more often, however, is the reverse. It would be nice to have a keyboard shortcut that would allow me to place the cursor at a desired stopping point, type the shortcut and have Stata respond by starting at the top of the do-file and continuing down to where the cursor is, and then halt.



                            • Re #418: Ctrl+Shift+Home extends the selection to the start of the document. See https://www.scintilla.org/SciTEDoc.html



                              • Re #419: This is really useful. On a Mac: Shift+Command+Up Arrow.
