Wishlist for Stata 17

Manish Srivastava

Join Date: Apr 2014

Posts: 21
#406

29 Dec 2020, 21:23

gsem is potentially a powerful tool: it can handle a wide variety of problems very easily. But it is terribly slow, and it often fails to find initial values even for models that can be fit using alternative commands.
1 like
Comment
John Mullahy

Join Date: Dec 2016

Posts: 772
#407

01 Jan 2021, 10:03

Might v17 consider adding some additional –margins– features for –gmm–?

Specifically, given a gmm command

Code:

gmm (residual equation),...

might it be possible to allow post-estimation options like

Code:

margins, dydx(*)

where the derivatives are taken with respect to the x's in the residual equation identified by the variables(…) option (i.e. what is returned by e(rhs))?

In some cases this may result in nonsense, but in others it can be informative. E.g. consider

Code:

gmm (y-exp({xb: x1 x2 _cons})), var(x1 x2) instr(x1 x2)

The derivatives of r=y-exp({xb: x1 x2 _cons}) with respect to (x1,x2), where the x's appear explicitly in the residual function, i.e. (d/dx)(-exp(xb)), may conceivably be of interest.

Note: This is beyond simply dydx(*) with respect to the linear predictor.
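
To spell out the distinction for the example above, here is a sketch of the two derivatives. The notation is an assumption for illustration (the betas are the coefficients on x1, x2 and _cons, with both x's treated as continuous); it is not output from any Stata command.

```latex
\[
r = y - \exp(xb), \qquad xb = \beta_1 x_1 + \beta_2 x_2 + \beta_0
\]
\[
\frac{\partial r}{\partial x_j} = -\beta_j \exp(xb), \quad j = 1, 2,
\qquad \text{whereas} \qquad
\frac{\partial (xb)}{\partial x_j} = \beta_j .
\]
```

The first expression varies across observations through exp(xb), while the derivative of the linear predictor is just the coefficient.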

Last edited by John Mullahy; 01 Jan 2021, 10:06. Reason: Edited for clarity.
Comment
Andrew Lover

Join Date: Apr 2014

Posts: 182
#408

03 Jan 2021, 16:59

Make stable the default for -sort-.

Not using

Code:

sort x y z, stable

can be a source of major headaches.

The manual says the following:

Which is a bit perplexing. When processor speed was a major constraint, this default made sense; for those with massive datasets, could there be a "fast" option that carries all the caveats of an m:m merge?

__________________________________________________ __
Assistant Professor, Department of Biostatistics and Epidemiology
School of Public Health and Health Sciences
University of Massachusetts- Amherst
1 like
Comment
daniel klein

Join Date: Mar 2014

Posts: 3912
#409

04 Jan 2021, 00:11

Originally posted by Andrew Lover

Make stable the default for -sort-.

I strongly disagree. The variables you list in the sort command should be both sufficient and necessary to ensure a unique sort order. If the sort order depends on something that is not recorded in your dataset, the dataset is flawed, not the sort command.

Last edited by daniel klein; 04 Jan 2021, 00:16.
5 likes
Comment
David Roodman

Join Date: Jul 2014

Posts: 479
#410

04 Jan 2021, 10:35

Or, further to my earlier comment, if StataCorp will not add a suite of functions for efficient operations on matrices (which really should be easy, yes?), then give us a way to do it ourselves by allowing/documenting the writing of plugins for Mata.
3 likes
Comment
Andrew Lover

Join Date: Apr 2014

Posts: 182
#411

04 Jan 2021, 12:54

Originally posted by daniel klein

I strongly disagree. The variables you list in the sort command should be both sufficient and necessary to ensure a unique sort order. If the sort order depends on something that is not recorded in your dataset, the dataset is flawed, not the sort command.

Absolutely agree, but does that imply that best practice is:

Code:

sort x y z _all

?

(Genuinely curious here).

Comment
daniel klein

Join Date: Mar 2014

Posts: 3912
#412

04 Jan 2021, 13:22

Originally posted by Andrew Lover

Absolutely agree, but does that imply that best practice is:

Code:

sort x y z _all

?

No. I guess best practice is

Code:

sort all_variables_that_uniquely_identify_the_sort_order_that_I_want_for_whatever_reason

The point is that if a certain result (not necessarily restricted to estimation) depends on the sort order of the dataset, then all variables that define that particular sort order, i.e., carry substantive meaning for it, should be spelled out explicitly. We should know why a result depends on a particular sort order, and the variables that we list should convey that information. Thus, I would extend my initial suggestion to

Code:

sort all_variables_that_uniquely_identify_the_sort_order_that_I_want_for_whatever_reason_and_no_variables_that_are_irrelevant_for_what_I_want
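
One way to check whether a variable list actually pins down a unique order is -isid-. The following is only a sketch, assuming the auto dataset that ships with Stata; the variable choices are illustrative and have nothing to do with anyone's real data.

```stata
* Sketch only: auto.dta and the variables used here are illustrative
sysuse auto, clear

* foreign alone does not uniquely identify observations, so the
* order within ties after -sort foreign- is not determined
capture noisily isid foreign
display _rc    // nonzero return code: the key is not unique

* make is unique in auto.dta, so adding it pins down the order
isid foreign make
sort foreign make
```

If -isid- passes, the resulting order is the same with or without the stable option, because there are no ties left to break.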

Edit:

Of course, others have written about this topic some time ago (e.g., Schumm 2006).

Schumm, L. P. 2006. Stata tip 28: Precise control of dataset sort order. The Stata Journal, 6(1), pp. 144-146.

Last edited by daniel klein; 04 Jan 2021, 13:45.
4 likes
Comment
paulvonhippel

Join Date: Apr 2014

Posts: 517
#413

04 Jan 2021, 14:30

I think the -mixed- command needs to be optimized to work with random slopes. Although it nominally supports random slopes, in my experience it takes a very long time, or fails to converge, with all but the simplest models. I have found -mixed- to be practically useless for random slope models. -meglm- is no better.

The problem of fitting random slope models is computationally tractable. For example, HLM software runs them very quickly. So I'm sure Stata can do better if it makes these models a priority.
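
For readers unsure what kind of model is meant, here is a minimal random-slope specification. It is a sketch only: it assumes the pig example dataset from the Stata website and is not one of the models that gave trouble.

```stata
* Sketch only: pig is an example dataset from the Stata website
webuse pig, clear

* Random intercept and random slope on week, grouped by pig (id);
* covariance(unstructured) lets intercept and slope be correlated
mixed weight week || id: week, covariance(unstructured)
```

A small model like this says nothing about performance on larger problems with several random slopes or more levels.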
2 likes
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#414

04 Jan 2021, 14:37

Re #413. As one who frequently fits random slopes models, I agree with the importance of this. But I have to say that my experience with -mixed- has been that it runs well and estimates these models quickly, even with large data sets. (That's a huge contrast with, for example, -melogit-, which can take very long times to fit even simple models in large data sets.) I wonder what accounts for the difference in our experiences with this same command.
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4548
#415

05 Jan 2021, 06:01

re: #413 and #414: there is evidence that Stata is slower than competitors and, to some extent, less likely to converge. See McCoach, D. B., et al. (2018), "Does the package matter? A comparison of five common multilevel modeling software packages", Journal of Educational and Behavioral Statistics, 43(5): 594-627. On speed, p. 620 says, "In terms of computational speed, Stata was by far the slowest of the software programs, and the difference was not trivial". The situation is not as clear for convergence, but there is clearly a problem, and there is some evidence of convergence problems at "boundaries" (e.g., random effects near zero). Note that I sent a pre-print of this to people at StataCorp and have, intermittently, been in touch. While I am assured that work on these issues is ongoing, it is still the case that Stata is slow for many of these models (the same appears to be true for SEM/GSEM, though I know of no actual comparative data for these).
4 likes
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#416

05 Jan 2021, 10:34

re: 415. Rich, thanks for that clarification. It may be that my perception that -mixed- is not slow is because I actually end up having to use -melogit- more often than -mixed-, and by comparison, -mixed- is greased lightning and really easy to get to converge.
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2291
#417

05 Jan 2021, 12:10

I think almost all commands should have the option of computing cluster-robust variance matrix estimators -- and these allow for various kinds of heteroskedasticity as well. There are two kinds of commands where Stata does not allow either vce(robust) or vce(cluster id).

1. Commands where they clearly should, because the estimators are consistent with general forms of cluster correlation (including serial correlation) and heteroskedasticity. The commands -sureg- and -reg3- fit into this category.

2. Commands where the need for vce(robust) or vce(cluster id) is an admission that the estimators are inconsistent, because the need for a robust variance matrix violates the underlying assumptions needed for consistency. xtlogit with the fe option and xttobit (which does RE tobit) are two examples. Somewhat puzzling is that xtlogit with the re option does allow for a full sandwich variance estimator but xttobit with the re option does not. Technically, both estimators are inconsistent if anything about the model is misspecified, including serial correlation. (Contrast xtreg and xtpoisson, which are fully robust to serial correlation and any form of heteroskedasticity.) I still prefer allowing computation of robust standard errors even when the parameter estimators are inconsistent. After all, all models are approximations to the truth. We should compute standard errors that properly account for the sampling uncertainty.

3. Related to point (1) is that I think the -gmm- command should allow a weighting matrix that leads to the GMM version of three stage least squares. This estimator can have better small-sample properties than GMM with an unrestricted weighting matrix. Of course, one would allow vce(robust) and vce(cluster id) options.
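
For contrast with the commands listed above, this is what the request looks like where the option is already supported. It is a sketch only, assuming the nlswork example dataset from the Stata website; the regressors are arbitrary.

```stata
* Sketch only: nlswork is an example dataset from the Stata website,
* and the regressors are chosen purely for illustration
webuse nlswork, clear
xtset idcode year

* FE linear model with standard errors robust to heteroskedasticity
* and arbitrary serial correlation within person
xtreg ln_wage tenure age, fe vce(cluster idcode)
```

The wish is for the same vce(cluster id) syntax to be accepted by commands such as -sureg- and -reg3-.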
4 likes
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#418

07 Jan 2021, 14:27

The do-file editor, at least in Windows, allows you to position the cursor somewhere in the file, and then type ctrl+D, and Stata will then execute the file from the line in which the cursor is located on down to the end. That's often convenient.

What I find I need to do more often, however, is the reverse. It would be nice to have a keyboard shortcut that would allow me to place the cursor at a desired stopping point, type the shortcut and have Stata respond by starting at the top of the do-file and continuing down to where the cursor is, and then halt.
1 like
Comment
Bjarte Aagnes

Join Date: Apr 2014

Posts: 786
#419

07 Jan 2021, 14:40

#418: Ctrl+Shift+Home extends the selection to the start of the document. https://www.scintilla.org/SciTEDoc.html
1 like
Comment
John Mullahy

Join Date: Dec 2016

Posts: 772
#420

07 Jan 2021, 14:59

#419 This is really useful. On a Mac: shift-command-uparrow
Comment
