Wishlist for Stata 18

Bert Lloyd

Join Date: Apr 2014

Posts: 110
#331

31 Mar 2022, 09:00

Options for additional "flavors" of markdown in dyndoc and related. (E.g., pandoc markdown)

Syntax highlighting for dyndoc (i.e. mix of Stata and markdown highlighting) in the do-file editor.
Comment
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#332

01 Apr 2022, 09:52

1. I often find myself re-running the same section of code when I'm debugging. If it is a large section of code, this can be inconvenient. Being able to set something like bookmarks and run the code between two bookmarks would be nice.
2. While the eregress estimators are great, if would be nice if we had something similar where possible for fixed effects.
3. xtregar - could it be extended to (i) handle more than one lag on the serial correlation, (ii) present robust standard errors, or (iii) allow for endogeneity?
Comment
Chris Stefancik

Join Date: Dec 2018

Posts: 6
#333

01 Apr 2022, 18:40

How about a merge command syntax where we can specify which variable to link between the master and the using instead of requiring both variables to have the same name.

Code:

merge 1:1 patientid == patient_num using "demo_data.dta", keep(master match)
3 likes
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#334

01 Apr 2022, 18:54

Originally posted by Chris Stefancik View Post

How about a merge command syntax where we can specify which variable to link between the master and the using instead of requiring both variables to have the same name.

Code:

merge 1:1 patientid == patient_num using "demo_data.dta", keep(master match)

This is what frames partially solves. I say partially because frlink only supports m:1 or 1:1 linking, and sometimes the 1:m situation is more cumbersome than using merge.
2 likes
Comment
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#335

04 Apr 2022, 02:42

Originally posted by Chris Stefancik View Post

How about a merge command syntax where we can specify which variable to link between the master and the using instead of requiring both variables to have the same name.

Code:

merge 1:1 patientid == patient_num using "demo_data.dta", keep(master match)

You might like the -join- command includes in the -ftools- package.
Comment
Zachary Brown

Join Date: Apr 2022

Posts: 1
#336

05 Apr 2022, 09:02

Two items for the built-in mixed logit choice models command:
An internal implementation of the user- 'mixlogitwtp' by Hole, i.e. within 'cmmixlogit' and 'cmxtmixlogit'. This would facilitate additional postestimation work with these models, -for example, routines that rely on scores outputted from standard ML commands.

With the panel command 'cmxtmixlogit', an option to specify that a given choice is replicated multiple times within decisionmaker. With conditional logit, this option is unnecessary because the replicated tasks can simply be implemented as weights. But with panel mixed logit, each decisionmaker's likelihood is integrated over the random coefficients before taking logs, and so weights can't vary within decisionmaker. The use of replications would essentially a way to reduce the memory burden of executing the following:

expand choicereps
bysort DMid taskid: gen crep = _n
egen taskid_expanded = group(taskid crep)
cmset DMid taskid_expanded altvar
cmxtmixlogit chosen x

Besides being cumbersome to code the above, it also can dramatically and unnecessarily increase the memory burden when decisionmakers are making the same choice across many identical choice occasions. (An example would be recreational site choice data or transportation mode choices. In the dataset I'm working with, the number of rows in the data goes from ~150K to 2M+.)

Instead, with task replicates it would be nice if we could implement something like:
cmxtmixlogit chosen x, nchoices(choicereps)

Computationally, this would be easy to implement, since we can just compute the DM's conditional likelihood as by taking the probability of each distinct choice made, raised to a power equal to the number of replicates that choice was made on identical choice occasions, and then taking product of each of these distinct choice probabilities.
Comment
Justus Franz-Christoph Meyer

Join Date: Dec 2017

Posts: 7
#337

08 Apr 2022, 12:39

I think one could add further distributions to the standard set of distributions available in the statistical distribution functions, even if only to make it more comfortable to use them (e.g. various beta-type distributions, skewed normal etc.).
1 like
Comment
John Mullahy

Join Date: Dec 2016

Posts: 751
#338

09 Apr 2022, 15:12

The current v17 documentation file STATA FUNCTIONS REFERENCE MANUAL RELEASE 17 (fn.pdf) shows the functional forms of some but not all of the probability distributions it covers (see pp. 109 and beyond in the pdf file). For example, compare betaden and cauchyden.

Might future editions of the documentation show the functional forms for all distributions?
4 likes
Comment
wbuchanan

Join Date: Mar 2014

Posts: 1362
#339

10 Apr 2022, 07:28

Along the lines of other requests related to merge (and recognizing my previous asks to adopt something analogous to more standard SQL type capabilities), it’d be nice to be able to use expressions that resolve to booleans in merge commands. For example, being able to specify that you want to merge based on a common ID but only for records at a later/earlier date. The syntax might look something like:

merge 1:1 id == otherid & using.newdatefield > master.date

Using and master are intended to indicate which frame/file the field would be found in.
2 likes
Comment
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#340

11 Apr 2022, 06:12

Originally posted by wbuchanan View Post

Along the lines of other requests related to merge (and recognizing my previous asks to adopt something analogous to more standard SQL type capabilities), it’d be nice to be able to use expressions that resolve to booleans in merge commands. For example, being able to specify that you want to merge based on a common ID but only for records at a later/earlier date. The syntax might look something like:

merge 1:1 id == otherid & using.newdatefield > master.date

Using and master are intended to indicate which frame/file the field would be found in.

For inspiration, I recently discovered Pandas (Python) supports this through the merge_asof method: https://pandas.pydata.org/pandas-doc...erge_asof.html
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#341

11 Apr 2022, 07:12

Originally posted by wbuchanan View Post

Along the lines of other requests related to merge (and recognizing my previous asks to adopt something analogous to more standard SQL type capabilities), it’d be nice to be able to use expressions that resolve to booleans in merge commands. For example, being able to specify that you want to merge based on a common ID but only for records at a later/earlier date. The syntax might look something like:

merge 1:1 id == otherid & using.newdatefield > master.date

Using and master are intended to indicate which frame/file the field would be found in.

Alternatively, -rangejoin- (SSC by Robert Picard) supports this concept, but requires a bit of setup for the task.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#342

11 Apr 2022, 10:20

Re #341. It is true that -rangejoin- accommodates the specific functionality of that example. And, don't get me wrong--I'm a huge fan of -rangejoin-. It literally revolutionized my workflow when Robert Picard released it, enabling me to do work on Stata that I previously had to do in other (compiled) frameworks. But I think the functionality wbuchanan is referring to is more general: being able to restrict the match based on any kind of Boolean condition, not just ones based on ordering relationships. While I don't think my own work would make much use of that, I can imagine that others might benefit greatly.
2 likes
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#343

11 Apr 2022, 11:54

Agreed, it's not a drop-in replacement for the breadth of an SQL query syntax, but very helpful for an important subset of that type of work.
Comment
wbuchanan

Join Date: Mar 2014

Posts: 1362
#344

11 Apr 2022, 13:02

Clyde Schechter
That was definitely what I was after. Ideally, any expression that resolves to a boolean should be acceptable to define a join operation on two sets.
1 like
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#345

11 Apr 2022, 14:10

I support the intent of #339 but not the compromise implicit in it.

My preference would be not to add SQL-ish features to existing commands, but rather to support SQL operations on native SAS datasets within a Mata-like coding syntax. It would recognize SQL syntax in one-line SQL commands and in SQL programs, but allow direct access to Stata datasets both on disk and in other frames, without the push-pull of data structure conversions. Those who come to Stata from SQL will have access to SQL functionality in the form they are familiar with, and those who want to learn and adopt SQL functionality will be able to reference copious commercial documentation.

My experience - some years past now, thankfully, but perhaps outdated - working in a SAS environment back-ended by an Oracle database led me to appreciate the differing strengths of a relational database and its SQL commands and the "flat file" data structures used by conventional statistical programming with sorting and merging. What I suggest is no different than how SAS approached the problem back in the day, by creating PROC SQL to understand SQL commands (and interface directly with SQL databases) rather than shoehorn SQL capabilities into existing DATA step commands.

To be clear, there are tasks that I've undertaken in Stata that would have been more straightforward with access to SQL syntax, and I have missed access to that tool, as I had earlier missed access to the full power of contemporary regular expression syntax now embedded in the Unicode regular expression functions. What I suggest is intended to mimic the recent integration of Python functionality into Stata to extend Stata's capabilities to tools written in Python, and before that, the separation of Mata into a purpose-built language rather than gluing it onto the existing Stata matrix commands.

Where it makes sense, Stata should embrace existing programming frameworks rather than strive to become the Esperanto of statistical programming languages.
3 likes
Comment

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment