Wishlist for Stata 17

jerome falken

Join Date: Aug 2017

Posts: 88
#121

22 Feb 2020, 12:09

I would like to see a supported Stata kernel for Jupyter. There are a few out there, notably:
https://github.com/kylebarron/stata_kernel
I think this is something StataCorp should integrate, (re)distribute, support going forward (it's GPL v3), and build upon, alongside continuing to open Stata to the outside world (Python integration, plugins, etc.), so that data scientists can easily integrate it into their workflows. It would also facilitate peer review, git integration, pull requests, and publishing on corporate wikis, in meetings, etc.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#122

23 Feb 2020, 07:46

#119 #120 Making generate and replace r-class would break many commands and do-files. The point is not that the results could be ignored; they would appear regardless and zap other r-class results. Even keeping classic behaviour under version control would be widely misunderstood or forgotten. Unless you’re volunteering to answer all the puzzled threads arising from such a change, this isn’t an attractive idea.

What would seem defensible to me would be new commands that were r-class. Then users who wanted them would have different behaviour.

Last edited by Nick Cox; 23 Feb 2020, 07:55.
2 likes
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#123

23 Feb 2020, 08:53

#122 Good point. Instead of making generate and replace r-class, a local() option could be added to each to generate local macros for the reported results. For replace one might want to be able to specify two local names: one for the total number of replacements, and a second for the number of missing values. The principle remains the same: if it's accessible to copy-and-paste it should be accessible to the program. (And not by creating and parsing a log file, to rule out the obvious hack.)
1 like
Comment
Marija Vasilevska

Join Date: Feb 2020

Posts: 11
#124

23 Feb 2020, 14:36

I noticed that -twoway- has an upper limit on the number of lines when overlaying, perhaps due to color palette limitations. I had to plot 20+ lines and I think it only displayed the first 15 and dropped all the others. Maybe that is something that can be improved in the next version.
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4462
#125

23 Feb 2020, 14:48

re: #124 - the limit (see help limits) is 100 variables and 20 styles. Maybe you need to open a new topic and ask a question; be sure to show the code you used.
Comment
Marija Vasilevska

Join Date: Feb 2020

Posts: 11
#126

23 Feb 2020, 15:25

re: #125
I saw from other posts that this can be circumvented by playing around with color and solid/dotted line combinations. It worked in my case, but it was time-consuming to specify the color and line combinations individually in the code. I thought it would be a good suggestion to add here.
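A minimal sketch of how that per-line styling could be generated rather than typed by hand, in the spirit of the Python integration in Stata 16. The variable names (y1 to y24, x), the helper name twoway_command, and the particular color and pattern lists are assumptions for illustration; the sketch only builds the command text, which would still have to be run in Stata.

```python
from itertools import product

# Assumed style lists: any valid Stata color and line-pattern names would do.
COLORS = ["navy", "maroon", "forest_green", "dkorange", "teal", "cranberry"]
PATTERNS = ["solid", "dash", "dot", "shortdash"]


def twoway_command(yvars, xvar):
    """Build one twoway command giving every line its own color/pattern pair."""
    # All colors are used with the first pattern, then again with the next one.
    styles = list(product(PATTERNS, COLORS))
    if len(yvars) > len(styles):
        raise ValueError("more lines than distinct color/pattern pairs")
    plots = [
        f"(line {y} {xvar}, lcolor({color}) lpattern({pattern}))"
        for y, (pattern, color) in zip(yvars, styles)
    ]
    return "twoway " + " ".join(plots)


# Hypothetical variable names: 24 series plotted against x.
yvars = [f"y{i}" for i in range(1, 25)]
cmd = twoway_command(yvars, "x")
print(cmd)
```

Inside a Stata 16 python block, the resulting string could presumably be passed to sfi.SFIToolkit.stata(); the same idea can of course be written as a plain forvalues loop that accumulates a local macro.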
1 like
Comment

Joseph Luchman

Join Date: Mar 2014
Posts: 114

#127

24 Feb 2020, 14:28

Extending Stata's capabilities to support function-application methods such as Python's map or R's lapply would add value to aggregating results, much as the addition of data frames in V16 has to working with multiple datasets simultaneously.

In particular, it would be useful to capture any [e]returned result from a command "on the fly" into a list-like, in-memory object that would not need to be restored or saved in a .ster file.

As a basic example consider the following:

(note: uses Stata V 16)

Code:

. sysuse nlsw88
(NLSW, 1988 extract)

. unab vlist: _all

. python
----------------------------------------------- python (type end to exit) --------------------------------------------------------------------------------
>>> import sfi
>>> def summarize(x):
...  sfi.SFIToolkit.stata("summarize " + x)
...  return( [sfi.Scalar.getValue("r(N)"), sfi.Scalar.getValue("r(mean)"), sfi.Scalar.getValue("r(sd)"), sfi.Scalar.getValue("r(min)"), sfi.Scalar.getValue("r(max)")])
...
>>> nlsw88_sum = dict(zip("`vlist'".split(" "), list(map(summarize, "`vlist'".split(" ")))))

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      idcode |      2,246    2612.654    1480.864          1       5159

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |      2,246    39.15316    3.060002         34         46

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        race |      2,246    1.282725    .4754413          1          3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     married |      2,246    .6420303    .4795099          0          1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
never_marr~d |      2,246    .1041852    .3055687          0          1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       grade |      2,244    13.09893    2.521246          0         18

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    collgrad |      2,246    .2368655    .4252538          0          1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       south |      2,246    .4194123    .4935728          0          1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        smsa |      2,246    .7039181    .4566292          0          1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      c_city |      2,246    .2916296    .4546139          0          1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    industry |      2,232    8.189516    3.010875          1         12

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
  occupation |      2,237    4.642825    3.408897          1         13

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       union |      1,878    .2454739    .4304825          0          1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        wage |      2,246    7.766949    5.755523   1.004952   40.74659

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       hours |      2,242    37.21811    10.50914          1         80

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     ttl_exp |      2,246    12.53498    4.610208   .1153846   28.88461

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      tenure |      2,231     5.97785    5.510331          0   25.91667
>>> nlsw88_sum
{'idcode': [2246.0, 2612.654496883348, 1480.8637634568668, 1.0, 5159.0], 'age': [2246.0, 39.15316117542297, 3.0600022239430684, 34.0, 46.0], 'race': [2246
> .0, 1.2827248441674086, 0.47544129024449705, 1.0, 3.0], 'married': [2246.0, 0.6420302760463046, 0.4795099307555556, 0.0, 1.0], 'never_married': [2246.0,
>  0.10418521816562779, 0.30556870120775137, 0.0, 1.0], 'grade': [2244.0, 13.098930481283423, 2.5212460945811133, 0.0, 18.0], 'collgrad': [2246.0, 0.23686
> 553873552982, 0.4252537737781529, 0.0, 1.0], 'south': [2246.0, 0.41941228851291185, 0.4935727773212602, 0.0, 1.0], 'smsa': [2246.0, 0.7039180765805877,
> 0.45662923067852623, 0.0, 1.0], 'c_city': [2246.0, 0.29162956366874443, 0.45461387997400726, 0.0, 1.0], 'industry': [2232.0, 8.189516129032258, 3.010874
> 8568471775, 1.0, 12.0], 'occupation': [2237.0, 4.642825212337953, 3.4088972128545767, 1.0, 13.0], 'union': [1878.0, 0.24547390841320554, 0.4304824567422
> 844, 0.0, 1.0], 'wage': [2246.0, 7.76694903741006, 5.755522859382768, 1.00495183467865, 40.74658966064453], 'hours': [2242.0, 37.218108831400535, 10.509
> 135117595422, 1.0, 80.0], 'ttl_exp': [2246.0, 12.534976707079771, 4.6102075341192625, 0.11538461595773697, 28.884614944458008], 'tenure': [2231.0, 5.977
> 849999269874, 5.510331212404582, 0.0, 25.91666603088379]}
>>> end
----------------------------------------------------------------------------------------------------------------------------------------------------------

.

It's a basic example, but captures, by variable name, all the summarized data. Additionally, one can refer to elements later by variable name like:

Code:

. python: nlsw88_sum['age']
[2246.0, 39.15316117542297, 3.0600022239430684, 34.0, 46.0]

This capability in lapply is a very nice feature of R (in my view) and can clearly be accommodated in the new Python integration using map, but it would be nice to have it native to Stata and/or Mata.
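The pattern does not depend on sfi itself. A minimal stand-alone sketch, assuming made-up data and a hypothetical helper named apply_by_name, shows the lapply-like idea of applying a function to each name and keying the results by name. Inside Stata, summarize() would instead run the command and read r() as in the session above.

```python
from statistics import mean, stdev

# Made-up stand-in data; in Stata these values would come from the dataset
# in memory via -summarize- and r(), not from a Python dict.
data = {
    "age": [34, 39, 46, 41],
    "grade": [12, 16, 18, 10],
}


def summarize(name):
    """Return [N, mean, sd, min, max] for one variable, as in the post."""
    values = data[name]
    return [
        float(len(values)),
        mean(values),
        stdev(values),
        float(min(values)),
        float(max(values)),
    ]


def apply_by_name(func, names):
    """Apply func to every name and return a dict keyed by name."""
    names = list(names)
    return dict(zip(names, map(func, names)))


summary = apply_by_name(summarize, data)
print(summary["age"])
```

Any function that takes a name and returns its results can be passed to apply_by_name, which is the part that would be nice to have natively.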

- joe

Last edited by Joseph Luchman; 24 Feb 2020, 14:36.

Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
----
Research Fellow
Fors Marsh
----
Version 18.0 MP

Comment

Rasool Baloch

Join Date: Nov 2016

Posts: 59
#128

25 Feb 2020, 02:41

Is there any way to select all occurrences of the same word in the Do-file Editor? In Notepad++, selecting a word highlights every occurrence of it in the editor. That would make it easy to trace the occurrences of a word throughout a do-file.

Best regards,
Rasool Bux
1 like
Comment
eric_a_booth

Join Date: Apr 2014

Posts: 292
#129

25 Feb 2020, 05:05

Originally posted by Christopher Bratt

My main concern would be an improved approach to reproducible research, with a flexibility similar to Rmarkdown and the knitr package in R/RStudio.

I also second the hope for much improved speed in SEM analyses.

Check out -putdocx- and -putpdf- from Stata and the -texdoc-, -webdoc-, and -markstat- packages from SSC. Of these, -texdoc- is my preferred tool.

Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX
Comment
Rasool Baloch

Join Date: Nov 2016

Posts: 59
#130

26 Feb 2020, 02:55

My wish is for a button in the Data Browser/Editor window to show or hide value labels, or else a permanent option (check box) for this in Edit --> General Preferences --> Data Editor. Currently we can do it with the command browse, nol. Is there also any way to show the variable label as a tooltip when moving the mouse cursor over a column?

Best regards,
Rasool Bux
2 likes
Comment
Zach Adams

Join Date: Oct 2016

Posts: 12
#131

26 Feb 2020, 08:05

Originally posted by William Lisowski

In reaction to #119, with which I agree, I note that it is a particular example reinforcing the general principle expressed at #93 and discussed in the subsequent posts.

Ah, I missed that post - apologies to Rene Macon, I very much agree!

Originally posted by Nick Cox

#119 #120 Making generate and replace r-class would break many commands and do-files. The point is not that the results could be ignored; they would appear regardless and zap other r-class results. Even keeping classic behaviour under version control would be widely misunderstood or forgotten. Unless you’re volunteering to answer all the puzzled threads arising from such a change, this isn’t an attractive idea.

What would seem defensible to me would be new commands that were r-class. Then users who wanted them would have different behaviour.

This would also be much appreciated and a perfectly fine option from my perspective.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#132

26 Feb 2020, 09:47

#123 #131 Optionally pushing named locals to the caller space would be fine by me too.
1 like
Comment
Allan Massie

Join Date: Feb 2020

Posts: 10
#133

26 Feb 2020, 13:51

A simple UI improvement: I would love to see tabbed filename completion after do, use, using, and ls (and maybe other relevant commands/contexts).

I know that in other contexts, tab completion is for variable names. But you'll never have a varlist after ls or do, and rarely after use (unless you're re-loading a subset of variables for the dataset already in memory).

Tab completion of commands would also be pretty cool.

This may be harder to make work with existing syntax, but my dream would be Unix-like command-line shortcuts, e.g. !$ to repeat the last argument from the previous command.
Comment
Yao Zhao

Join Date: Feb 2017

Posts: 226
#134

05 Mar 2020, 18:55

Word cloud figures!
Comment
wbuchanan

Join Date: Mar 2014

Posts: 1362
#135

10 Mar 2020, 11:53

Originally posted by Oscar Ozfidan

One of my wishes is to have the ability to append non-dta files such as xlsx or csv without having to save them as dta files first. This functionality could initially be restricted to files that have the same variable list and later be expanded. I really don't understand why an xlsx file needs to be imported and saved as a dta file before it can be appended. Perhaps the import command could be modified to bypass that step.

Oscar Ozfidan, I still need to do a bit more testing, but I've actually developed something to address the specific need that you mentioned.
2 likes
Comment
