Wishlist for Stata 15

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17671
#136

11 Dec 2016, 06:10

Dear All,
I wonder whether, whenever it omits a variable because of collinearity, Stata 15 could also report which variables are involved in that collinearity.

Kind regards,
Carlo
(StataNow 18.5)
2 likes
Comment
Bruce Weaver

Join Date: May 2014

Posts: 1116
#137

11 Dec 2016, 07:32

I suggest adding an option to margins to display confidence intervals (or prediction/forecast intervals) computed using stdf (standard error of the forecast) rather than stdp (standard error of the linear prediction). See this thread for some context and an example.
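This is a minimal workaround sketch, not a margins option. After regress, predict can return both stdp and stdf, so per-observation 95% prediction intervals can be built by hand. The model (mpg on weight in the auto data) and the new variable and scalar names are illustrative assumptions.

```stata
sysuse auto, clear
regress mpg weight

* stdp: SE of the linear prediction (the basis of confidence intervals)
* stdf: SE of the forecast, which also includes the residual variance
predict double yhat, xb
predict double se_p, stdp
predict double se_f, stdf

* 95% prediction interval: yhat +/- t(0.975; df_r) * stdf
scalar tcrit = invttail(e(df_r), 0.025)
generate double pi_lo = yhat - tcrit*se_f
generate double pi_hi = yhat + tcrit*se_f
list weight mpg yhat pi_lo pi_hi in 1/5
```

Because stdf adds the residual variance to the variance of the linear prediction, se_f exceeds se_p for every observation, so these intervals are wider than stdp-based confidence intervals.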

Cheers,
Bruce

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
1 like
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35405
#138

11 Dec 2016, 08:31

Carlo #136

? _rmcoll _rmdcoll
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17671
#139

11 Dec 2016, 09:55

Nick:
thanks for pointing those user-written programmes out.
I was not aware of them.

Kind regards,
Carlo
(StataNow 18.5)
Comment
Richard Williams

Join Date: Apr 2014

Posts: 4940
#140

11 Dec 2016, 09:57

Actually _rmcoll and _rmdcoll are part of official Stata. I suspect most users are not aware of them as they are mostly used when writing Stata programs.
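A minimal sketch of using _rmcoll interactively, assuming the auto data and a deliberately collinear variable wl (a hypothetical name) built as weight + length. _rmcoll should print a note naming the variable it omits and return the checked varlist, but it does not list the other variables involved, so the sketch then regresses the omitted variable on the rest; an R-squared of 1 confirms the exact dependence.

```stata
sysuse auto, clear

* wl is an exact linear combination of two existing regressors
generate wl = weight + length

* _rmcoll returns the checked varlist in r(varlist), with omitted
* terms flagged, and the number of omitted terms in r(k_omitted)
_rmcoll weight length wl
display "`r(varlist)'"
scalar n_omitted = r(k_omitted)

* _rmcoll names only the dropped variable; an auxiliary regression of
* it on the others shows what it depends on (R-squared = 1 here)
regress wl weight length
display e(r2)
```

With many regressors, the coefficients of that auxiliary regression indicate which variables enter the dependence. This is a workaround, not the built-in reporting Carlo asks for.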

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
1 like
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17671
#141

11 Dec 2016, 10:09

Richard is right.
I did indeed think they were user-written programmes, so I typed -search- instead of -help-.

Kind regards,
Carlo
(StataNow 18.5)
Comment
Avi Jutaru

Join Date: Jul 2016

Posts: 11
#142

17 Dec 2016, 11:49

I would add the following things to Stata 15 if I could:

1) Better handling of character vs numerical variables. As in SAS, R and SPSS, I wish there were a way to define the type of each variable, so that every model knows which variables are categorical.
2) Data mining functionality. I don't expect Stata to include features like SVM, boosting or association rules, but I do expect to find models such as decision trees (CART) and random forests. In addition, I would like the models' dialog boxes to offer a way to split the data into training and test sets (without actually creating separate datasets), with the output including validation on the test set. This is relevant to regression models too.
3) I wish the output were nicer, with a way to export it to MS Word. For example, one thing I miss is an interactive way of creating tables (descriptive or frequency) and exporting them to MS Word.
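On the training/test point in 2): a split can already be flagged without creating separate datasets. This sketch assumes the auto data, a roughly 70/30 random split and a hypothetical indicator named train; it fits on the training rows and reports RMSE on the held-out rows. It is a hand-rolled illustration, not a built-in Stata validation feature.

```stata
sysuse auto, clear
set seed 2016

* Flag roughly 70% of observations for training; no new dataset is created
generate byte train = runiform() < 0.7
regress mpg weight length if train

* Predict for every observation, then score only the held-out ones
predict double yhat, xb
generate double sqerr = (mpg - yhat)^2 if !train
summarize sqerr, meanonly
display "Test-set RMSE: " sqrt(r(mean))
```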
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35405
#143

17 Dec 2016, 12:30

1) Better handling of character vs numerical variables. Similarly to SAS, R and SPSS, I wish there would be a way to define the type of each variable, in a way that each model will know which are the categorical variables.

But there is. It's factor variable notation.
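A short illustration of factor variable notation with the auto data (the models are only examples): i. tells an estimation command that a numeric variable is categorical, and c. marks one as continuous.

```stata
sysuse auto, clear

* i.rep78: rep78 is categorical; indicators are created on the fly and
* the lowest level is the base category by default
regress mpg weight i.rep78

* ## gives both main effects plus their interaction; c. marks weight
* as continuous
regress mpg i.foreign##c.weight
```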
Comment
wbuchanan

Join Date: Mar 2014

Posts: 1361
#144

19 Dec 2016, 14:42

Nick Cox, I think what Avi Jutaru meant about factors was the ability to treat string data as factors and to label string data directly. R treats strings much like strLs in Stata, except that they can be passed directly to modeling functions, where they are automatically converted to and treated as indicators. SAS and SPSS, from what I remember, both allow users to label string data, so a string like "A101" could have a label attached to it. This has always intrigued me, since a purely numeric value would be more efficient in these cases, yet the practice is still fairly common. Being able to pass a string variable to other commands is what I think Avi Jutaru was after. This would make something like:

Code:

. sysuse auto.dta, clear
(1978 Automobile Data)

. reg mpg weight i.make
make: string variables may not be used as factor variables
r(109);

valid for execution, since make would be treated as a set of indicators, exactly as it would be if the data were first encoded numerically.
Comment
Bruce Weaver

Join Date: May 2014

Posts: 1116
#145

19 Dec 2016, 15:07

Re #144, SPSS is rather inconsistent. Some procedures allow use of strings as factor variables, and others don't. E.g.,
http://www-01.ibm.com/support/docvie...id=swg21483147

HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35405
#146

20 Dec 2016, 02:07

My initial reaction is that allowing strings as factors would create immensely more problems than it solves.

The downside of any feature requested by smart people who understand it, and so will only use it sensibly, is often the technical support and Statalist support then needed by people who apply it blindly. Everyone should care about that when it means less attention for their own, more challenging, questions.

As the solution is just to encode, or otherwise to create appropriate numeric variables, I have to disagree.
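A sketch of the encode route, assuming the auto data. Because every value of make is unique, the example first takes the first word of make as a hypothetical manuf variable (the manufacturer), so that most categories have several observations; encode then maps each distinct string to an integer, keeping the original string as its value label.

```stata
sysuse auto, clear

* make holds strings such as "AMC Concord"; keep the first word
generate manuf = word(make, 1)

* encode numbers the distinct strings in sort order and stores the
* original strings as value labels, so output still shows the names
encode manuf, generate(manuf_n)
label list manuf_n

* The numeric copy can now be used with factor variable notation
regress mpg weight i.manuf_n
```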
Comment
Jeph Herrin

Join Date: Apr 2014

Posts: 332
#147

20 Dec 2016, 08:16

There are a number of very useful commands in Stata which report confidence intervals for an estimate: lincom, nlcom, and ratio, to name a few. I use them often, but am routinely annoyed that they don't store the confidence interval bounds in their return list; I have to reconstruct them from r(estimate) and r(se) or e(b) and e(V). In some instances, it is not trivial to reconstruct them. My modest wish is that in Stata 15 any command which reports only one confidence interval return the lower and upper bounds for the confidence interval.
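Until such results are stored, the reconstruction described above can be written as follows. This sketch assumes a regress fit on the auto data, where lincom's result is t-based and r(df) holds the residual degrees of freedom; after a z-based estimator, invnormal(0.975) would replace the invttail() critical value. The scalar names ci_lb and ci_ub are hypothetical.

```stata
sysuse auto, clear
regress price weight length

* lincom returns the estimate, its standard error and, after regress,
* the residual degrees of freedom in r(df)
lincom weight + length
scalar ci_lb = r(estimate) - invttail(r(df), 0.025)*r(se)
scalar ci_ub = r(estimate) + invttail(r(df), 0.025)*r(se)
display "95% CI: [" ci_lb ", " ci_ub "]"
```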

Last edited by Jeph Herrin; 20 Dec 2016, 08:28.
4 likes
Comment
Sean Fiedler

Join Date: Jan 2016

Posts: 72
#148

20 Dec 2016, 09:56

Throwing in my 2 cents with a few wishes:

- A separate forum section for wishlist items may be useful (apologies if some of these are repeat wishes).

- Can the generate command have options to specify the variable format and variable label directly? This would remove the very common need for several commands on several lines. Especially for time series, the extra line "format t %tm" could be saved with something like "gen t = mofd(x), format(%tm)".

- Can the median be part of the summarize output, or at least of summarize's returned results, without needing the "detail" option?

- When I work with several files in the program editor and use the File > Save As dialog box, it opens the same directory no matter which file is active. Could it open the directory of the active file by default (as the MS Office Save As dialog does)? It is a pain to have to go to my top-level folder and then navigate back through my whole directory tree to the correct folder for every file I want to save as.

- When you run a highlighted section of code from the program editor, the command window shows the most recent command entry as "do Temp\STD100000.tmp". Can it show the literal commands that were run? After running many commands like this I basically lose the ability to see what my recent commands were.

- Can the program editor have a way to run just the line the cursor is on? For now, to run a single line I have to highlight at least one character first and then press Ctrl+D. A single keystroke to run the current line would be faster. (Apologies if this exists and I missed it.)

- Can the tsline command have its own option for graphing with marker symbols? At present I have to add two options, as in "tsline y, recast(connected) ms(O)", every single time just to get markers. Also, I suspect almost nobody wants the x axis of a tsline graph titled "date", so my vote is for xtitle("") to be the default.

- Every time a macro evaluates to empty or blank, can a caution message be displayed? A blank macro evaluation has been the cause of about 98% of my macro programming problems. I know this could be resource intensive, but maybe a "set macrowarning on/off" option could be used.

- Can the command window GUI include some space for showing the values of macros? In the lower right-hand side I see the "Properties > Variables + Data" display, most of which I don't use. I would love the option to show instead which local and global macros are defined, similar to the variable list in the upper right-hand side. Maybe Mata objects could be included too.

- Obviously a longer-term idea, but I would love to see some machine learning methods added, e.g., decision trees and neural networks.
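For reference, the current idioms behind several of the wishes above (format and label as separate commands, the median via detail, markers via recast(), and a hand-written check for an empty macro) look roughly like this. The simulated monthly series y, the seed and the local name cutoff are illustrative assumptions.

```stata
clear
set obs 24
set seed 2016

* Today: generate, then format and label as separate commands
generate t = tm(2015m1) + _n - 1
format t %tm
label variable t "Month"

generate y = rnormal()
tsset t

* Markers need recast() plus msymbol(); xtitle("") suppresses the axis title
tsline y, recast(connected) msymbol(O) xtitle("")

* The median is only in the returned results with the detail option
summarize y, detail
display "median of y = " r(p50)

* A hand-written guard against an empty macro
local cutoff
if `"`cutoff'"' == "" display as text "note: local cutoff is empty"
```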

Of course, the software is still great as it is. Many thanks.

Last edited by Sean Fiedler; 20 Dec 2016, 10:33.
3 likes
Comment
wbuchanan

Join Date: Mar 2014

Posts: 1361
#149

23 Dec 2016, 11:16

Sean Fiedler, the blank-macro warning would probably pollute the Results window. For example, every unspecified optional parameter would create an entry in the Results window just to say that the macro was empty. Displaying local macro values in the GUI is also a scoping issue in some cases (e.g., a macro may only be defined within the scope of a command).
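The scoping point can be seen with a tiny hypothetical program, show_scope: a local defined inside a program is no longer visible once the program has finished, so any GUI pane of local macros would have to decide which scope to display.

```stata
capture program drop show_scope
program define show_scope
    * This local exists only while show_scope is running
    local inside "defined inside show_scope"
    display "inside the program: `inside'"
end

show_scope
* Back at the caller's level, the program's local is not defined
display "outside the program: [`inside']"
```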

I could see the generate suggestion being pretty helpful. The do-file editor suggestion could be nice too.
2 likes
Comment
wbuchanan

Join Date: Mar 2014

Posts: 1361
#150

29 Dec 2016, 12:40

I think I've mentioned this previously, but it would be great if graphs that use the by() option drew an empty plot whenever there are not enough data to render marks in that plot. Basically, that would prevent errors from being thrown because of a lack of data (which could be considered a missing-data problem, in which case the current behavior is inconsistent with how missing data are otherwise treated in plots).
Comment
