Wishlist for Stata 19

daniel klein

Join Date: Mar 2014

Posts: 3798
#361

02 Oct 2024, 02:11

Originally posted by Niels Henrik Bruun View Post

When using user-written commands, ancillary files are saved in the current folder.
[...] It would be better if the ancillary were saved in the personal folder.

Are you asking to change the default? You can already change the location where ancillary files are stored with

Code:

net set other PERSONAL

which could be part of your profile.do.

Also, wouldn't PLUS be more in line with the idea that ancillary files are additions provided by others?
Comment
carole fantini

Join Date: May 2016

Posts: 56
#362

02 Oct 2024, 05:14

An easier way to obtain margins estimates and plots after -mi estimate-.
1 like
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2365
#363

02 Oct 2024, 17:36

I have been asking for this since at least Stata 16, but I would really appreciate a -sort- that acknowledges sorting in descending order. I am aware of -gsort-, but it fails to set a sorting flag in the dataset for any variables sorted in reverse order. In my opinion, there shouldn't be two separate commands, -sort- and -gsort-; there should be a single -sort- that allows sorting in both ascending and descending order and is compatible with the -by- prefix on those byvars.
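
To illustrate the point about the sort flag, here is a minimal sketch using the auto dataset that ships with Stata. The variable names neg_mpg and n_same are made up for the example, and sorting on a negated copy is only one common workaround, not a substitute for the requested feature.

```stata
sysuse auto, clear

* Descending sort with -gsort-: the rows are reordered, but the
* dataset should not be flagged as sorted by mpg afterwards
gsort -mpg
local flag : sortedby
display "sorted by after gsort: [`flag']"

* Workaround (an assumption of this sketch): sort ascending on a
* negated copy, which does set the sort flag
generate double neg_mpg = -mpg
sort neg_mpg
local flag : sortedby
display "sorted by after sort:  [`flag']"

* -by- can now be used on the descending order without re-sorting
by neg_mpg: generate n_same = _N
```

The workaround costs an extra variable per descending key, which is the kind of bookkeeping a single -sort- with descending support would remove.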
10 likes
Comment
Bruce Weaver

Join Date: May 2014

Posts: 1109
#364

02 Oct 2024, 21:08

I support Leonardo Guizzetti's request in #363. SORT CASES in SPSS works that way, for example.
https://www.ibm.com/docs/en/spss-sta...w-sort-command

https://www.ibm.com/docs/en/spss-sta...cases_examples

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
1 like
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2365
#365

06 Oct 2024, 18:27

-egen, concat()- with the -decode- option could be modified to include a -notrim- option to prevent the use of -trim()- or similar behaviour. This would save the step of substituting a placeholder character where spaces are desired and then replacing it again afterwards.
1 like
Comment

Ilya Bolotov

Join Date: Nov 2018
Posts: 75

#366

09 Oct 2024, 17:10

I would like to request four new built-in functions in both Stata and Mata: ustrregextf, ustrregexta, regextransform, and regextransformall. Each should apply a Stata/Mata string function to the part of s1 matched by a pattern. The functions would take three compulsory arguments: s1 (the string), re (the regular expression), and t (the function to apply).
I have already written the functions in Mata; here is the code (I passed "t" as a string scalar, e.g. t="ustrupper", but maybe a different solution can be found):

Code:

version 18.0
clear all

loc RS        real scalar
loc SS        string scalar
loc SM        string matrix

mata:
mata set matastrict on

`SM' regextransform(`SM' s1, `SS' re, `SS' t,| `RS' noc, `RS' std,              
                    `RS' nlalt)                                    /* Single  */
{
    `RS' i, j
    `SS' g, s
    noc   = noc   != . ? noc   : 0

    s = st_tempname()
    for(i = 1; i <= rows(s1); i++) {
        for(j = 1; j <= cols(s1); j++) {
            (void)   regexmatch(s1[i,j], re, noc, std, nlalt)
            if ((g=regexcapture(0)) != "") {
                stata(`"mata: st_strscalar(""'        + s + `"", "'            +
                                           t + `"(""' + g + `""))"')
                s1[i,j] = usubinstr(s1[i,j], g,  st_strscalar(s), 1)
            }
        }
    }
    return(s1)
}

`SM' regextransformall(`SM' s1, `SS' re, `SS' t,| `RS' noc, `RS' std,          
                       `RS' nlalt)                                 /* Single  */
{
    `RS' i, j
    `SS' g, g_flag, s
    noc   = noc   != . ? noc   : 0

    s = st_tempname()
    for(i = 1; i <= rows(s1); i++) {
        for(j = 1; j <= cols(s1); j++) {
            while(1) {
                (void)   regexmatch(s1[i,j], re, noc, std, nlalt)
                if ((g=regexcapture(0)) != "" & g != g_flag) {
                    stata(`"mata: st_strscalar(""'              + s + `"", "'  +
                                                     t + `"(""' + g + `""))"')
                    s1[i,j] = usubinstr(s1[i,j], g, (g_flag=st_strscalar(s)), 1)
                } else break
            }
        }
    }
    return(s1)
}

`SM' ustrregextf(`SM' s1, `SS' re, `SS' t,| `RS' noc)              /* Single  */
{
    `RS' i, j
    `SS' g, s
    noc = noc != . ? noc : 0

    s = st_tempname()
    for(i = 1; i <= rows(s1); i++) {
        for(j = 1; j <= cols(s1); j++) {
            (void) ustrregexm(s1[i,j], re, noc)
            if ((g=ustrregexs(0)) != "") {
                stata(`"mata: st_strscalar(""'        + s + `"", "'            +
                                           t + `"(""' + g + `""))"')
                s1[i,j] = usubinstr(s1[i,j], g,  st_strscalar(s), 1)
            }
        }
    }
    return(s1)
}

`SM' ustrregexta(`SM' s1, `SS' re, `SS' t,| `RS' noc)              /* Single  */
{
    `RS' i, j
    `SS' g, g_flag, s
    noc = noc != . ? noc : 0

    s = st_tempname()
    for(i = 1; i <= rows(s1); i++) {
        for(j = 1; j <= cols(s1); j++) {
            while(1) {
                (void) ustrregexm(s1[i,j], re, noc)
                if ((g=ustrregexs(0)) != "" & g != g_flag) {
                    stata(`"mata: st_strscalar(""'              + s + `"", "'  +
                                                     t + `"(""' + g + `""))"')
                    s1[i,j] = usubinstr(s1[i,j], g, (g_flag=st_strscalar(s)), 1)
                } else break
            }
        }
    }
    return(s1)
}
end

Last edited by Ilya Bolotov; 09 Oct 2024, 17:15.

Comment

Ilya Bolotov

Join Date: Nov 2018

Posts: 75
#367

17 Oct 2024, 10:23

Originally posted by Bruce Weaver View Post

I support Leonardo Guizzetti's request in #363. SORT CASES in SPSS works that way, for example.
https://www.ibm.com/docs/en/spss-sta...w-sort-command

https://www.ibm.com/docs/en/spss-sta...cases_examples

Mission accomplished, at least partly:
https://www.statalist.org/forums/for...rt#post1765942
Comment
Jean-Michel Galarneau

Join Date: Aug 2018

Posts: 37
#368

18 Oct 2024, 07:17

The ability to enter custom subgroup means in Stata's -meta forestplot-, just as the customoverall() option allows for the overall effect.
Comment
ericmelse

Join Date: May 2014

Posts: 420
#369

19 Oct 2024, 01:40

Originally posted by Fahad Mirza View Post

By that I mean allow Stata to read pixel data from images in PNG, JPEG, TIFF etc.

Gradient colors would be interesting as it allows for transition of state. Imagine a sankey plot with gradient tones.

Even better, we very much could use the ability to use gradient colors (tones) in between color points / positions on objects created by Stata's graphics engine.
And indeed, that would be most useful for alluvial and Sankey plots but there are other applications.
For example, instead of using twoway contour plot with area shading, we could use the x y data points [0,0] and [1,1] (or any other value), set their color respectively to white & gold and create this effect to color the plot area (using the suggested option to set gradient colors):

Last edited by ericmelse; 19 Oct 2024, 01:43.

http://publicationslist.org/eric.melse
4 likes
Comment
David Tannenbaum

Join Date: Jul 2020

Posts: 20
#370

26 Oct 2024, 22:56

R integration and support for GAMs (generalized additive models)
2 likes
Comment
Miguel Henry

Join Date: Oct 2015

Posts: 9
#371

09 Nov 2024, 23:16

A command that automatically detects and marks values like NULL, N/A, #N/A, or NA as missing during data import like pandas does automatically in Python.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29773
#372

10 Nov 2024, 10:27

Re #371. I'd make that an option that can be specified on -import excel- and -import delimited-. I wouldn't make it automatic or default behavior because things can be complicated. In fact, a data set that I imported just two days ago has "NULL" for missing values but also has a valid N/A response category for certain variables. Even an option would have difficulty coping with this. The option itself would have to be sufficiently complicated that it might just be simpler to deal with it by writing a few lines of code to loop over string variables and replace "NULL" with "". I mean

Code:

ds, has(type string)
local str_vars `r(varlist)'
foreach v of local str_vars {
    replace `v' = "" if inlist(`v', "NULL", "N/A", "#N/A", "NA")
}

or some slight variation on that doesn't take much effort. In fact, you could just wrap that in a program in an .ado file to make it painless to use if it comes up frequently enough in your workflow.
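
As a rough sketch of the wrapping idea, the loop could go into a small program like the one below. The program name null2missing is hypothetical, and the list of placeholders is simply the one from the code above; adjust both to your own data.

```stata
capture program drop null2missing
program define null2missing
    version 14
    // optional varlist; with none given, all variables are considered
    syntax [varlist]

    // keep only the string variables among those requested
    quietly ds `varlist', has(type string)

    // blank out the placeholder strings in each of them
    foreach v in `r(varlist)' {
        replace `v' = "" if inlist(`v', "NULL", "N/A", "#N/A", "NA")
    }
end
```

After an import, typing null2missing cleans every string variable, while null2missing q1 q2 (hypothetical variable names) restricts the replacement to the variables where those strings really do mean missing.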
6 likes
Comment
Miguel Henry

Join Date: Oct 2015

Posts: 9
#373

10 Nov 2024, 23:19

Thanks, Clyde, for the code to address those unusual placeholder values, but I was specifically trying to avoid writing that piece of code. In my opinion, having it as an option in the -import excel- and -import delimited- commands, which is what I was suggesting, would be much more helpful than writing custom code every time we import datasets.

When importing an Excel file using pandas, any cells that are empty or have values like NULL are automatically recognized as NaN (Not a Number), which represents missing values in pandas. This allows for easier handling of missing data. While pandas treats a variety of placeholders as NaN by default, Stata does not handle these placeholders the same way. This can be problematic when dealing with datasets containing thousands of variables, as some of these unusual placeholder values might be easily overlooked.

Best,
Miguel

Last edited by Miguel Henry; 10 Nov 2024, 23:25.
Comment
daniel klein

Join Date: Mar 2014

Posts: 3798
#374

11 Nov 2024, 02:05

I am with Clyde here. What if NA represents the ISO 3166 alpha-2 code for Namibia? Do you think you would spot this using such an option? Unlikely. The same is true for NaN, by the way, and probably for many others, too.
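
A small made-up example of the problem, assuming a country-code variable iso2 and an income variable that was imported as a string (both names and all values are invented for illustration). Restricting the replacement to the variable where the placeholders really mean missing keeps Namibia intact:

```stata
clear
input str4 iso2 str6 income
"NA" "NULL"
"DE" "52000"
"ZA" "N/A"
end

* A blanket rule over all string variables would also blank out
* iso2 == "NA" (Namibia). Apply the rule to income only:
replace income = "" if inlist(income, "NULL", "N/A", "#N/A", "NA")
destring income, replace

list
```

An automatic rule at import time has no way of knowing that "NA" is a valid value in iso2 but a placeholder in income.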

Last edited by daniel klein; 11 Nov 2024, 02:12.
2 likes
Comment
Alex Weckenman

Join Date: Nov 2019

Posts: 19
#375

11 Nov 2024, 14:33

Two related requests, which will probably be met with hostility.

1) Please either allow more than 80 characters in a variable label, or add an option to 'export excel' that allows you to use variable notes instead of variable names or labels.

I know that it is the opinion of many that there is no good reason to have a label of more than 80 characters, but this comes up in my work almost all of the time. A common use case is a survey where it would be helpful to be able to fit the full question text in the export rather than a truncated or shortened version of the question.

2) Please allow variable names to have more than 32 characters

Again, I imagine many will be up in arms that variables should never be this long. But I am often running up against the limit and need to spend hours to find a way to make it all 'fit'. The use case that I often come across is that variables have about 6-7 categorizations that all need to be captured in the variable name to make it easy to "grab" them for various analyses, while also allowing for extra characters so that additional variables can be created based on the original variable name in loops.
5 likes
Comment
