  • Originally posted by Niels Henrik Bruun
    When using user-written commands, ancillary files are saved in the current folder.
    [...] It would be better if the ancillary files were saved in the personal folder.
    Are you asking to change the default? You can already change the location where ancillary files are stored with

    Code:
    net set other PERSONAL
    which could be part of your profile.do.

    Also, wouldn't PLUS be more in line with the idea that ancillary files are additions provided by others?

    • An easier way to obtain margins estimates and plots after -mi estimate-.

      • I have been asking for this since at least Stata 16, but I would really appreciate a -sort- that acknowledges sorting in descending order. I am aware of -gsort-, but it fails to set a sorting flag in the dataset for any variables sorted in reverse order. In my opinion, there shouldn't be two separate commands, -sort- and -gsort-; there should be a single -sort- that allows sorting in ascending and descending order and is compatible with the -by- prefix on those byvars.
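
        A minimal sketch of the behaviour described above, together with a common workaround. It assumes the shipped auto dataset; the variable names neg_mpg and n_same are made up for illustration.

        ```stata
        * Sketch only: neg_mpg and n_same are hypothetical variable names.
        sysuse auto, clear

        gsort -mpg                 // descending order, but no sort flag is recorded
        describe, short            // the "Sorted by:" line is left empty

        * Workaround: sort ascending on the negated key so that the flag is set.
        generate neg_mpg = -mpg
        sort neg_mpg
        by neg_mpg: generate n_same = _N   // allowed: data are flagged as sorted on neg_mpg
        ```

        The workaround only applies to numeric keys, which is part of why a built-in descending -sort- would be more convenient.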

        • I support Leonardo Guizzetti's request in #363. SORT CASES in SPSS works that way, for example.
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)

          • -egen, concat()- with the -decode- option could be modified to include a -notrim- option to prevent the use of -trim()- or similar behaviour. This would save the step of substituting a placeholder character where spaces are desired and then replacing it again afterwards.

            • I would like to request four new built-in functions in both Stata and Mata: ustrregextf, ustrregexta, regextransform, and regextransformall. All of them should apply a Stata/Mata string function to the text matched by a pattern in s1. The functions take three compulsory arguments: s1, re, and t.
              I have already written the functions in Mata; here is the code (I passed "t" as a string scalar, e.g. t="ustrupper", but maybe a different solution can be found):

              Code:
              version 18.0
              clear all
              
              loc RS        real scalar
              loc SS        string scalar
              loc SM        string matrix
              
              mata:
              mata set matastrict on
              
              `SM' regextransform(`SM' s1, `SS' re, `SS' t,| `RS' noc, `RS' std,              
                                  `RS' nlalt)                                    /* Single  */
              {
                  `RS' i, j
                  `SS' g, s
                  noc   = noc   != . ? noc   : 0
              
                  s = st_tempname()
                  for(i = 1; i <= rows(s1); i++) {
                      for(j = 1; j <= cols(s1); j++) {
                          (void)   regexmatch(s1[i,j], re, noc, std, nlalt)
                          if ((g=regexcapture(0)) != "") {
                              stata(`"mata: st_strscalar(""'        + s + `"", "'            +
                                                         t + `"(""' + g + `""))"')
                              s1[i,j] = usubinstr(s1[i,j], g,  st_strscalar(s), 1)
                          }
                      }
                  }
                  return(s1)
              }
              
              `SM' regextransformall(`SM' s1, `SS' re, `SS' t,| `RS' noc, `RS' std,          
                                     `RS' nlalt)                                 /* Single  */
              {
                  `RS' i, j
                  `SS' g, g_flag, s
                  noc   = noc   != . ? noc   : 0
              
                  s = st_tempname()
                  for(i = 1; i <= rows(s1); i++) {
                      for(j = 1; j <= cols(s1); j++) {
                          while(1) {
                              (void)   regexmatch(s1[i,j], re, noc, std, nlalt)
                              if ((g=regexcapture(0)) != "" & g != g_flag) {
                                  stata(`"mata: st_strscalar(""'              + s + `"", "'  +
                                                                   t + `"(""' + g + `""))"')
                                  s1[i,j] = usubinstr(s1[i,j], g, (g_flag=st_strscalar(s)), 1)
                              } else break
                          }
                      }
                  }
                  return(s1)
              }
              
              `SM' ustrregextf(`SM' s1, `SS' re, `SS' t,| `RS' noc)              /* Single  */
              {
                  `RS' i, j
                  `SS' g, s
                  noc = noc != . ? noc : 0
              
                  s = st_tempname()
                  for(i = 1; i <= rows(s1); i++) {
                      for(j = 1; j <= cols(s1); j++) {
                          (void) ustrregexm(s1[i,j], re, noc)
                          if ((g=ustrregexs(0)) != "") {
                              stata(`"mata: st_strscalar(""'        + s + `"", "'            +
                                                         t + `"(""' + g + `""))"')
                              s1[i,j] = usubinstr(s1[i,j], g,  st_strscalar(s), 1)
                          }
                      }
                  }
                  return(s1)
              }
              
              `SM' ustrregexta(`SM' s1, `SS' re, `SS' t,| `RS' noc)              /* Single  */
              {
                  `RS' i, j
                  `SS' g, g_flag, s
                  noc = noc != . ? noc : 0
              
                  s = st_tempname()
                  for(i = 1; i <= rows(s1); i++) {
                      for(j = 1; j <= cols(s1); j++) {
                          while(1) {
                              (void) ustrregexm(s1[i,j], re, noc)
                              if ((g=ustrregexs(0)) != "" & g != g_flag) {
                                  stata(`"mata: st_strscalar(""'              + s + `"", "'  +
                                                                   t + `"(""' + g + `""))"')
                                  s1[i,j] = usubinstr(s1[i,j], g, (g_flag=st_strscalar(s)), 1)
                              } else break
                          }
                      }
                  }
                  return(s1)
              }
              end
              Last edited by Ilya Bolotov; 09 Oct 2024, 17:15.

              • Originally posted by Bruce Weaver
                Mission accomplished, at least partly:
                https://www.statalist.org/forums/for...rt#post1765942

                • The ability to enter custom subgroup means in Stata's -meta forestplot- suite, just as the customoverall option allows for the overall effect.

                  • Originally posted by Fahad Mirza

                    By that I mean allow Stata to read pixel data from images in PNG, JPEG, TIFF etc.

                    Gradient colors would be interesting as it allows for transition of state. Imagine a sankey plot with gradient tones.
                    Even better, we could very much use the ability to apply gradient colors (tones) between color points / positions on objects created by Stata's graphics engine.
                    Indeed, that would be most useful for alluvial and Sankey plots, but there are other applications.
                    For example, instead of using a twoway contour plot with area shading, we could take the x y data points [0,0] and [1,1] (or any other values), set their colors to white and gold respectively, and create this effect to color the plot area (using the suggested option to set gradient colors):
                    [Image: Background_using_gradient_colors.png]

                    Last edited by ericmelse; 19 Oct 2024, 01:43.
                    http://publicationslist.org/eric.melse

                    • R integration and support for GAMs (generalized additive models)

                      • A command that automatically detects values like NULL, N/A, #N/A, or NA and marks them as missing during data import, as pandas does by default in Python.

                        • Re #371. I'd make that an option that can be specified on the -import excel- and -import delimited- commands. I wouldn't make it automatic or default behavior because things can be complicated. In fact, a data set that I imported just two days ago has "NULL" for missing values but also has a valid N/A response category for certain variables. Even an option would have difficulty coping with this. The option itself would have to be so complicated that it might be simpler to deal with the problem by writing a few lines of code that loop over string variables and replace NULL with "". I mean
                          Code:
                          ds, has(type string)
                          local str_vars `r(varlist)'
                          foreach v of local str_vars {
                              replace `v' = "" if inlist(`v', "NULL", "N/A", "#N/A", "NA")
                          }
                          or some slight variation on that doesn't take much effort. In fact, you could just wrap that in a program in an .ado file to make it painless to use if it comes up frequently enough in your workflow.
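
                          The wrapping suggested above could look roughly like this. It is a sketch, not a tested ado-file: the program name null2missing and the fixed placeholder list are assumptions made for illustration.

                          ```stata
                          * Hypothetical helper; the name null2missing and the placeholder list are assumptions.
                          capture program drop null2missing
                          program define null2missing
                              version 14
                              syntax [varlist]                      // defaults to all variables
                              foreach v of local varlist {
                                  capture confirm string variable `v'
                                  if _rc continue                   // skip numeric variables
                                  quietly replace `v' = "" if inlist(`v', "NULL", "N/A", "#N/A", "NA")
                              }
                          end

                          * Usage after an import (the file name is a placeholder):
                          * import delimited "mydata.csv", clear
                          * null2missing
                          ```

                          Passing an explicit varlist lets you leave out variables in which N/A is a valid response category.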

                          • Thanks, Clyde, for the code to address those unusual placeholder values, but I was specifically trying to avoid writing that piece of code. In my opinion, having it as an option in the -import excel- and -import delimited- commands, which is what I was suggesting, would be much more helpful than writing custom code every time we import datasets. When importing an Excel file using pandas, any cells that are empty or have values like NULL are automatically recognized as NaN (Not a Number), which represents missing values in pandas. This allows for easier handling of missing data. While pandas treats a variety of placeholders as NaN by default, Stata does not handle these placeholders the same way. This can be problematic when dealing with datasets containing thousands of variables, as some of these unusual placeholder values might be easily overlooked.
                            Best, Miguel
                            Last edited by Miguel Henry; 10 Nov 2024, 23:25.

                            • I am with Clyde here. What if NA is the ISO 3166 alpha-2 code for Namibia? Do you think you would spot this using such an option? Unlikely. The same is true for NaN, by the way, and probably for many others, too.
                              Last edited by daniel klein; 11 Nov 2024, 02:12.

                              • Two related requests, which will probably be met with hostility.

                                1) Please either allow more than 80 characters in a variable label, or add an option to 'export excel' that allows you to use variable notes instead of variable names or labels.

                                I know that it is the opinion of many that there is no good reason to have a label of more than 80 characters, but this comes up in my work almost all of the time. A common use case is a survey where it would be helpful to be able to fit the full question text in the export rather than a truncated or shortened version of the question.

                                2) Please allow variable names to have more than 32 characters

                                Again, I imagine many will be up in arms that variables should never be this long. But I am often running up against the limit and need to spend hours to find a way to make it all 'fit'. The use case that I often come across is that variables have about 6-7 categorizations that all need to be captured in the variable name to make it easy to "grab" them for various analyses, while also allowing for extra characters so that additional variables can be created based on the original variable name in loops.
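
                                For the first request, one possible stopgap (not the requested option) is to keep the full question text in a variable note and read it back when building an export. A minimal sketch, in which the variable q1, its label, and the question text are all made up for illustration:

                                ```stata
                                * Sketch only: q1, its label, and the question text are hypothetical.
                                clear
                                set obs 1
                                generate byte q1 = 1
                                label variable q1 "Satisfaction with service"   // labels are capped at 80 characters

                                * Store the full question text as the first note attached to q1.
                                notes q1: Over the past 12 months how satisfied were you with the overall quality of the service you received from our office

                                * Notes are stored as characteristics, so the text can be read back into a macro.
                                local fulltext : char q1[note1]
                                display `"`fulltext'"'
                                ```

                                The retrieved text would still have to be written to the spreadsheet by hand, which is exactly the step an -export excel- option would remove.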
