
  • Two additional wishes for mata
    1) It would be great if one could run Mata scripts from the Do-file Editor once Mata is active (at present this usually fails with an error, because "do" is not a valid command from within Mata).
    2) Perhaps a dedicated Mata editor would be useful, so that programming in Mata is as flexible as in other languages.
    F



    • I am increasingly frustrated with managing ado-paths when working with coauthors or helping people find bugs. I also fear that many people are not really aware of what is in their ado folders or which version of a program is actually being used. I would therefore suggest the following (I think I mentioned some of it somewhere before but can't find it):
      • adopath clear, to reset the ado-path to the default it has when Stata is loaded
      • adopath + should ensure that the path exists before adding it
      • adopath - should ignore a trailing backslash or forward slash in the path
      • extend which so that it shows the path of the program that is actually used, plus the alternative paths if the program appears in several ado folders
      • I often find that people have installed several versions of a program without realising it. This causes incomplete updates and errors, and can even lead to inconsistent results. The problem is that the same package can be installed from SSC and from other sources at the same time. For example, a user can install version 1 from SSC and then an update from, say, GitHub. I have seen cases where Stata thought there were two versions and the update was incomplete: some files were overwritten, others not. The user went ahead, and the package produced an error at best and, at worst, results from an outdated version of a program. It would be great if a program could be given an identifier that is independent of the source from which it is installed, together with a program-specific version indicator.

      I think the points raised are important because, from my point of view, one of Stata's great advantages over R is its capabilities for version control, easy installation of packages and community-contributed programs, and reproducibility.
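Two of these wishes can be roughly approximated with existing tools. The following is only a sketch: the program name safe_adopath is hypothetical (it is not part of Stata), it relies on Mata's direxists() and on findfile with the all option, and codebook.ado is used purely as an example of a file that should exist along the ado-path.

```stata
* Hypothetical helper: add a directory to the ado-path only if it exists
capture program drop safe_adopath
program define safe_adopath
    args dir
    * direxists() returns 1 if the directory exists and 0 otherwise
    mata: st_local("exists", strofreal(direxists(st_local("dir"))))
    if !`exists' {
        display as error `"directory "`dir'" not found; ado-path unchanged"'
        exit 601
    }
    * ++ puts the directory at the front of the ado-path
    adopath ++ "`dir'"
end

* List every copy of a file along the ado-path, not only the first hit
findfile codebook.ado, all
display `"`r(fn)'"'
```

If more than one path is listed by findfile, several copies of the program are installed and the first one on the ado-path is the one Stata will run.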



      • Subject: Extend -permute- to shuffle more than one variable.

        I'd recommend extending the -permute- command to allow shuffling of more than one variable. (This occurs to me in the context of responding to some Statalist questions about what are called "placebo tests" in econometrics, which, as near as I can tell, are often a type of permutation test in which multiple variables are permuted.) This should be easy to do, as all the "hard stuff" to program in -permute- is already there, i.e., keeping track of and reporting the results. Shuffling multiple variables at each replication should not be hard.
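Until -permute- supports this, one possible workaround is to do the shuffling in a small rclass program and let -simulate- collect the results. This is a sketch under assumptions: the program name perm_multi is hypothetical, the variables and model are arbitrary choices from the auto data, and each variable is shuffled independently of the other, which may or may not be what a given placebo test calls for.

```stata
capture program drop perm_multi
program define perm_multi, rclass
    tempvar w l
    * jumble() returns its argument with the rows in random order;
    * the shuffled copies go into temporary variables, so the
    * original data are left untouched
    mata: st_store(., st_addvar("double", "`w'"), jumble(st_data(., "weight")))
    mata: st_store(., st_addvar("double", "`l'"), jumble(st_data(., "length")))
    regress price `w' `l'
    return scalar b_weight = _b[`w']
    return scalar b_length = _b[`l']
end

sysuse auto, clear
set seed 12345
* After this, the data in memory are the 100 permutation replicates
simulate b_weight=r(b_weight) b_length=r(b_length), reps(100) nodots: perm_multi
summarize b_weight b_length
```

The permutation distribution in b_weight and b_length can then be compared with the coefficients from the unshuffled regression.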



        • I or others may have mentioned it, but integrating R code into Stata, as has been done with Python, would be awesome too. Too often we have to "choose" between Python and R, when instead we should have the best of the three worlds: data cleaning in Stata, and fancier statistical analyses in R or Python. It would go a long way toward making people bilingual or trilingual, and would expand the scope of Stata's use to others.



          • I have long asked for xthtaylor to work with the special case in which there are no exogenous time-invariant regressors. Currently, the command aborts with an error message. There is no econometric reason for preventing the estimation of such a model. This should be a relatively straightforward update. See also: https://www.statalist.org/forums/for...nous-variables
            https://www.kripfganz.de/stata/



            • I don't think I've seen this one so far: it would be really great to have a simple syntax for frame iteration (looping through the rows of a frame, loading variable values into local macros that can then be used in the execution of code within a loop). A common use case for me would be iterating through candidate specifications where each iteration of the loop requires multiple pieces of information to run. More generally, frame iteration would accomplish many of the same patterns that are accomplished in Python with list or dictionary iteration, and in SAS with hash object iterators.

              A very simple and nonsensical example (a more typical use would have the specifications frame loaded from another source):

              Code:
              sysuse auto
              
              frame create specifications strL( name xvars vcetype )
              frame post specifications ("weight only") ("weight") ("ols")
              frame post specifications ("weight and length, robust") ("weight length") ("robust")
              frame post specifications ("weight and mpg, bootstrap") ("weight mpg") ("bootstrap")
              
              frame create results strL( name b se )
              
              * NOTE: "foreach of frame" is the proposed syntax, not existing Stata syntax
              foreach of frame specifications {
              
                  regress price i.foreign `xvars', vce(`vcetype')
                  frame post results ("`name'") (_b[1.foreign]) (_se[1.foreign])
              
              }
              I'm not wedded to this specific syntax, but you get the idea. Iterating over observations of a frame now is very clunky, because each local macro variable has to be explicitly assigned inside an observation counter loop, rather than being automatically assigned using the name of the corresponding variable in the frame.

              Also want to +1 on appending frames to one another as built-in functionality. I have found the various frame appending packages to be buggy.



              • Originally posted by Lee Tucker
                it would be really great to have a simple syntax for frame iteration (looping through the rows of a frame, loading variable values into local macros that can then be used in the execution of code within a loop). [...]

                What you've described can already be done with a little tweaking. It wouldn't take much more work to load the specifications from an external file (say, an Excel sheet), or to split part of the loop out into a utility program that fetches the relevant specs from a given row. In any case, here is one minimal technique that works.

                Code:
                * Prepare Specs and results frames
                frame create specifications str64( name xvars vcetype )
                frame post specifications ("weight only") ("weight") ("ols")
                frame post specifications ("weight and length, robust") ("weight length") ("robust")
                frame post specifications ("weight and mpg, bootstrap") ("weight mpg") ("bootstrap")
                frame specifications: compress
                
                frame create results str64(name) double(b se)
                
                // Load data, and step through regressions, driven by the Specs
                mkf data
                cwf data
                sysuse auto
                
                frame specifications: local nspecs = _N
                forval i = 1/`nspecs' {
                  cwf specifications
                  local i_name = "`=name[`i']'"
                  local i_xvars = "`=xvars[`i']'"
                  local i_vce = "`=vcetype[`i']'"
                 
                  cwf data
                  regress price i.foreign `i_xvars', vce(`i_vce')
                  frame post results ("`i_name'") (_b[1.foreign]) (_se[1.foreign])
                }
                
                cwf results
                list
                As an aside, it's better to store estimation results as numeric types (here, double) so you can manipulate them later if needed. You can also keep your specs to strings shorter than strL because it's unlikely (or impossible) to have such long strings in most cases, and you can save some memory by being more conservative.
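The utility program mentioned above could look something like the following. This is only a sketch: frame_row is a hypothetical name, the frame is called specs here to keep the example separate from the code above, and the values of the requested row are handed back in r() under the names of the frame's variables.

```stata
* Hypothetical helper: return row `row' of a frame in r(), one
* result per variable, named after the variable
capture program drop frame_row
program define frame_row, rclass
    args framename row
    frame `framename' {
        foreach v of varlist _all {
            return local `v' = `v'[`row']
        }
    }
end

sysuse auto, clear
capture frame drop specs
frame create specs str64(name xvars vcetype)
frame post specs ("weight only") ("weight") ("ols")
frame post specs ("weight and length, robust") ("weight length") ("robust")

forvalues i = 1/2 {
    frame_row specs `i'
    * copy r() into locals before running anything else
    local name  "`r(name)'"
    local xvars `r(xvars)'
    local vce   `r(vcetype)'
    display as text "`name'"
    regress price i.foreign `xvars', vce(`vce')
}
```

Copying the r() results into locals first is a precaution, because later commands may replace what is stored in r().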



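                • Re #531/#532: another variant, still very clunky. It assumes that the auto data are in the default frame and that the results frame already exists.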
                  Code:
                  frame change specifications
                  
                  des, varlist
                  local varlist `r(varlist)'  
                  
                  forvalues i = 1/`=_N' {
                      
                      foreach arg of local varlist {
                  
                          loc `arg' = `arg'[`i']
                      }    
                  
                      frame default {
                          
                          regress price i.foreign `xvars', vce(`vcetype')
                          frame post results ("`name'") (_b[1.foreign]) (_se[1.foreign])
                      }
                  }



                  • Leonardo Guizzetti
                    Assuming there is only one categorical variable of interest, I’ll post a different solution I developed to this challenge (which also includes some other functionality related to the task) a bit later. The solution I created requires an additional command call to set up the infrastructure, but will deal with any number of categories (and assumes the intercept is not of interest), captures the model command call, and allows additional info to be passed to describe the model. The approach I took uses a Mata object that I defined after I figured out that struct objects didn’t seem to persist between calls to an ado. So all the metadata needed gets stored in an object and the methods defined on the object handle all the storage and retrieval work.



                    • A Stata kernel for Jupyter notebooks. The current Stata magic command works, but it is a little clunky, and I am not a fan of having to write the magic at the start of each cell. In my experience it is also a bit slow, much slower than running in Stata directly. Likewise, Stata should find a way to support syntax highlighting in notebooks.



                      • It would be very useful if fvexpand had an option that removes base levels, as Mark Schaffer's fvstrip does (which is also in Sergio Correia's ftools under the name ms_fvstrip):
                        https://www.statalist.org/forums/for...-fvexpand-list
                        https://www.kripfganz.de/stata/
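In the meantime, base levels can be filtered out of the fvexpand result by hand. A minimal sketch, assuming only main effects (no interactions) and assuming that base levels are the terms whose level number is followed by "b.", as in 1b.rep78; it does not try to reproduce everything fvstrip does.

```stata
sysuse auto, clear
fvexpand i.rep78 i.foreign
local full `r(varlist)'

* keep every term that is not flagged as a base level
local nobase
foreach term of local full {
    if !regexm("`term'", "^[0-9]+b\.") {
        local nobase `nobase' `term'
    }
}
display "`nobase'"
```

Interactions such as i.rep78#i.foreign carry the base flag inside each component and would need a more careful pattern.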



                        • Another thing I would most appreciate is to allow pinpoint customizations of the legend placement with coordinates, e.g.,
                          Code:
                          legend(pos(coord(1970, 45)))



                          • Jared Greathouse what would those coordinates represent? Also, are you just asking to define where the upper left corner would begin/start or something else?



                            • Hey, so say I have a time series that looks like this
                              Code:
                              * Example generated by -dataex-. For more info, type help dataex
                              clear
                              input double cf float(cigsale3 year)
                              122.09479671055374   123 1970
                               121.6065529858943   121 1971
                              123.65496555673786 123.5 1972
                              124.10483385256285 124.4 1973
                              126.75462121002437 126.7 1974
                              126.63825689453941 127.1 1975
                              127.92824675473713   128 1976
                               126.4873238266998 126.4 1977
                              125.37864959522437 126.1 1978
                              122.34822098598946 121.9 1979
                                119.928632673551 120.2 1980
                              119.10175992318514 118.6 1981
                              115.68795915144617 115.4 1982
                              111.04282414949492 110.8 1983
                              104.56636628133799 104.8 1984
                               102.7682761771388 102.8 1985
                               99.35677151118881  99.7 1986
                               97.39989301281054  97.5 1987
                               91.15104874688336  90.1 1988
                               88.66315821472102  82.4 1989
                               85.21665593484636  77.8 1990
                               80.76789213762831  68.7 1991
                               79.08976003988303  67.5 1992
                               79.33037122087953  63.4 1993
                               76.85184614706827  58.6 1994
                               74.46827712093958  56.4 1995
                               73.64585997258112  54.5 1996
                               73.61649595338298  53.8 1997
                               71.64467330066938  52.3 1998
                               70.90756101627477  47.2 1999
                               65.74723052634468  41.6 2000
                              end
                              format %ty year
                              
                              cap set scheme gg_w3d 
                              
                              if _rc {
                                  
                              net install schemepack, ///
                              from("https://raw.githubusercontent.com/asjadnaqvi/stata-schemepack/main/installation/") ///
                              replace    
                                  
                              }
                              
                              lab var cf "Counterfactual Sales"
                              lab var cig "Real Cigarette Sales"
                              
                              tsset year, y
                              cls
                              tsline cig, lcol("12 10 0") lwidth(thick) || /// Observed Sales
                              tsline cf, lcol("237 41 57") lwidth(medthick) lpat(--), /// Counterfactual
                              scheme(gg_w3d) ///
                              tline(1989, lcol(blue) lpat(solid) lwid(medthick)) ///
                              legend(ring(0) pos(3)) ///
                              tti(Year) ///
                              yti(Cigarette Sales) ///
                              plotregion(fcol(gs12))
                              The coordinates are meant to represent positions on the x and y axes, respectively, allowing the user to place the legend literally wherever they please. At present the legend overlaps the counterfactual line (I know I could just remove the fill color of the legend box, but suppose I like the color and want to keep it). Say I also, arbitrarily, want to keep the legend on the right side of the graph. In a situation where I could do, for example,
                              Code:
                              legend(pos(coord(1990, 99.5)))
                              we'd have the legend's box moved inward, such that its left edge touches 1990 and its center lines up with the y value of 99.5. Of course, this could get a lot trickier when multiple things are plotted, so no doubt some tinkering and adjustment would be needed. I like the clock-position approach, but I have always wondered why we couldn't simply place the legend wherever our heart desires. wbuchanan



                              • Now that it's on my mind, it would also be cool to integrate hex colors into Stata's customization scheme. So if I like the color imperial red, I could just do
                                Code:
                                lcol("#ed2939")
                                instead of
                                Code:
                                lcol("237 41 57")
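As a stopgap, the conversion from a hex code to the RGB triple can be scripted. A small sketch, assuming a six-digit code with an optional leading #; hex2rgb is a hypothetical program name, and the conversion uses Mata's frombase().

```stata
* Hypothetical helper: convert "#rrggbb" to "R G B" and return it in r(rgb)
capture program drop hex2rgb
program define hex2rgb, rclass
    args hex
    local hex = strlower(subinstr("`hex'", "#", "", 1))
    forvalues k = 0/2 {
        * take the k-th pair of hex digits and convert it from base 16
        local pair = substr("`hex'", 2*`k' + 1, 2)
        mata: st_local("c`k'", strofreal(frombase(16, st_local("pair"))))
    }
    return local rgb "`c0' `c1' `c2'"
end

hex2rgb "#ed2939"
display "`r(rgb)'"
```

The result can then be passed to a graph option, for example lcol("`r(rgb)'"), provided it is used before another command overwrites r().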
