
  • Include a command for best projection reiterative truncated projected least squares.



    • I would welcome some minor improvements to the awesome bookmark feature of the do-file editor:

      1. Fix the issue of displaying the bookmark navigation pane when Stata and the Do-File Editor are on different monitors, as discussed here:
      https://www.statalist.org/forums/for...27#post1610427
      Chinh Nguyen (StataCorp) mentioned that StataCorp was aware of this problem but AFAIK it has not been fixed.


      2. Add the ability to dock the navigation pane on the side of the do-file editor window, and/or a keyboard shortcut to jump to the navigation pane, as discussed here:
      https://www.statalist.org/forums/for...re#post1610169


      (And of course it would be great if these were implemented in Stata 17 and 18 as well.)



      • The command ivregress nicely allows factor notation for endogenous explanatory variables, making it convenient to include squares and interactions among endogenous variables and then using margins. For example,

        Code:
        ivregress 2sls y (c.w c.w#c.w c.w#c.x1 = c.z c.z#c.z c.z#c.z1) x1 x2 ... xk, robust
        margins, dydx(w)
        As far as I can tell, xtivreg doesn't allow this feature, making things cumbersome for fixed and random effects versions of IV.



        • Dear Jeff,
          While that is not possible with official commands, you could use the SSC command f_able, which lets you create the variables yourself.
          In your setup:

          ssc install f_able
          fgen w2 = w^2
          fgen wx1 = w*x1
          ivregress 2sls y (w w2 wx1 = c.z c.z#c.z c.z#c.z1) x1 x2 ... xk, robust
          f_able w2 wx1, auto
          margins, dydx(w)

          This allows for almost any kind of variable transformation.
          F



            • I propose adding options such as md, latex, and html to the -collect preview- command. Currently, -collect preview- always shows output in SMCL, but it would be nice to see the output in the log as, e.g., Markdown (md).
            A workaround is:
            Code:
            collect export tmp.md
            type "tmp.md"
            rm "tmp.md"
            Kind regards

            nhb



            • Maybe this doesn't need to be part of the new version, but if at all possible, it would be fantastic to have access to the Stata parser; Python's ast (abstract syntax tree) module is a perfect example. This would allow development of code linters, formatters, validators, full-fledged plugins for VS Code and other editors based on language servers, etc.



              • I would very much like to see:

                (1) a suite of official machine learning tools (there are several user-written commands but only lasso and npregress are official commands, and they certainly don't represent the current standard).

                (2) a way to speed up mixed models (all of them). I have projects in which mixed (which is certainly the "fastest" of the bunch) has taken 12 WEEKS (yes, weeks!) to complete. That's ridiculous. And I am using an extremely expensive version of Stata for 12-cores! The slowness of multilevel models leads to bad practices. If it takes me 12 weeks to run a single model using -mixed-, I may use -mixed- for a binary outcome because I know that using -melogit- may take twice as long. I am not sure why this cannot be performed as a parallel process to speed it up, or some other mechanism...

                Fingers crossed!

                Ariel



                • Compared with Matlab, R, and other software, Stata lags well behind in drawing 3D plots, such as three-dimensional scatter plots. I sincerely hope this will be addressed in Stata 19, in particular the ability to rotate the view manually, as in Matlab.



                  • If possible, trace should also be available as a prefix command with a depth option, like
                    Code:
                    trace, depth = 2: command_to_be_traced
                    .
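
Until a prefix like that exists, roughly the same effect can be had with the existing settings. A minimal sketch, assuming the auto dataset and a regress call as stand-ins for the command to be traced:

```stata
* Sketch only: regress is a placeholder for the command to be traced
sysuse auto, clear
set tracedepth 2      // trace at most two levels of nested programs
set trace on
regress price mpg     // command to be traced
set trace off
set tracedepth 32000  // restore the default depth
```

The drawback, and presumably the motivation for a prefix, is that tracing stays on if the traced command exits with an error.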
                    Kind regards

                    nhb



                    • Ariel Linden, I hear you about mixed models in Stata. There have been other threads on this. My personal opinion is that it's worth the one-time expense to purchase a mixed model-specific program. I think MLwiN is the best. It's continually updated, and they created a Stata program (runmlwin) that calls MLwiN from Stata and returns results for further manipulation. You can't do everything with it that you can with built-in mixed, such as margins, but if you know what you are doing, you can get those yourself with a little extra code. I view it the same way I view Mplus: if I'm doing serious structural equation modeling, I'm going to spend to get the best. Of course, some will disagree.



                      • I'd like to see the power and sample size functionality expanded to cover additional models. A few proprietary standalone packages (e.g., nQuery Advisor, PASS, Power and Precision) seem to cover an extremely large number of tests for estimating required sample sizes, power, minimal detectable effect, etc. Additionally, it would be great if the existing power and sample size commands could be extended to cover complex sampling designs as well (Valliant, Dever, & Kreuter, 2018, have examples of how to do this for a handful of tests). Lastly, it would also be great if some tooling related to survey sample allocation could be added to the survey commands.



                        • It would be nice if box plots could be included in twoway plots, not just as a twoway plot type but as an immediate command too. That would make it easier to combine box plots with other plots, such as bars. Right now, to make a box plot in twoway, I have to compute the statistics manually and then draw them on the plot where I want them to appear.
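
The manual route described above might look roughly like the following. This is a sketch only: it uses the auto dataset shipped with Stata as a stand-in, the variable names q1, med, q3, lo, and hi are made up for illustration, and the whiskers run from minimum to maximum rather than to the adjacent values that graph box draws.

```stata
* Sketch: build a box plot from twoway pieces (run from a do-file)
sysuse auto, clear
* Compute the box statistics per group by hand
collapse (p25) q1=mpg (p50) med=mpg (p75) q3=mpg ///
         (min) lo=mpg (max) hi=mpg, by(foreign)
twoway (rspike lo hi foreign)                 /// whiskers: min to max
       (rbar q1 q3 foreign, barwidth(0.4))    /// box: first to third quartile
       (scatter med foreign, msymbol(D)),     /// median marker
       xlabel(0 1, valuelabel) xscale(range(-0.5 1.5)) ///
       legend(off) ytitle("Mileage (mpg)")
```

Because the pieces are ordinary twoway plot types, further layers (bars, lines, scatters) can be added to the same command, which is what a built-in twoway box type would make much less tedious.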



                          • Please stop restricting the length of variable names. I followed the previous discussions about this. If I design a project myself, I would never give variables names longer than 32 characters. For some projects, however, the names are a given (the datasets are handed over to me for processing) and I just HAVE to handle names longer than 32 characters. Renaming everything is a huge PITA and a big source of possible errors, all because Stata has this restriction in place.



                            • Enable use of wildcards in -reshape long- for naming variable stubs
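
For now, the stub list has to be assembled by hand or in a loop. A minimal sketch of the loop approach, assuming hypothetical variables named like inc2019 and exp2020 (the data and the *20?? pattern are invented for illustration):

```stata
* Sketch: derive reshape stubs from a wildcard pattern
clear
input id inc2019 inc2020 exp2019 exp2020
1 10 11 5 6
2 20 21 7 8
end
* Expand the wildcard, then strip the trailing digits to get each stub
unab wide : *20??
local stubs
foreach v of local wide {
    local s = ustrregexra("`v'", "[0-9]+$", "")
    local stubs : list stubs | s    // union keeps each stub once
}
reshape long `stubs', i(id) j(year)
```

With wildcard support in reshape itself, the loop would collapse to a single line.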



                              • I second all the recommendations in #85.

                                Some additional wishes:
                                1. Increase limits in general. I regularly work with very large datasets (>100GB) and often encode strings which can significantly reduce the file sizes (think of replacing 1B observations of a str2000 with an int or long), but the limit is 65,536 unique values - I think this could easily be increased. Other limits that I've hit up against and would be nice to increase include the number of variables (rare, to exceed 120k, but has happened), the # of characters in a command, and number of arguments in inlist (particularly strings).
                                2. Allow the use of a RAM drive to save files to memory. For example, "save test.dta, ram" would save the file into RAM rather than writing it to disk, which is faster to write and to read later on. You can, of course, create a RAM drive separately and save the files to it, but Stata could do this dynamically.
                                3. Generally speed up and optimize existing commands. Despite sort being significantly improved, it's not consistently leveraged. For example:
                                  Code:
                                  	clear all
                                  	sysuse auto, clear
                                  	expand 1000000
                                  	timer on 1
                                  	duplicates drop make, force
                                  	timer off 1
                                  	
                                  	sysuse auto, clear
                                  	expand 1000000
                                  	timer on 2
                                  	sort make
                                  	duplicates drop make, force
                                  	timer off 2
                                  	
                                  	timer list
                                  On my system:
                                  1: 89.48 / 1 = 89.4830
                                  2: 53.41 / 1 = 53.4090
                                  Most string functions are also pretty slow; maybe things like sed or tr from Linux could be adopted to speed up functions like subinstr and strpos. Reshaping is also far from optimized: with large datasets (e.g., reshaping 1,000 variables from wide to long) it can take very long.
                                4. With "append, force", rather than forcing the new data type to conform with the existing type, allow the option to convert all nonmatching variables to strings. Currently, if you use "append, force", the data in the appended variable will be lost. On a one-off basis, using tostring for the original or appended data is trivial, but when there are hundreds of variables and many files, this is not trivial. One solution is to bring all data in as strings and then destring where possible after the files are appended, but that is not memory efficient.
                                5. Create a much better many-to-many merge. Joinby (which I still believe should be what m:m does) is incredibly slow and memory inefficient for large datasets to the point of being unusable. There are work-arounds that I've implemented (using expand and m:1 / 1:m) but it would be nicer to have this functionality built into Stata.
                                6. Better support for JSON files. I consider JSON to be a hideous file structure, but it's becoming very common and Stata has limited ability to work with it, particularly nested files.
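
To make item 4 concrete, the one-off tostring workaround can be looped over variables. A minimal sketch with invented data, where the variable code is numeric in the first file and string in the second; it handles only that direction (numeric in the master, string in the using data) and assumes the numeric values are integers, so tostring loses nothing:

```stata
* Sketch: convert mismatched variables to string before appending
clear
input id code
1 101
2 102
end
tempfile a
save `a'

clear
input id str4 code
3 "A103"
end
* Record which variables are strings in the data to be appended
ds, has(type string)
local strvars `r(varlist)'
tempfile b
save `b'

use `a', clear
foreach v of local strvars {
    capture confirm numeric variable `v'
    if !_rc tostring `v', replace   // numeric here, string in the other file
}
append using `b'
```

Doing this across many files means opening each one just to inspect types, which is the overhead an option on append would avoid.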
