
  • I don't know whether SAS implemented its own version of SQL with PROC SQL (I suspect so) or masks some other database backend. Whatever it is, it is its own "dialect", adding features that only make sense in a SAS context and not quite supporting the full ANSI SQL specification. I would be happy to see something like an SQL interface, since there are clear advantages for some needs. But I think this is where Stata seems philosophically to follow the idea of embracing existing frameworks -- by providing ODBC/JDBC interfaces to these databases -- which means users who want those tools and that functionality must handle the setup themselves.
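
    As a sketch of that existing route (the DSN, table, and column names below are placeholders for a configured ODBC source):

    Code:
    * pull a query result from an external database into Stata via ODBC
    odbc load, exec("SELECT id, income FROM households WHERE year = 2020") ///
        dsn("mydsn") clear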



  • Stata Installation Qualification Tool (IQT): add an API or command-line arguments so that the IQT can be integrated into automated deployment processes.



  • William Lisowski
    The SAS "SQL" implementation is, in my view, an example of how not to approach the problem. There are packages in several languages that implement various kinds of SQL functionality, particularly joins, but they do so using their own native approaches that fit within their existing programming paradigms. Leonardo Guizzetti is correct that SAS implemented something like its own version of a subset of SQL, pretty far removed from the full ANSI specification.



  • Nothing in #348 contradicts the point I made in #345 that

    ... SAS approached the problem back in the day by creating PROC SQL to understand SQL commands (and interface directly with SQL databases) rather than shoehorn SQL capabilities into existing DATA step commands.

    Statalist is not the forum in which to argue the strengths of SAS's particular implementation of SQL within PROC SQL, which was not the case I was making. I will point out that my copy of the SQL in a Nutshell (2nd Edition, 2004) reference sets the stage for the book in its Preface:

    SQL in a Nutshell, Second Edition, describes the latest ANSI standard, SQL2003, version of each command and then documents each platform's implementation of that command.

    The platforms comprise six databases popular at the time. So SAS was not alone in adapting the standard (originally published in 1986) to the needs of its implementation. I would prefer that Stata do the same should it implement SQL functionality. In that, it would differ from what was done with Python. My preference is to apply SQL syntax to native Stata datasets, as in the sketch below, rather than having to move datasets into and out of an external database for what are largely data management tasks.
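
    A minimal sketch of the contrast, assuming two native datasets persons.dta and households.dta sharing a key hhid (all names hypothetical); the wished-for SQL appears as a comment:

    Code:
    * today's native route: an inner join of two Stata datasets
    use persons, clear
    merge m:1 hhid using households, keep(match) nogenerate
    * hypothetical SQL equivalent, applied directly to .dta files:
    * SELECT p.*, h.* FROM persons p JOIN households h ON p.hhid = h.hhid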



  • Sophisticated tools to support investment/portfolio development and analysis. Examples: estimation of weights for optimal diversification (with constraints; robust; Bayesian; ...); bootstrap standard errors for estimates of "optimal" portfolio weights; portfolio performance analysis that relaxes the assumption of i.i.d. observation periods.
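
    As a sketch of the bootstrap item only, using official bootstrap around a minimum-variance weight calculator (the program name and return variables ret1-ret3 are placeholders; a real tool would add constraints and robust or Bayesian variants):

    Code:
    capture program drop minvarw
    program define minvarw, rclass
        // minimum-variance weights: w = S^-1 * 1 / (1' S^-1 1)
        syntax varlist(numeric)
        quietly correlate `varlist', covariance
        matrix S = r(C)
        matrix ones = J(rowsof(S), 1, 1)
        matrix w = invsym(S) * ones
        matrix t = ones' * w
        scalar denom = t[1,1]
        matrix w = w / denom
        local i = 0
        foreach v of local varlist {
            local ++i
            return scalar w_`v' = w[`i', 1]
        }
    end

    bootstrap w1=r(w_ret1) w2=r(w_ret2) w3=r(w_ret3), reps(200): ///
        minvarw ret1 ret2 ret3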



  • More comprehensive "table" creation. For example, neither *tabdisp* nor *list* allows a *collect* command to readily gather output appropriate for a table to be output via *putdocx*. (A current workaround is sketched below.)
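
    For tables that collect can gather today, the collect-aware table command is one workaround (a sketch assuming Stata 17+, with the auto data standing in):

    Code:
    sysuse auto, clear
    * table is collect-aware, so its results can be exported directly
    table rep78 foreign, statistic(mean price)
    collect export mytable.docx, replace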



  • Originally posted by Jay Patel

    More comprehensive "table" creation. For example, neither *tabdisp* nor *list* allows a *collect* command to readily gather output appropriate for a table to be output via *putdocx*.

    I agree with the spirit of the request to expand the collect system, and I expect those features are being developed. As a slight aside, putdocx already supports listing data directly, as sketched below.
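
    A minimal sketch of that capability (the output file name is a placeholder):

    Code:
    sysuse auto, clear
    putdocx begin
    * list the first 10 observations directly into a Word table
    putdocx table t1 = data(make price mpg) in 1/10, varnames
    putdocx save mylisting.docx, replace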



  • Show the name of the frame in the Data Browser!



  • A great addition to Stata's machine learning capabilities (e.g., lasso) would be automated feature engineering, just like Python's Featuretools (https://featuretools.alteryx.com/en/stable/).

    Once one establishes relationships among the Level 0 and higher-level (Level 1, 2, 3) datasets and defines a cutoff time (any data after that point is filtered out before calculating features, to avoid "label leakage"), hundreds of features (variables) are "automagically" created using max, min, mean, count, sum, etc., and can be fed into ML algorithms.

    IMHO, translating that into Stata would involve frames, frlink, rangestat, and lots of egen; a sketch follows.
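
    A minimal sketch of the cutoff-plus-aggregation step, assuming two hypothetical datasets: customers.dta (Level 0, keyed on custid) and transactions.dta (Level 1, with custid, a daily date txdate, and amount):

    Code:
    * build Level 1 aggregate features with a cutoff, then link to Level 0
    frame create tx
    frame tx: use transactions.dta
    frame tx: keep if txdate < td(01jan2020)    // cutoff avoids label leakage
    frame tx: collapse (mean) amt_mean=amount (max) amt_max=amount ///
        (min) amt_min=amount (sum) amt_sum=amount ///
        (count) tx_n=amount, by(custid)
    use customers.dta, clear
    frlink 1:1 custid, frame(tx)
    frget amt_mean amt_max amt_min amt_sum tx_n, from(tx)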



  • Originally posted by Anthony Killeen

    Show the name of the frame in the Data Browser!

    Anthony, this is shown in the Properties pane of the Data Browser/Editor (View > Properties).



  • Thank you!



  • Repeated requests (unlikely to happen, I know):

    1. Do not allow m:m merges. These are never useful outside of StataCorp. Even if they were useful in very rare situations, they surely produce more harm than good. If they must be kept, put m:m under version control. (The joinby sketch below covers what people usually intend by m:m.)

    2. Remove mi's suggestion to use the force option. You never want that; yet we regularly see it used (blindly copied) in posts to Statalist.
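
    For reference, when all pairwise matches within groups really are wanted, joinby is the supported route (dataset names are placeholders):

    Code:
    * joinby forms all pairwise combinations within famid groups,
    * which is what merge m:m is often (wrongly) expected to do
    use parents, clear
    joinby famid using children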



  • When merge is used with a very large master dataset and a small using dataset, Stata preserves the master dataset in order to sort the using dataset first. This can be extremely slow when the master does not fit in the memory available to preserve, because Stata then has to write it out to temporary I/O files. That is completely unnecessary when the using dataset fits easily in memory and can be sorted there before the merge.

    A nice trick I use is:

    Code:
    * sort the small using dataset in its own frame, leaving the
    * large master dataset untouched in the default frame
    frame create sorter
    frame sorter: use usingdataset.dta, clear
    frame sorter: sort merging_variables
    frame sorter: save usingdataset.dta, replace
    
    merge 1:1 merging_variables using "usingdataset.dta"

    Building this trick into merge itself could save a lot of time without any sacrifice whatsoever.



  • For latent class analysis (LCA) as conducted in gsem: add latent transition analysis and Stata equivalents of Mplus's R3STEP for three-step latent class regression accounting for classification error, and of Knownclass for multiple-group LCA.

    gsem currently estimates multiple-group LCA using the group() and ginvariant() options, but ginvariant() offers no way to constrain coefficients without also constraining variances. More details in this earlier post. Hence this has to be done manually, as advised by Stata Tech Support: "You could place those needed constraints into the constraint definition and then supply to the -constraints()- option after -gsem-", which can be very tedious with hundreds of constraints for a 4-class model with 7 groups; a looped sketch follows.
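
    A sketch of the looped manual route (indicator names y1-y7 and group variable g are placeholders; the equation labels are schematic, so confirm the exact names for your model with gsem's coeflegend option first):

    Code:
    * build cross-group equality constraints in a loop, then pass them to gsem
    local c = 0
    foreach y in y1 y2 y3 y4 y5 y6 y7 {
        forvalues g = 2/7 {
            local ++c
            constraint `c' [`y']1.g = [`y']`g'.g   // schematic labels
        }
    }
    gsem (y1-y7 <- ), logit lclass(C 4) group(g) ///
        ginvariant(none) constraints(1/`c')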



  • Can StataCorp please fix the documentation for -matrix accum- and add some examples that illustrate how these commands

    matrix glsaccum
    matrix opaccum
    matrix vecaccum

    are used?

    In particular, -matrix glsaccum- has had essentially the same documentation since about Stata 7. I understand almost nothing from its current abstract explanation of what -glsaccum- does, and there are no examples anywhere showing how the command is used.

    And -glsaccum- is useful: once upon a time I had a lucky day and managed to implement all the estimators in Wooldridge (2010), "Chapter 7: Estimating Systems of Equations by OLS and GLS", from scratch, just using -glsaccum-, without even reaching the limits of the command (I used the same weighting matrix across groups, and it apparently allows the weighting matrix to vary by group).

    In short, the plain -matrix accum- is clear; a sketch follows. But the more complicated versions listed above, and -glsaccum- in particular, are not clear at all in the manual.
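
    For contrast, a sketch of what makes the plain versions clear: hand-computing OLS on the auto data (both commands append _cons by default):

    Code:
    sysuse auto, clear
    * X'X, with the constant appended as the last row/column
    matrix accum XX = weight length
    * y'X, with the first variable taken as y
    matrix vecaccum yX = price weight length
    * OLS: b = (X'X)^-1 X'y
    matrix b = invsym(XX) * yX'
    matrix list b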

