Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Options for additional "flavors" of markdown in dyndoc and related. (E.g., pandoc markdown)

    Syntax highlighting for dyndoc (i.e. mix of Stata and markdown highlighting) in the do-file editor.

    Comment


    • 1. I often find myself re-running the same section of code when I'm debugging. If it is a large section of code, this can be inconvenient. Being able to set something like bookmarks and run the code between two bookmarks would be nice.
      2. While the eregress estimators are great, if would be nice if we had something similar where possible for fixed effects.
      3. xtregar - could it be extended to (i) handle more than one lag on the serial correlation, (ii) present robust standard errors, or (iii) allow for endogeneity?

      Comment


      • How about a merge command syntax where we can specify which variable to link between the master and the using instead of requiring both variables to have the same name.

        Code:
        merge 1:1 patientid == patient_num using "demo_data.dta", keep(master match)

        Comment


        • Originally posted by Chris Stefancik View Post
          How about a merge command syntax where we can specify which variable to link between the master and the using instead of requiring both variables to have the same name.

          Code:
          merge 1:1 patientid == patient_num using "demo_data.dta", keep(master match)
          This is what frames partially solves. I say partially because frlink only supports m:1 or 1:1 linking, and sometimes the 1:m situation is more cumbersome than using merge.

          Comment


          • Originally posted by Chris Stefancik View Post
            How about a merge command syntax where we can specify which variable to link between the master and the using instead of requiring both variables to have the same name.

            Code:
            merge 1:1 patientid == patient_num using "demo_data.dta", keep(master match)
            You might like the -join- command includes in the -ftools- package.

            Comment


            • Two items for the built-in mixed logit choice models command:
              1. An internal implementation of the user- 'mixlogitwtp' by Hole, i.e. within 'cmmixlogit' and 'cmxtmixlogit'. This would facilitate additional postestimation work with these models, -for example, routines that rely on scores outputted from standard ML commands.
              2. With the panel command 'cmxtmixlogit', an option to specify that a given choice is replicated multiple times within decisionmaker. With conditional logit, this option is unnecessary because the replicated tasks can simply be implemented as weights. But with panel mixed logit, each decisionmaker's likelihood is integrated over the random coefficients before taking logs, and so weights can't vary within decisionmaker. The use of replications would essentially a way to reduce the memory burden of executing the following:
              expand choicereps
              bysort DMid taskid: gen crep = _n
              egen taskid_expanded = group(taskid crep)
              cmset DMid taskid_expanded altvar
              cmxtmixlogit chosen x

              Besides being cumbersome to code the above, it also can dramatically and unnecessarily increase the memory burden when decisionmakers are making the same choice across many identical choice occasions. (An example would be recreational site choice data or transportation mode choices. In the dataset I'm working with, the number of rows in the data goes from ~150K to 2M+.)

              Instead, with task replicates it would be nice if we could implement something like:
              cmxtmixlogit chosen x, nchoices(choicereps)

              Computationally, this would be easy to implement, since we can just compute the DM's conditional likelihood as by taking the probability of each distinct choice made, raised to a power equal to the number of replicates that choice was made on identical choice occasions, and then taking product of each of these distinct choice probabilities.

              Comment


              • I think one could add further distributions to the standard set of distributions available in the statistical distribution functions, even if only to make it more comfortable to use them (e.g. various beta-type distributions, skewed normal etc.).

                Comment


                • The current v17 documentation file STATA FUNCTIONS REFERENCE MANUAL RELEASE 17 (fn.pdf) shows the functional forms of some but not all of the probability distributions it covers (see pp. 109 and beyond in the pdf file). For example, compare betaden and cauchyden.

                  Might future editions of the documentation show the functional forms for all distributions?

                  Comment


                  • Along the lines of other requests related to merge (and recognizing my previous asks to adopt something analogous to more standard SQL type capabilities), it’d be nice to be able to use expressions that resolve to booleans in merge commands. For example, being able to specify that you want to merge based on a common ID but only for records at a later/earlier date. The syntax might look something like:

                    merge 1:1 id == otherid & using.newdatefield > master.date

                    Using and master are intended to indicate which frame/file the field would be found in.

                    Comment


                    • Originally posted by wbuchanan View Post
                      Along the lines of other requests related to merge (and recognizing my previous asks to adopt something analogous to more standard SQL type capabilities), it’d be nice to be able to use expressions that resolve to booleans in merge commands. For example, being able to specify that you want to merge based on a common ID but only for records at a later/earlier date. The syntax might look something like:

                      merge 1:1 id == otherid & using.newdatefield > master.date

                      Using and master are intended to indicate which frame/file the field would be found in.
                      For inspiration, I recently discovered Pandas (Python) supports this through the merge_asof method: https://pandas.pydata.org/pandas-doc...erge_asof.html

                      Comment


                      • Originally posted by wbuchanan View Post
                        Along the lines of other requests related to merge (and recognizing my previous asks to adopt something analogous to more standard SQL type capabilities), it’d be nice to be able to use expressions that resolve to booleans in merge commands. For example, being able to specify that you want to merge based on a common ID but only for records at a later/earlier date. The syntax might look something like:

                        merge 1:1 id == otherid & using.newdatefield > master.date

                        Using and master are intended to indicate which frame/file the field would be found in.
                        Alternatively, -rangejoin- (SSC by Robert Picard) supports this concept, but requires a bit of setup for the task.

                        Comment


                        • Re #341. It is true that -rangejoin- accommodates the specific functionality of that example. And, don't get me wrong--I'm a huge fan of -rangejoin-. It literally revolutionized my workflow when Robert Picard released it, enabling me to do work on Stata that I previously had to do in other (compiled) frameworks. But I think the functionality wbuchanan is referring to is more general: being able to restrict the match based on any kind of Boolean condition, not just ones based on ordering relationships. While I don't think my own work would make much use of that, I can imagine that others might benefit greatly.

                          Comment


                          • Agreed, it's not a drop-in replacement for the breadth of an SQL query syntax, but very helpful for an important subset of that type of work.

                            Comment


                            • Clyde Schechter
                              That was definitely what I was after. Ideally, any expression that resolves to a boolean should be acceptable to define a join operation on two sets.

                              Comment


                              • I support the intent of #339 but not the compromise implicit in it.

                                My preference would be not to add SQL-ish features to existing commands, but rather to support SQL operations on native SAS datasets within a Mata-like coding syntax. It would recognize SQL syntax in one-line SQL commands and in SQL programs, but allow direct access to Stata datasets both on disk and in other frames, without the push-pull of data structure conversions. Those who come to Stata from SQL will have access to SQL functionality in the form they are familiar with, and those who want to learn and adopt SQL functionality will be able to reference copious commercial documentation.

                                My experience - some years past now, thankfully, but perhaps outdated - working in a SAS environment back-ended by an Oracle database led me to appreciate the differing strengths of a relational database and its SQL commands and the "flat file" data structures used by conventional statistical programming with sorting and merging. What I suggest is no different than how SAS approached the problem back in the day, by creating PROC SQL to understand SQL commands (and interface directly with SQL databases) rather than shoehorn SQL capabilities into existing DATA step commands.

                                To be clear, there are tasks that I've undertaken in Stata that would have been more straightforward with access to SQL syntax, and I have missed access to that tool, as I had earlier missed access to the full power of contemporary regular expression syntax now embedded in the Unicode regular expression functions. What I suggest is intended to mimic the recent integration of Python functionality into Stata to extend Stata's capabilities to tools written in Python, and before that, the separation of Mata into a purpose-built language rather than gluing it onto the existing Stata matrix commands.

                                Where it makes sense, Stata should embrace existing programming frameworks rather than strive to become the Esperanto of statistical programming languages.

                                Comment

                                Working...
                                X