  • Nick - no, not that. i was not concise enough in my description of what I was thinking!
    i am looking for on-the-fly variable creation.
    my "var1<=60" evaluates to 0 or 1 depending on the value of var1. likewise for var2. thus the tab statement that i envisioned would yield a 2x2 table.
    in the code below, i generated two indicator variables var160 and var270 to demonstrate the desired effect. this particular example takes 3 lines of code to tabulate, but more complex conditions would require more.

    Code:
     input var1 var2
    
              var1       var2
      1. 10 10
      2. 20 20
      3. 30 30
      4. 40 40
      5. 50 50
      6. 60 60
      7. 70 70
      8. 80 80
      9. 90 90
     10. 10 90
     11. 20 80
     12. 30 70
     13. 40 60
     14. 50 50
     15. 60 40
     16. 30 70
     17. 20 80
     18. 10 90
     19. end
    
    . tab var1 var2 if var1<=60 & var2<=70
    
               |                                     var2
          var1 |        10         20         30         40         50         60         70 |     Total
    -----------+-----------------------------------------------------------------------------+----------
            10 |         1          0          0          0          0          0          0 |         1
            20 |         0          1          0          0          0          0          0 |         1
            30 |         0          0          1          0          0          0          2 |         3
            40 |         0          0          0          1          0          1          0 |         2
            50 |         0          0          0          0          2          0          0 |         2
            60 |         0          0          0          1          0          1          0 |         2
    -----------+-----------------------------------------------------------------------------+----------
         Total |         1          1          1          2          2          2          2 |        11
    
    
    . gen var160 = var1<=60
    
    . gen var270 = var2<=70
    
    
    . tab var160 var270
    
               |        var270
        var160 |         0          1 |     Total
    -----------+----------------------+----------
             0 |         2          1 |         3
             1 |         4         11 |        15
    -----------+----------------------+----------
         Total |         6         12 |        18
    
    .
    i envision a way to generate the indicator variables on the fly.
    more generally, the temporary variable need not be an indicator variable.
    the syntax engine would evaluate expressions and create a temporary variable from the expression.

    example:

    Code:
     reg y x1 x2 {x3<=5} {1/x4}
    thanks

    this would regress y against x1, x2, an indicator for x3<=5, and the value of 1/x4
    Last edited by George Hoffman; 08 Oct 2018, 07:21. Reason: correct typo

    Comment


    • My wish for the next Stata update: I would really appreciate the ability to add headers and footers to documents produced using putpdf.

      Comment


      • Originally posted by George Hoffman
        Nick - no, not that. i was not concise enough in my description of what I was thinking!
        i envision a way to generate the indicator variables on the fly.
        more generally, the temporary variable need not be an indicator variable.
        the syntax engine would evaluate expressions and create a temporary variable from the expression.

        example:

        Code:
         reg y x1 x2 {x3<=5} {1/x4}
        thanks

        this would regress y against x1, x2, an indicator for x3<=5, and the value of 1/x4
        A basic version of this is not too difficult to implement via a separate command, though it would certainly be nice to have such a thing built in and for everything to be correctly labeled as the expression that generated the on-the-fly variable, rather than a temporary variable name. This requires Stata 14+

        Code:
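        program varparse
            * Split the input at the colon. The command to run ends up in s(after)
            _on_colon_parse `0'
            local 1 `s(after)'
            local ix 0
            * Replace each braced expression, left to right, with a temporary variable
            qui while ustrregexm(`"`1'"', "\{(.+?)\}") {
                tempvar v`++ix'
                local g`ix' = ustrregexs(1)
                gen `v`ix'' = `g`ix''
                * Label the new variable with the expression that generated it
                label var `v`ix'' `"`=ustrregexs(1)'"'
                local 1 = ustrregexrf(`"`1'"', "\{(.+?)\}", `"`v`ix''"')
            }
            * Run the rewritten command
            `1'
        end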
        
        clear
        set seed 1729
        set obs 100
        gen x1 = runiform()
        gen x2 = rnormal()
        gen x3 = runiform() * 10
        gen x4 = rnormal()
        gen y  = 1 + x1 - x2 + 2 * (x3 <= 5) - 3 / x4 + rnormal() * 2
        gen var1 = int(100 * runiform())
        gen var2 = int(100 * runiform())
        
        varparse: tab {var1<=60} {var2<=70}
        varparse: reg y x1 x2 {x3<=5} {1/x4}
        This gives what you want, I think. Note, however, that while "tab" uses the variable label, reg does not. I'm not sure how to make that happen (perhaps swapping the variable names for the labels in regress can be added to the wishlist?). It's not so obvious how to do it in esttab etc. either, since the variables no longer exist in memory.

        Comment


        • George Hoffman Mauricio Caceres #134ff

          See also tabcount from a while back:

          SJ-3-4 pr0011 . . . . . . . . Speaking Stata: Problems with tables, Part II
          . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
          Q4/03 SJ 3(4):420--439 (no commands)
          reviews three user-written commands (tabcount, makematrix,
          and groups) as different approaches to tabulation problems

          tabcount from http://fmwww.bc.edu/RePEc/bocode/t
          'TABCOUNT': module to tabulate frequencies, with zeros explicit / tabcount
          tabulates frequencies for up to 7 variables. Its main / distinctive
          features are that zero frequencies of one or more / specified values or
          conditions are always shown in the table / (i.e. entirely empty rows,




          Code:
          clear
          input float(var1 var2)
          10 10
          20 20
          30 30
          40 40
          50 50
          60 60
          70 70
          80 80
          90 90
          10 90
          20 80
          30 70
          40 60
          50 50
          60 40
          30 70
          20 80
          10 90
          end
          
          tabcount var1 var2, c1(<=60 >60) c2(<=70 >70)
          
          ----------------------
                    |    var2  
               var1 | <=70   >70
          ----------+-----------
               <=60 |   11     4
                >60 |    1     2
          ----------------------

          Comment


          • Nick Cox , Mauricio Caceres :

            thank you for these suggestions.
            tabcount works for tab. for other commands, i'm going to play around with varparse. it's not a complete solution but it's a great start.
            thanks again.

            Comment


            • I wish Stata would stop printing lines to the Results window once it encounters an error within an if block. Especially in larger code blocks, scrolling up to find where the error actually occurred gets boring really quickly.

              Here's an example.

              Code
              Code:
              local this "example"
              if "`this'" == "example" {
                  di "`value'"
                  di "something else"
                  error 413
                  di "1. I don't want to see this line in the output"
                  di "2. I don't want to see this line either"
                  di "3. You get the idea by now"
              }
              Actual output
              Code:
              . if "`this'" == "example" {
              .         di "`value'"
              
              .         di "something else"
              something else
              .         error 413
              r(413);
              .         di "1. I don't want to see this line in the output"
              .         di "2. I don't want to see this line either"
              .         di "3. You get the idea by now"
              . }
              r(413);
              
              end of do-file
              r(413);
              What I would like
              Code:
              . if "`this'" == "example" {
              .         di "`value'"
              
              .         di "something else"
              something else
              .         error 413
              r(413);
              
              end of do-file
              
              r(413);

              Some context on when this becomes annoying. When I have a dofile that does three separate (but related) things, I enclose each block in an if-condition (e.g. if "$runA" == "1"). At the top of the dofile I can then set global runA to 1 or 0, depending on whether I want to run that part at this point. No hassle with commenting out parts, no issues with common macros not being defined yet, no remembering which lines need and need not be selected to get the thing to run. These blocks can get very long (hundreds of lines is not uncommon). Correspondingly, any small error, or even a manually added stop (/error 1) means I have to scroll all the way up to see where it happened. Yet I've never in my life needed all those printed lines, because they are verbatim copies of my dofile anyway.
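The toggle pattern described above might look like the following minimal sketch. The global names runA and runB, the local done, and the display lines are placeholders for illustration, not taken from the original do-file.

```stata
* Toggles set once at the top of the do-file (runA and runB are hypothetical names)
global runA 1
global runB 0

* Keep track of which parts actually ran
local done ""

if "$runA" == "1" {
    display "running part A"
    local done `done' A
}

if "$runB" == "1" {
    display "running part B"
    local done `done' B
}

display "parts run: `done'"
```

Flipping a global at the top switches a whole block on or off without commenting anything out, which is why the blocks tend to grow long.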

              Comment


              • How about power and sample size for ROC AUC analysis, maybe similar to power.roc.test() from pROC in R?

                Comment


                • Perhaps an arcane/unfeasible request... but is it possible to split the dofile editor and "main stata" processes? Imagine you are running some heavy regressions, you might still want to work on your dofile while it's running. On modern computers with multiple CPU cores, that's often feasible in theory, given that Stata rarely ever uses all cores to their maximum capacity. In practice, the dofile editor always seems to hang up or be very slow at least.

                  I know you can edit your dofiles in separate programs, but in the end they are never as integrated with Stata as its own dofile editor, so I'd prefer to keep using it.

                  Comment


                  • I would have other ideas, but on the top of my wishlist:

                    * Ability to call an external DLL from Mata. Might need additional types to help (byte, short...) in writing more or less the equivalent of a Declare in VBA.
                    (if it's flexible enough, it would open many many manyyyyyyy other possibilities: call libraries for numerical computations and special functions, plotting, multiprecision, OS services, file I/O in other formats...), and of course add Mata functions that are not easy (or not fast enough) to implement in pure Mata code.
                    * In the preceding, a way to pass directly Mata matrices (or even a dataset) back and forth would be very valuable.
                    * Ability to call Mata functions (especially user defined ones) in Stata, in some places where only Stata functions are currently available (maybe with the help of a generic function, e.g. callmata("functionname", arg1, ...)), for instance in gen/egen.
                    * Ability to plot from Mata, especially to plot data from a vector/matrix, and to update a plot with subsequent Mata code.

                    Comment


                    • A few little things I'd like (or may be unaware that exist):

                      1. Find-Replace in do-editor tells you how many instances of the find term were replaced.
                      2. Option to open multiple do-files from explorer window in one editor window rather than one Stata instance per do-file. I don't really want to have to create .stpr files for that purpose.
                      3. Option to set do file preferences (e.g. font/colors) permanently (this may already be an option--I just haven't seen it if it exists).
                      4. An inlist() function that can hold more than 10 string values directly rather than having to loop or use multiple or statements.
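A common workaround in current Stata is to chain several inlist() calls with |. The sketch below assumes a hypothetical string variable code and made-up values. As far as I recall, the documented limit for string arguments is 10 per call including the variable itself, so each call here stays below that.

```stata
clear
input str2 code
"aa"
"bb"
"kk"
"zz"
end

* Each inlist() call takes the variable plus a handful of string values.
* Chaining calls with | extends the list beyond what a single call accepts.
gen byte wanted = inlist(code, "aa", "ab", "ac", "ad", "ae", "af") | ///
                  inlist(code, "ag", "ah", "ai", "aj", "ak", "kk")

list code wanted
```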

                      Comment


                      • Here's one I've been mulling over for a while. The way missing values work with logical expressions is often problematic. A missing value is treated as true. But in most contexts, a missing value on an expression or variable really means "could be true or false, we don't know." Adjusting simple logical expressions quickly gets complicated. When calculating a conjunction (&), we would want 1 & . = ., 0 & . = 0, . & . = .. For disjunction (|), 1 | . = 1, 0 | . = ., . | . = ..

                        With existing Stata features you can work around this by recoding . to 0.5, and then using min(a, b) for a & b and max(a, b) for a | b. But if you have a lengthy logical expression with many operators, and perhaps parenthesized expressions nested within, this kind of translation becomes tedious and error-prone. Moreover, the resulting code is as opaque as possible.
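A minimal sketch of that recoding workaround, using made-up data. The variable names a3, b3, and3 and or3 are hypothetical, and the final step of mapping 0.5 back to missing is an addition for readability rather than part of the description above.

```stata
clear
input a b
1 .
0 .
. .
1 1
1 0
end

* Recode missing to 0.5, standing for "unknown"
gen a3 = cond(missing(a), 0.5, a)
gen b3 = cond(missing(b), 0.5, b)

* Conjunction is the minimum, disjunction the maximum
gen and3 = min(a3, b3)
gen or3  = max(a3, b3)

* Translate 0.5 back to missing so results read as true/false/unknown
replace and3 = . if and3 == 0.5
replace or3  = . if or3 == 0.5

list a b and3 or3
```

The recode is needed because min() and max() skip missing arguments, so applying them to a and b directly would not propagate "unknown".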

                        Now, redefining the operation of & and | to do this would break lots of existing code, and would create chaos even if the old behavior were maintained under version control. But why not define new logical operators && and || that would behave this way? It would also be nice to have a negation operator for which the negation of . is again . (missing). !! would not work for an analogous negation operator, because !! is itself a legitimate expression of double negation (and one that I find helpful and use fairly often), so one would have to find some other expression for it (or perhaps a function). && would never clash with anything else as it is never legal syntax currently. || is "taken" as a separator between the fixed and random components of mixed models, but I think that it would seldom if ever create confusion as that is a highly distinct context that the parser could recognize, though it might require some lookahead. (In linguistic terms, I think the two meanings of || would be in complementary distribution.)

                        Comment


                        • Originally posted by Clyde Schechter
                          [...] why not define new logical operators && and || that would behave this way?
                          I like it. In Julia and in Java (and perhaps in some other languages), && is known as "short-circuiting boolean AND"
                          and || is "short-circuiting boolean OR". Short-circuiting means stopping evaluation once you know that the answer can no longer change. For example, in Julia: https://docs.julialang.org/en/v1/base/math/#&&

                          Comment


                          • In general, I like Clyde's idea. However, I also see a potential for confusion. For example, the logical expression in

                            Code:
                            if (.) ...
                            evaluates to true (better: not false) and the code following the condition will be executed. What would we expect for

                            Code:
                            if (1&&.) ...
                            Best
                            Daniel
                            Last edited by daniel klein; 15 Nov 2018, 14:25.

                            Comment


                            • daniel klein Good question. I don't have a great answer for it. Certainly -if- conditions in Stata are inherently dichotomous. The command will either apply to the observation or it won't and there is no middle ground. The convention that . means not false has the virtue of consistency, and we are all on notice that in this context we have to sometimes make explicit provisions to include or exclude observations where the condition's truth value is unknown. And I don't see any way to modify that.
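The kind of explicit provision meant here can be sketched as follows, with a made-up variable x. Stata treats numeric missing as larger than any number, so the first count includes the missing observation and the second excludes it.

```stata
clear
input x
1
7
.
end

* Missing sorts above every number, so it satisfies x > 5 (count is 2)
count if x > 5

* Explicitly excluding missing leaves only the observation with x == 7 (count is 1)
count if x > 5 & !missing(x)
```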

                              FWIW, if you have time to read a digression, back around the same time that Bill Gould was inventing Stata, I toyed with the idea of developing a statistical package myself. I developed a prototype that I used for a few years in my research. In that prototype, logical expressions evaluated to true/false/unknown, and logical variables were a separate data storage type from numbers. My equivalent of the -if- qualifier was called -where- (and it preceded the command instead of following it), and, it caused commands to be applied only to those observations where the condition evaluated as true. But I also had a structure that syntactically resembled Stata's -if- command, but with different semantics. That one allowed for indeterminate outcomes. The syntax of the structure was

                              Code:
                              if condition {
                                  do this
                              }
                              [ifnot {
                                  do that
                              }]
                              [unknown {
                                  do the other thing
                              }]
                              Note: Square brackets denote optional components of the syntax, as in Stata reference manuals' syntax diagrams.
                              It's not all that different in principle from an -if ... else if ... else ...- structure (which my program also supported, and the parser treated them more or less the same way). But it was a convenient shorthand. You only had to state the condition once. When true, the first branch was taken, when false the second, and when unknown (missing value), the third. The semantics differed also in that this -if- structure could be used both to skip or execute commands on the entire data set (like Stata's -if- command), and also to select observations to which the command would apply (like Stata's -if- condition).
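In current Stata, the same three-way split can be approximated for a single condition (not observation by observation) with if / else if / else, provided the missing case is tested first, since a missing condition would otherwise take the true branch. A minimal sketch, where the locals c and branch are hypothetical names:

```stata
* c holds a hypothetical condition value: 1 (true), 0 (false), or . (unknown)
local c = .

if missing(`c') {
    display "unknown: do the other thing"
    local branch unknown
}
else if `c' {
    display "true: do this"
    local branch true
}
else {
    display "false: do that"
    local branch false
}
```

Unlike the structure described above, the condition has to be stored and referenced in each test rather than stated once.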

                              Ultimately, I decided I didn't have the entrepreneurial spirit it would take to try to really develop the package and bring it to market. When I first encountered Stata, back at version 4, I was delighted and surprised to see that its programming language was strongly similar to what I had designed. (Mine was consciously modeled on C, though with differences, and I suspect Stata's was as well.) And, where they differed, mostly I thought Stata was better.

                              Comment


                              • Clyde, thank you for sharing. Daniel, yes good question. This is Julia's implementation:
                                https://docs.julialang.org/en/v1/man...ng-Operators-1

                                Comment
