  • The cond() function is doing something dysfunctional, in my view. Am I doing something silly, or is this silly design of the function?

    Good afternoon,

    The cond() function is failing me in a situation where I think it should work.

    In my programs I have typically a macro called weights which might contain the weights if the user has specified them, or might be empty if the user has not specified weights.

    I am trying the following expression with the following purpose: if the user has specified the weights, put them in; if the user has not specified them, put in 1.

    Code:
    gen myweights = cond(missing("`weights'"), 1, `weights')
    So when the weights are specified, it goes through as it should, e.g.,
    Code:
    . local bla = 3
    
    . dis cond(missing("`bla'"),1,`bla')
    3
    But when the macro bla is not defined, instead of giving me 1, the cond() throws an error

    Code:
    . macro drop _all
    
    . dis `bla'
    
    
    . dis cond(missing("`bla'"),1,`bla')
    invalid syntax
    r(198);
    I think I know what is happening here, when the macro bla is empty the condition evaluates to

    Code:
    . dis cond(missing("`bla'"),1,)
    and the third position is empty. Yes, but given that the condition is true, I am not supposed to reach the third position, I am supposed to go to the second position.



  • #2
    This seems like perfectly normal behavior. The syntax for cond is cond(x,a,b [,c]), where x, a, and b are required arguments.

    If bla doesn't exist,
    Code:
     cond(missing("`bla'"),1,`bla')
    is a syntax error in the same way as
    Code:
    cond(1,1,)
    and
    Code:
    cond(0,,0)
    are syntax errors. Even though the omitted argument is not "reached," it is still required by the syntax.

    I don't see how that can be described as dysfunctional since it adheres to syntax outlined in the documentation.
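
    A minimal sketch of the order of events described above, assuming a local macro named bla as in post #1: Stata substitutes the macro first and only then parses the resulting expression, so an empty macro leaves an empty argument slot before cond() is ever evaluated. The use of capture here is only an illustration device to expose the return code.

    ```stata
    * Clear the local so that `bla' expands to nothing
    local bla

    * After macro substitution, Stata parses: display cond(missing(""), 1, )
    capture noisily display cond(missing("`bla'"), 1, `bla')
    display _rc        // 198, the "invalid syntax" code reported in post #1

    * With content in the macro, the third slot is filled before parsing
    local bla = 3
    display cond(missing("`bla'"), 1, `bla')   // parsed as cond(missing("3"), 1, 3)
    ```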


    • #3
      I can describe it as dysfunctional easily, because the function does not provide the functionality I am looking for, and not for substantive reasons (substantive reason would be that what I am asking is logically impossible) but rather does not provide the functionality because at design stage it did not occur to anybody that there might be applications with empty "slots" that are never reached.

      In any case, for the progeny encountering this thread, I overcame the problem. It sadly took more than an hour to come up with a convolution that overcomes this design issue in cond().

      The solution is to add a 0, +0, like so

      Code:
      . macro drop _all
      
      . dis cond(missing("`bla'"), 1, `bla')
      invalid syntax
      r(198);
      
      . dis cond(missing("`bla'"), 1, `bla'+0)
      1
      
      . local bla = 7
      
      . dis cond(missing("`bla'"), 1, `bla'+0)
      7


      • #4
        This is how I typically deal with situations like that:

        Code:
        if "`weights'" == "" {
            gen myweights = 1
        }
        else {
            gen myweights = `weights'
        }
        It is more lines of code, but I find it easier to read (though I am obviously not a completely neutral party...).
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------


        • #5
          Maarten's solution is most appropriate here.
          Ali has pointed out a syntax error which I agree with. Keep in mind that macro substitution is evaluated at runtime, so it should rightfully fail when a parameter is fully missing. The same is true of all other functions in Stata and Mata, so why should cond() be any different?

          As I see it, you are misusing -cond()- for another reason. The tested condition implies that you intend for the local macro weights to hold a variable name specifying weights, as is common practice. In your example application, if weights are specified, Stata defaults to the first value of weight, which would be logically unwanted, but would not throw an error.


          • #6
            Code:
            . gen weights = 1
            
            . if "`weights'" != "" replace weights = `weights'
            is 2 lines. The idea is that replace is a second command and Stata won't even look at it if the previous command -- the if command -- doesn't return true.


            • #7
              Philosophising about stuff is one thing, and I will leave it for the next post. On substance, Leonardo Guizzetti, the code comes from the other thread on which you were helping me
              https://www.statalist.org/forums/for...e-a-percentile
              The code in question is
              Code:
              gen double `myweights' = cond(!missing(`x'),cond(!missing("`weights'"),`weights'+0,1),.)
              Not only that I am not misusing cond(), but also my code is a pure stroke of genius.

              Indeed what Maarten Buis shows is the classical way of dealing with the issue -- what Maarten shows is the style of Stata Corp.

              If I go for the classical style, achieving what I have achieved above will take me half a page of statements.

              What I am achieving in one short line is the following:

              1. I check whether the variable `x' is missing, and if it is missing, I set `myweights' variable to missing as well. This is a vector operation, observation by observation.

              2. If the variable `x' is not missing, I proceed to the internal condition, and the internal condition is checking whether the string `weights' is missing, basically missing("`weights'") does absolutely the same as
              if "`weights'" == ""
              but in my view in more elegant way.
              3. If the `weights' string is empty, I set my variable `myweights' equal to 1 for each observation.
              4. If the `weights' string is not empty, I set my variable `myweights' to `weights'.
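
              A small sketch of the equivalence claimed in point 2, assuming the macro holds either nothing or a plain variable name (the name w below is hypothetical). For string arguments, missing() is true only for the empty string, so it agrees with the comparison against "".

              ```stata
              * Case 1: option not specified, so the macro is empty
              local weights
              display missing("`weights'")       // 1
              display ("`weights'" == "")        // 1

              * Case 2: macro holds a (hypothetical) variable name
              local weights w
              display missing("`weights'")       // 0
              display ("`weights'" == "")        // 0
              ```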

              Pure brilliance in one line of code.


              • #8
                My philosophising on the issue is that

                1. Syntax errors are of two types. Type one: the user is trying to do something illogical; type two: the designer did not think it through very well when he designed the command. In my view this situation is type two syntax error.

                2. I, unlike Leonardo, think this syntax error is very much un-Stata. My vague impression is that Stata is mostly parsing on spaces, and omitting spaces does not cause problems generally. In other words, the syntax of cond() is very pedantic, which is totally not the case when I run a regression, whose full syntax is
                regress depvar [indepvars] [if] [in] [weight] [, options]
                so if regress was as pedantic as cond(), every time when I want to do
                reg y x
                I would have to specify ifs, ins, weights, one ton of options, etc. Thankfully I do not need to do that.

                3. I disagree with Leonardo and Maarten that the classical way of doing this, that Maarten showed "is most appropriate here" or most readable. In fact when I read code written by Stata Corp, often times I am thinking to myself "Are these guys paid by number of lines of code they write??? If I wanted to, I could not artificially stretch what is being done here on so many lines..."
                And I do not find this stretched on as many lines as possible code desirable or readable at all. I do not like scrolling down, I forget what is defined above, because I cannot see it anymore when I scroll down... it is nasty.
                I like code which is packed more or less so that I can with a glimpse see it all on my screen. However the only programs that I have written are simple ones fitting on a page or two. The guys at Stata Corp write programs that are long and complicated, maybe for complicated programs this stretching of the programs on many lines makes sense. To me it does not, but what do I know...


                • #9
                  Joro, I think we disagree mostly on style and personal preference. I did not mean to provoke an argument.

                  If you are happy to condense multiple lines of logic and code into a single line, that is fine and you may do what makes you happy. My preference is that such condensed logic can be harder to understand when debugging or when revisiting the code later. (This type of coding reminds me very much of Perl, where some Perl programmers, somewhat tongue-in-cheek, pride themselves on taking this condensing to pathologic extremes.)

                  On point 2, I was reacting to the literal pattern from post #1 (below) as not being Stata-like in syntax.

                  Code:
                  cond(missing("`weights'"), 1, `weights')
                  The reason is that this line is resolved in one of two ways at runtime, depending on whether local -weights- is defined (with non-null content) or not.

                  Code:
                  cond(missing("`weights'"), 1,   )   // <--  nothing here when -weights- not defined, and cond() requires the first 3 arguments. Stata complains because one argument is required but has not been given.
                  cond(missing("`weights'"), 1, <weight var>)  // <-  something here, no problems from the syntax perspective.
                  So my reaction is that this is a violation of both Stata and Mata function syntax, so it would not be consistent if Stata did allow non-existent arguments (note: I don't mean evaluate to missing, I mean literally nothing) for some functions and not for others.

                  Lastly, I agree that your solution does work, but the way it works relies purely on the notion that Stata allows an implicit zero for the missing argument to addition. In this way, it may be clever but I don't view it as good form.

                  Code:
                  . di + 0   // implicitly, Stata views the absent left-hand argument as an implicit zero. This is not allowed in Mata.
                  0
                  So your further example again is resolved in two different ways.

                  Code:
                  gen double `myweights' = cond(!missing(`x'),cond(!missing("`weights'"),`weights'+0,1),.)
                  If -weights- is not defined, then the inner condition resolves to

                  Code:
                     cond(!missing("`weights'"),`weights'+0,1)
                  = cond(!missing("`weights'"),+0,1)
                  = cond(!missing("`weights'"),0,1)  //  <-- there is something in the 2nd position argument. Without the +0 bit, Stata complains.
                  and when -weights- is defined, then it resolves to

                  Code:
                     cond(!missing("`weights'"),`weights'+0,1)
                  = cond(!missing("`weights'"),<weights var>+0,1)   //  <-- again, there is something in the 2nd position argument
                  As an aside, parsing on spaces is irrelevant to this discussion, as is the conflation between how Stata parses the -syntax- specification for a command (like -regress-) and how the Stata interpreter parses arguments to a function (cond).


                  • #10
                    This type of coding reminds me very much of Perl, where some Perl programmers, somewhat tongue-in-cheek, pride themselves on taking this condensing to pathologic extremes.
                    If you'll excuse my introducing a moment of levity, the Wikipedia article at

                    https://en.wikipedia.org/wiki/Write-only_language

                    elaborates on the idea of write-only code as code which once written cannot be read with comprehension. I am pleased to see not only Perl in the list of write-only languages, but also APL (the first language I heard so described, although I suspect the earlier LISP would have been characterized as write-only), the TECO text editor (for which it has been asserted that any string of randomly typed characters is in fact interpretable as a valid TECO command sequence), and my newest favorite, the regular expression syntax. I'm concerned that I have substantial experience with all four of these. Perhaps that explains the low citation count for my doctoral dissertation.


                    • #11
                      Thanks for this, William. I hadn't heard of write-only code but I was thinking instead of code golf, which was apparently first used with Perl.

                      • #12
                        #10 reminds me of when I used to maintain a paper-based filing system for my professional and financial records. I always referred to it as write-only memory.


                        • #13
                          Originally posted by Joro Kolev
                          If I go for the classical style, achieving what I have achieved above will take me half a page of statements.
                          I don't think so ...

                          Code:
                          if ( missing("`weight'") ) local weight 1
                          generate double myweight = `weight' if !missing(`x')
                          Yes, it is two lines, but it took me about 20 seconds to come up with it; I needed more time to understand the one nested cond() line than to think of and type the two lines above. My guess is, I will be able to understand what those two lines do a month and even a year from now in a couple of seconds; I cannot say the same for the nested cond() line. But that is me and everyone feels differently about coding style.

                          Edit: Fun fact: my two lines of code actually contain fewer characters than the nested cond() line.
                          Last edited by daniel klein; 26 Apr 2021, 02:39.


                          • #14
                            Daniel,

                            1. On a historical note, the one-liner syntax that you used, and the one-liner syntax Nick used in #6, is a "recent" development; e.g., this syntax works in Stata 11 and on, but it does not work in Stata 7.
                            My allergies to this structure date back to Stata 7, when this structure had to be spelled out the way Maarten showed in #4: condition, open brace, newline, what you want to do, newline, closing brace. Basically, any time you need to use the simplest if or else, your lines of code are multiplied by 3.

                            2. I am aware of the structure you and Nick used, this is the structure I have used up to yesterday, when I decided to "innovate". With the minor difference that I use the variation
                            if missing("`local'")
                            rather than the structure that I have seen in all ado files written by other people, and Nick used in #6
                            if "`local'" == ""
                            If you wrote
                            if ( missing("`weight'") )
                            because you are mimicking what was said before on this thread, you already see here the value of thinking about how we do stuff. Sometimes you just see something, and you think to yourself, "OK, this
                            if missing("`weight'")
                            looks more pretty, self explanatory, and parsimonious than
                            if "`weights'" == ""
                            probably we should switch to the former."
                            If you wrote what you wrote because this is how you are doing stuff, we are agreed on this point at least: using the missing function is more elegant.

                            3. What you showed is not restricted to the use of cond() in my example. Very generally cond() can be replaced by 2 lines of code, always and everywhere. Then we can philosophise on the topic of whether convenience functions such as missing() and cond() should exist at all.

                            4. There are applications where you want to do stuff in one shot. I do not have much against doing stuff on many lines, as long as it is not overdone as in my previous complaints causing the need to scroll down, and moving previous definitions in an area where you can no longer see... But more lines, come with even more lines, because you need to define variables/locals/scalars/etc. to hold your stuff. And here comes the personal taste, I prefer less lines with the risk of being more complicated to read, rather than many lines which bring with them many more defined objects which you need to remember, and the feature of as you are going down, not being able to see the definitions above.

                            5. Finally, you are doing hindsight analysis, and in hindsight I do not have strong preference for my construct over what Nick or you showed. But when I was working on the problem in real time,
                            a) I roughly estimated that I have 2 conditions, given that I can always do one condition with 2 lines, I estimated that I would have to write 4 lines of code, which with a hindsight now is not true because it simplifies and can be done on two lines.
                            b) I chose to go for my structure to save 4 lines.
                            c) the structure did not work, I got annoyed, because I think that it should be working, and my mind was already not on my initial problem, but on making my new brilliant structure work.

                            • #15
                              Originally posted by Joro Kolev
                              [...] when I was working on the problem in real time,
                              a) I roughly estimated that I have 2 conditions, given that I can always do one condition with 2 lines, I estimated that I would have to write 4 lines of code, which with a hindsight now is not true because it simplifies and can be done on two lines.
                              That is interesting because I thought about this problem from a different perspective in the first place. I did not see two conditions; I saw two separate steps in chronological order:

                              Step 1: check whether an argument was (correctly) specified, and set/change default values if necessary
                              Step 2: apply/use the argument
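
                              The two steps could be sketched inside a small program as follows. This is only an illustration under assumptions: the program name mywgt, the option name weights(), and the use of the auto dataset are all hypothetical and not taken from the thread.

                              ```stata
                              * Hypothetical wrapper; names are illustrative only
                              capture program drop mywgt
                              program define mywgt
                                  syntax varname [, Weights(varname numeric)]
                                  * Step 1: fall back to a default of 1 when the option is not specified
                                  if missing("`weights'") local weights 1
                                  * Step 2: use the argument, leaving the result missing where the variable is missing
                                  tempvar w
                                  generate double `w' = `weights' if !missing(`varlist')
                                  summarize `w'
                              end

                              sysuse auto, clear
                              mywgt rep78                  // weights default to 1
                              mywgt rep78, weights(mpg)    // weights taken from mpg
                              ```

                              Setting the default before the argument is used keeps the expression in step 2 free of any branching on whether the option was given.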


                              That said, I tend to get caught up in implementing "clever" stuff all the time -- outsmarting my future self more often than I would like to admit.
