
  • #16
    The methods disagree only on sets that contain nothing but 0s and missing values.

    The issue here is that the OP's description does not include instructions on what to do when the set contains only 0s and missings.



    • #17
      I agree that the statement of the problem in post #1 was incomplete: what's missing at a minimum is the specification of the expected result for each id. What appeared obvious to the OP was in fact ambiguous.

      But the description in post #1 and the topic title "return first non-zero value" suggest that a non-zero value is sought and that a zero would not be the desired result when no other value is available.

      What is ambiguous is whether Stata missing value codes in the data are to be considered, in the spirit of Stata, a "non-zero" value — after all, .!=0 — or in the spirit of common usage, "missing" and thus no value at all. This makes a difference for id 1, with the sequence . 0 3: in the former case, the . would be returned, in the latter case, the 3 would be returned.
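The two readings can be made concrete with a small sketch. Only the sequence . 0 3 for id 1 comes from the thread; the ordering variable `t` and the names `nonzero_stata` and `nonzero_usage` are assumptions made for illustration.

```stata
clear
input byte id byte t double x
1 1 .
1 2 0
1 3 3
end

* In Stata, missing is larger than any number, so . != 0 evaluates to true (1)
display (. != 0)

* Reading 1 (spirit of Stata): anything not equal to 0 counts, including missing
generate byte nonzero_stata = (x != 0)

* Reading 2 (common usage): only observed, non-zero values count
generate byte nonzero_usage = (x != 0) & !missing(x)

list, noobs
```

Under the first reading the earliest flagged observation is t == 1 (the missing value); under the second it is t == 3 (the value 3).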



      • #18
        Sorry to bother you, but I have again noticed a typo in my code, the result of typing directly on an iPad while traveling. Instead of
        Code:
        egen awanted = sum(x*(sum(x/x)==1))
        , what I intended was:
        Code:
        egen awanted2 = total(x*(sum(x/x)==1))

        While -egen sum()- and -egen total()- are documented to give the same results, the latter makes the intent of my code clearer.

        I am also attracted by the interesting discussion by Joro Kolev in #14. Following the same method, I notice that, for this specific case, -egen total()- is about 10% faster.
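One way to read the expression above: x/x is 1 for a non-zero, non-missing x and missing otherwise (0/0 is missing in Stata), and the running sum() treats missing as zero, so the counter equals 1 from the first non-zero value until just before the second one. The sketch below spells out that logic in separate steps rather than nesting it. The ordering variable `t`, the toy data, and the names `valid`, `nvalid`, and `first_nonzero` are assumptions, not part of the original posts.

```stata
clear
input byte id byte t double x
1 1 .
1 2 0
1 3 3
1 4 4
2 1 0
2 2 5
2 3 0
2 4 7
end

* valid is 1 only for observed, non-zero values
generate byte valid = (x != 0) & !missing(x)

* running count of valid values within id, in the order given by t
bysort id (t): generate long nvalid = sum(valid)

* only the first valid value survives the multiplication; total() ignores missings
by id: egen double first_nonzero = total(x * (valid & nvalid == 1))

list, sepby(id) noobs
```

With these data the result is 3 for id 1 and 5 for id 2. A panel with no non-zero values would get 0 from total(), which is one possible answer to a case the thread leaves open.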



        • #19
          Dear All, Many thanks for the helpful suggestions.
          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)



          • #20
            So, River Huang, glad that the suggestions helped, but in the real problem is there a variable recording order within panels?



            • #21
              Dear Nick, I do not understand your question (is there a variable recording order within panels?).

              Ho-Chuan (River) Huang
              Stata 19.0, MP(4)



              • #22
                What defines "first"? Is it just order in the present dataset -- in which case that is fragile to sorting -- or is there a time or other variable that defines sequence or order explicitly?
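If the current order of the dataset is the only definition of "first", one hedge is to record that order in a variable before anything sorts the data. A minimal sketch, in which `obsno`, `seq`, and the toy data are assumed for illustration:

```stata
clear
input byte id double x
1 .
1 0
1 3
2 0
2 5
end

* freeze the current order before any command re-sorts the data
generate long obsno = _n

* a sort on x scrambles the original sequence ...
sort x

* ... but the recorded order makes "first" reproducible again
bysort id (obsno): generate long seq = _n

list, sepby(id) noobs
```

A genuine time or date variable, if one exists, would serve the same purpose and carry more meaning than an observation number.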



                • #23
                  There were 17 posts between your question in post #1 and your "thank you" in post #19.

                  The Statalist FAQ tells us:

                  "Trying to wrap up a thread you started is helpful, especially if you report what solved your problem."
                  The "thank you" would have been improved by acknowledging the ambiguities of the question you posed in post #1 and the additional effort those ambiguities generated.

                  As a long-time contributor to Statalist you surely suspected that "by" would be part of the answer, which requires a notion of order, which suggests that the example data should have included something (a date perhaps) that establishes the order of the observations within each id. The example data looks as though it were created by hand, so it would have benefitted from adding not only the sequencing variable but also the desired result. So those who wanted to help were left with at least four questions to which we received no further guidance.
                  • "return first non-zero value" - what does "return" mean - create a new variable constant within each ID? or produce a collapsed dataset with one observation for each id? or write some sort of function that returns a value? (That's the standard use of the word "return" in Stata.)
                  • What defines the order of the observations?
                  • How should missing values in the data be treated - as a non-zero value or not?
                  • What should the result be if there are no non-zero values?
                  Taking the time to address these concerns in post #19 would have acknowledged the time we each spent on this and would have improved this topic for others who find it later.



                  • #24
                    Dear @Nick Cox and @William Lisowski, As a matter of fact, the question was originally asked by someone else (not me) here (in Chinese). It seems to me that the methods of Clyde Schechter, William Lisowski, Romalpa Akzo, Nick Cox, and Joro Kolev all offer the right solution. I posted the code of Clyde Schechter (because he was the first to answer the question) here, and the person who asked seemed satisfied. That is why I am unable to clarify the ambiguities. My conjecture, following the four questions of @William Lisowski, would be:
                    • Like the -collapse- command with (first) or (firstnm): for each `id', return one value.
                    • The order is simply the order in which the observations appear in the sample.
                    • Do not return missing values or zeros. (I know that this is awkward if all the values for some `id' are missing or zero, but let's suppose that there is no such `id' in the sample.)
                    • As stated above.
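Under these conjectures, a -collapse- based sketch might look as follows. It assumes that (firstnm) takes the first nonmissing value in the current sort order within `id`, so the order is fixed explicitly first. The toy data and the names `obsno` and `first_nonzero` are illustrative assumptions.

```stata
clear
input byte id double x
1 .
1 0
1 3
2 0
2 5
2 7
end

* keep the original order explicit, since "first" depends on it
generate long obsno = _n

* treat zeros like missing so that (firstnm) skips both
replace x = . if x == 0

* one observation per id holding the first non-zero, non-missing value
sort id obsno
collapse (firstnm) first_nonzero = x, by(id)

list, noobs
```

For an `id` whose values are all zero or missing, this approach would return missing, which is the awkward case set aside in the third point.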
                    Sorry for that, and thanks again.
                    Ho-Chuan (River) Huang
                    Stata 19.0, MP(4)

