return first non-zero value

Joro Kolev

Join Date: Aug 2018

Posts: 3047
#16

09 May 2021, 12:57

The methods disagree only on sets that consist entirely of 0s and missing values.

The issue here is that the OP's description does not say what to do when the set contains only 0s and missings.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#17

09 May 2021, 14:17

I agree that the statement of the problem in post #1 was incomplete: what's missing at a minimum is the specification of the expected result for each id. What appeared obvious to the OP was in fact ambiguous.

But the description in post #1 and the topic title "return first non-zero value" suggest that a non-zero value is sought and that a zero would not be the desired result when no other value is available.

What is ambiguous is whether Stata missing value codes in the data are to be considered, in the spirit of Stata, a "non-zero" value (after all, . != 0 is true) or, in the spirit of common usage, "missing" and thus no value at all. This makes a difference for id 1, with the sequence . 0 3: in the former case the . would be returned, in the latter case the 3 would be returned.
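
To make the two readings concrete, here is a minimal sketch, not a definitive solution. The names seq, nzA, nzB, wantedA and wantedB are hypothetical, and seq is an assumed order variable that the example data in post #1 did not supply.

```stata
* Run as a do-file. Sequence for id 1 from this thread: . 0 3
clear
input byte id double x
1 .
1 0
1 3
end
gen long seq = _n                      // assumed order variable (hypothetical)

display . != 0                         // displays 1: missing counts as non-zero

* Reading A: a missing value counts as "non-zero"
gen byte nzA = x != 0
bysort id nzA (seq): gen double wantedA = x[1] if nzA
bysort id (wantedA): replace wantedA = wantedA[1]

* Reading B: a missing value is no value at all
gen byte nzB = x != 0 & !missing(x)
bysort id nzB (seq): gen double wantedB = x[1] if nzB
bysort id (wantedB): replace wantedB = wantedB[1]

sort id seq
list id seq x wantedA wantedB
assert missing(wantedA)                // reading A returns .
assert wantedB == 3                    // reading B returns 3
```

The only difference between the two readings is the extra !missing(x) condition in the flag.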
Comment
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#18

09 May 2021, 15:14

Sorry to bother you, but I have noticed another typo in my code, the result of typing directly on an iPad while traveling. Instead of

Code:

egen awanted = sum(x*(sum(x/x)==1))

what I intended is:

Code:

egen awanted2 = total(x*(sum(x/x)==1))

While -egen sum()- and -egen total()- are documented to produce the same results, the latter expresses what my code does more clearly.

I am also attracted by Joro Kolev's interesting discussion in #14. Following the same method, I find that for this specific case -egen total()- is about 10% faster.
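
For readers unpacking the one-liner, the following sketch spells out the same logic in separate steps. It is an illustration under assumptions: the data are invented, and the names seq, ratio, k, pick and wanted are hypothetical, with seq standing in for an explicit order variable.

```stata
* Run as a do-file.
clear
input byte id double x
1 .
1 0
1 3
2 0
2 4
2 0
2 5
3 0
3 .
end
gen long seq = _n                          // assumed order variable (hypothetical)

gen double ratio = x / x                   // 1 if x is non-zero and non-missing, else missing
bysort id (seq): gen long k = sum(ratio)   // running sum; sum() treats missing as 0
gen double pick = x * (k == 1)             // x from the first non-zero value until the next one
by id: egen double wanted = total(pick)    // total() ignores missings; zeros add nothing

list id seq x ratio k pick wanted, sepby(id)
assert wanted == 3 if id == 1
assert wanted == 4 if id == 2
assert wanted == 0 if id == 3
```

Note that id 3, which holds only a zero and a missing value, ends up with 0. That is exactly the kind of set on which, as post #16 points out, the proposed methods can disagree.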
2 likes
Comment
River Huang

Join Date: Mar 2016

Posts: 1906
#19

09 May 2021, 17:13

Dear All, Many thanks for the helpful suggestions.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35432
#20

10 May 2021, 01:25

So, River Huang, glad that the suggestions helped, but in the real problem is there a variable recording order within panels?
Comment
River Huang

Join Date: Mar 2016

Posts: 1906
#21

10 May 2021, 03:54

Dear Nick, I do not understand your question (is there a variable recording order within panels?).

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35432
#22

10 May 2021, 04:55

What defines "first"? Is it just order in the present dataset -- in which case that is fragile to sorting -- or is there a time or other variable that defines sequence or order explicitly?
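
As a small illustration of why this matters, the sketch below records the current order once and then relies on that record rather than on whatever order the dataset happens to be in. The data are invented and the names seq, ok and isfirst are hypothetical; in a real problem a date or time variable would play the role of seq.

```stata
* Run as a do-file.
clear
input byte id double x
1 0
1 3
1 5
2 7
2 0
end
gen long seq = _n                          // record the current order once (hypothetical)

sort x                                     // a later sort scrambles the order within id

gen byte ok = x != 0 & !missing(x)         // non-zero and non-missing
bysort id (seq): gen byte isfirst = ok & sum(ok) == 1
list id seq x if isfirst

assert x == 3 if isfirst & id == 1
assert x == 7 if isfirst & id == 2
count if isfirst
assert r(N) == 2
```

Without seq, the result of anything based on position within id would depend on the most recent sort.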
4 likes
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#23

10 May 2021, 07:00

There were 17 posts between your question in post #1 and your "thank you" in post #19.

The Statalist FAQ tells us

Trying to wrap up a thread you started is helpful, especially if you report what solved your problem.

The "thank you" would have been improved by acknowledging the ambiguities of the question you posed in post #1 and the additional effort those ambiguities generated.

As a long-time contributor to Statalist you surely suspected that "by" would be part of the answer, which requires a notion of order, which suggests that the example data should have included something (a date perhaps) that establishes the order of the observations within each id. The example data looks as though it were created by hand, so it would have benefitted from adding not only the sequencing variable but also the desired result. So those who wanted to help were left with at least four questions to which we received no further guidance.

1. "return first non-zero value": what does "return" mean? Create a new variable constant within each id? Produce a collapsed dataset with one observation for each id? Write some sort of function that returns a value? (That's the standard use of the word "return" in Stata.)

2. What defines the order of the observations?

3. How should missing values in the data be treated: as a non-zero value or not?

4. What should the result be if there are no non-zero values?

Taking the time to address these concerns in post #19 would have acknowledged the time we each spent on this and would have improved this topic for others who find it later.
3 likes
Comment
River Huang

Join Date: Mar 2016

Posts: 1906
#24

10 May 2021, 18:08

Dear @Nick Cox and @William Lisowski, As a matter of fact, the question was asked by someone else (not me) elsewhere (in Chinese). However, it seems to me that the methods of Clyde Schechter, William Lisowski, Romalpa Akzo, Nick Cox, and Joro Kolev all offer the right solution. I posted the code of Clyde Schechter there (because he was the first one to answer the question), and the person who asked the question seemed satisfied. That is why I am unable to clarify the ambiguities. But my conjecture, in answer to the four questions of @William Lisowski, would be:

1. Like the -collapse- command (first, firstnm), return one value for each `id'.

2. The order is simply the order in which the observations appear in the sample.

3. Do not return missing values or zeros. (I know that this is awkward if all the values of some `id' are either missing or zero, but let's suppose that there is no such `id' in the sample.)

4. As stated above.

Sorry for that, and thanks again.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
