first non-missing value within a variable

Lucy Zhao

Join Date: Jan 2017

Posts: 9
#1

first non-missing value within a variable

04 Jan 2017, 10:57

Dear Statalists,

I have the following

clear
input byte(id x)
1 .
1 18
1 12
1 20
2 .
2 .
2 15
2 22
end

And I would like to generate a variable -- call it frstnmx --- that takes the first non-missing value of x by id:

clear
input byte(id x) float var3
1 . 18
1 18 18
1 12 18
1 20 18
2 . 15
2 . 15
2 15 15
2 22 15
end

I found several solutions to similar questions in the archive.

The solution from Scott Merryman (http://www.stata.com/statalist/archi.../msg00509.html)
--
by id: egen frstnmx=min(x)
--
would not work as the first non-missing value is not necessarily the smallest one.

The solution from Sebastian F. Buechte (http://www.stata.com/statalist/archi.../msg00090.html)
---
gen x=.;
by id: replace x=cond(var3[_n]==.,var3[_n+1],.);
---
works with id==1 but not id==2.

I also tried solutions from Nick Cox and David Airey (http://www.stata.com/statalist/archi.../msg00540.html)
---
by id: gen obs = _n
by id: summarize obs if !missing(x), meanonly
by id: replace x = x[r(min)] if _n < r(min)
---
but it did not work (no changes made to the data).

I found a discussion from Nick Cox and Eric A. Booth on generating variable taking first non-zero value in a row (http://www.stata.com/statalist/archi.../msg00953.html) useful; but still I wasn't able to come up with a solution to my question.

Please help; and many thanks!
Lucy
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35405
#2

04 Jan 2017, 11:29

There is some inconsistency here over variable names, what you have and what you want.

http://www.stata.com/support/faqs/da...t-occurrences/ is a systematic discussion you missed.

Of the "solutions" you cite

Merryman: a different problem.

Buechte: the code attributed to Sebastian is not that given in his post. You have changed variable names between examples. This could probably be puzzled out with infinite time and patience.

Cox and Airey: the code is not what is suggested in the link, but your own attempt to generalise the code, which won't work. The only accessible value of r(min) will be after the last summarize in any batch.

Cox and Booth: a different problem.

A serious difficulty with your example is the lack of a time or other order variable, indicating what is first, second, etc. That means that the order is fragile to sorts of the data.

Here is one approach:

Code:

clear input byte(id x) 1 . 1 18 1 12 1 20 2 . 2 . 2 15 2 22 end sort id, stable by id: gen long obsno = _n by id : gen countnonmissing = sum(!missing(x)) if !missing(x) bysort id (countnonmissing) : gen firstnonmissing = x[1] sort id obsno list, sepby(id) +---------------------------------------+ | id x obsno countn~g firstn~g | |---------------------------------------| 1. | 1 . 1 . 18 | 2. | 1 18 2 1 18 | 3. | 1 12 3 2 18 | 4. | 1 20 4 3 18 | |---------------------------------------| 5. | 2 . 1 . 15 | 6. | 2 . 2 . 15 | 7. | 2 15 3 1 15 | 8. | 2 22 4 2 15 | +---------------------------------------+

Naturally you can call the variables anything you like that is legal. More crucially, the FAQ cited above explains the logic here.
2 likes
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

04 Jan 2017, 12:40

A slightly different version of Nick's code; honestly, I think it came from him originally in an earlier post. But any problems with it are entirely the fault of the current author.

Code:

clear
input byte(id x)
1 .
1 18
1 12
1 20
2 .
2 .
2 15
2 22
3 33
3 .
3 31
end

sort id, stable
gen long obsno = _n
gen missing = missing(x)
bysort id (missing obsno) : gen firstnonmissing = x[1]

sort id obsno
list, sepby(id)

Code:

     +--------------------------------------+
     | id    x   obsno   missing   firstn~g |
     |--------------------------------------|
  1. |  1    .       1         1         18 |
  2. |  1   18       2         0         18 |
  3. |  1   12       3         0         18 |
  4. |  1   20       4         0         18 |
     |--------------------------------------|
  5. |  2    .       5         1         15 |
  6. |  2    .       6         1         15 |
  7. |  2   15       7         0         15 |
  8. |  2   22       8         0         15 |
     |--------------------------------------|
  9. |  3   33       9         0         33 |
 10. |  3    .      10         1         33 |
 11. |  3   31      11         0         33 |
     +--------------------------------------+

Last edited by William Lisowski; 04 Jan 2017, 12:43.

Comment

Lucy Zhao

Join Date: Jan 2017

Posts: 9
#4

04 Jan 2017, 14:30

Thank you so much Nick and William! I had been struggling on this for several hours before I posted the question, and your approach works great and I am most appreciative!

Last edited by Lucy Zhao; 04 Jan 2017, 14:33.
Comment
Rebecca Water

Join Date: Sep 2018

Posts: 44
#5

27 Nov 2018, 20:42

To obtain the last non missing value of a variable (per id) instead of the first non missing value:
Is this code doing the job correctly? ( Referring to Nick Cox's solution )

Code:

by id: egen lastobs = max(countnonmissing) bysort id (countnonmissing): egen lastnonmissing = mean(x/(obsno==lastobs))

Ps. I tried this first but it only worked out if the maximum of obsno = maximum of countnonmissing.

Code:

by id: egen lastobs = max(countnonmissing) bysort id (countnonmissing): gen lastnonmissing = x[_N]
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35405

28 Nov 2018, 00:28

Rebecca Water There are many ways to get at last non-missing. This is a current favourite of mine.

Code:

clear
input byte(id x t)
1 .   1 
1 18  2
1 12  3
1 20  4
1 .   5
2 .   1
2 .   2
2 15  3
2 22  4
end 

gen byte OK = !missing(x) 
bysort id (OK t) : gen last_nm = x[_N] 
sort id t 

list, sepby(id) 

     +----------------------------+
     | id    x   t   OK   last_nm |
     |----------------------------|
  1. |  1    .   1    0        20 |
  2. |  1   18   2    1        20 |
  3. |  1   12   3    1        20 |
  4. |  1   20   4    1        20 |
  5. |  1    .   5    0        20 |
     |----------------------------|
  6. |  2    .   1    0        22 |
  7. |  2    .   2    0        22 |
  8. |  2   15   3    1        22 |
  9. |  2   22   4    1        22 |
     +----------------------------+

Announcement

first non-missing value within a variable

Comment

Comment

Comment

Comment

Comment