Assign the 1st or last non-missing value of a variable in group to whole group

Lijun Chen

Join Date: Mar 2018

Posts: 18
#1

Assign the 1st or last non-missing value of a variable in group to whole group

17 Nov 2020, 13:40

Dear Colleagues,

I am processing a survey panel data of children and youth that have several waves of interviews. A few variables I am interested in are collected only in one or two waves and for children at certain ages. I would like to identify the first available/non-missing values of these variables and assign them as new variables for the whole group. I would also like to do the same for the last non-missing values.
Like in the table below, the variable var1 is only available in certain waves. I would like to create new variables: var1_1st for first non-missing values, and var1_last for last non-missing values. Is there any way I can get this done efficiently? Thanks.

child wave var1 var1_1st var1_last

1 1 . 20 30

1 2 20 20 30

1 3 30 20 30

1 4 . 20 30

2 1 . 34 32

2 3 34 34 32

2 4 32 34 32
Tags: None

William Lisowski

Join Date: Dec 2014
Posts: 10150

17 Nov 2020, 13:57

This seems to do what you ask.

Code:

generate var1_first = .
generate var1_last  = .
sort child wave
by child (wave): replace var1_first = cond(var1_first[_n-1]!=.,var1_first[_n-1],var1)
by child (wave): replace var1_last  = cond(var1!=.,var1,var1_last[_n-1])
by child (wave): replace var1_first = var1_first[_N]
by child (wave): replace var1_last  = var1_last[_N]
list, sepby(child) abbreviate(12)

Code:

. list, sepby(child) abbreviate(12)

     +----------------------------------------------+
     | child   wave   var1   var1_first   var1_last |
     |----------------------------------------------|
  1. |     1      1      .           20          30 |
  2. |     1      2     20           20          30 |
  3. |     1      3     30           20          30 |
  4. |     1      4      .           20          30 |
     |----------------------------------------------|
  5. |     2      1      .           34          32 |
  6. |     2      3     34           34          32 |
  7. |     2      4     32           34          32 |
     +----------------------------------------------+

Comment

Lijun Chen

Join Date: Mar 2018

Posts: 18
#3

19 Nov 2020, 14:01

Thanks, Williams,

Your codes work perfectly. I have not used the -cond( )- function before. It took me a while to understand the logic of your codes.
By the way, if I want to just identify those observations of the 1st or last non-missing values of a variable in a group (var1nomiss_1st=1 or 0) instead of assigning its values to the whole group, what codes can I use. Many thanks.
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

19 Nov 2020, 15:35

Code:

generate var1_first = .
generate var1_last  = .
sort child wave
by child (wave): replace var1_first = cond(var1_first[_n-1]!=.,var1_first[_n-1],var1)
by child (wave): replace var1_last  = cond(var1!=.,var1,var1_last[_n-1])
by child (wave): replace var1_first = var1_first[_N]
by child (wave): replace var1_last  = var1_last[_N]
by child (wave): generate var1_obsf = var1==var1_first & var1[_n-1]==.
by child (wave): generate var1_obsl = var1==var1_last  & var1[_n+1]==.
list, sepby(child) abbreviate(12)

Code:

. list, sepby(child) abbreviate(12)

     +----------------------------------------------------------------------+
     | child   wave   var1   var1_first   var1_last   var1_obsf   var1_obsl |
     |----------------------------------------------------------------------|
  1. |     1      1      .           20          30           0           0 |
  2. |     1      2     20           20          30           1           0 |
  3. |     1      3     30           20          30           0           1 |
  4. |     1      4      .           20          30           0           0 |
     |----------------------------------------------------------------------|
  5. |     2      1      .           34          32           0           0 |
  6. |     2      3     34           34          32           1           0 |
  7. |     2      4     32           34          32           0           1 |
     |----------------------------------------------------------------------|
  8. |     3      1     41           41          43           1           0 |
  9. |     3      2      .           41          43           0           0 |
 10. |     3      3     43           41          43           0           1 |
 11. |     3      4      .           41          43           0           0 |
     +----------------------------------------------------------------------+

Comment

Lijun Chen

Join Date: Mar 2018

Posts: 18
#5

19 Nov 2020, 16:34

Hi William.

Thanks for your quick reply. The codes works well.
Comment
Lijun Chen

Join Date: Mar 2018

Posts: 18
#6

19 Nov 2020, 16:35

Hi William.

Thanks for your quick reply. The codes works well.
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35445

19 Nov 2020, 17:55

For your information, egenmore from SSC contains first() and lastnm() functions. Like @William Lisowski's helpful code they hinge on sort order and use of by:.

Code:

. ssc type _gfirst.ado
*! 1.0.0 NJC 31 May 2000 
program define _gfirst 
        version 6.0
        gettoken type 0 : 0
        gettoken g 0 : 0
        gettoken eqs 0 : 0
        syntax varname [if] [in] [, BY(varlist) ] 
        marksample touse, strok
        tempvar order 
        gen long `order' = _n 
        sort `touse' `by' `order' 
        * ignore user-supplied `type' 
        local type : type `varlist' 
        qui by `touse' `by' : gen `type' `g' = `varlist'[1] if `touse'
end

. ssc type _glastnm.ado
*! 1.0.0 NJC 31 May 2000 
program define _glastnm 
        version 6.0
        gettoken type 0 : 0
        gettoken g 0 : 0
        gettoken eqs 0 : 0
        syntax varname [if] [in] [, BY(varlist) ] 
        marksample touse, strok
        tempvar order 
        gen long `order' = _n 
        sort `touse' `by' `order' 
        * ignore user-supplied `type' 
        local type : type `varlist' 
        qui by `touse' `by' : gen `type' `g' = `varlist'[_N] if `touse'
end

Comment

Lijun Chen

Join Date: Mar 2018

Posts: 18
#8

19 Nov 2020, 21:34

Many thanks, Nick.

The is very helpful. I tried out egen first() function for assigning first non-missing value of a variable to the group. It works well except that the first non-missing value of the variable is assigned only to observations with non-missing values for that variable. So it requires another step (such as replace mean) to assign the 1st non-missing value to the other observations of the group. I found out about this in one of your posts in statalist: https://www.stata.com/statalist/arch.../msg01028.html
Comment

Announcement

Assign the 1st or last non-missing value of a variable in group to whole group

Comment

Comment

Comment

Comment

Comment

Comment

Comment