Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Assign the 1st or last non-missing value of a variable in group to whole group

    Dear Colleagues,

    I am processing a survey panel data of children and youth that have several waves of interviews. A few variables I am interested in are collected only in one or two waves and for children at certain ages. I would like to identify the first available/non-missing values of these variables and assign them as new variables for the whole group. I would also like to do the same for the last non-missing values.
    Like in the table below, the variable var1 is only available in certain waves. I would like to create new variables: var1_1st for first non-missing values, and var1_last for last non-missing values. Is there any way I can get this done efficiently? Thanks.
    child wave var1 var1_1st var1_last
    1 1 . 20 30
    1 2 20 20 30
    1 3 30 20 30
    1 4 . 20 30
    2 1 . 34 32
    2 3 34 34 32
    2 4 32 34 32

  • #2
    This seems to do what you ask.
    Code:
    generate var1_first = .
    generate var1_last  = .
    sort child wave
    by child (wave): replace var1_first = cond(var1_first[_n-1]!=.,var1_first[_n-1],var1)
    by child (wave): replace var1_last  = cond(var1!=.,var1,var1_last[_n-1])
    by child (wave): replace var1_first = var1_first[_N]
    by child (wave): replace var1_last  = var1_last[_N]
    list, sepby(child) abbreviate(12)
    Code:
    . list, sepby(child) abbreviate(12)
    
         +----------------------------------------------+
         | child   wave   var1   var1_first   var1_last |
         |----------------------------------------------|
      1. |     1      1      .           20          30 |
      2. |     1      2     20           20          30 |
      3. |     1      3     30           20          30 |
      4. |     1      4      .           20          30 |
         |----------------------------------------------|
      5. |     2      1      .           34          32 |
      6. |     2      3     34           34          32 |
      7. |     2      4     32           34          32 |
         +----------------------------------------------+

    Comment


    • #3
      Thanks, Williams,

      Your codes work perfectly. I have not used the -cond( )- function before. It took me a while to understand the logic of your codes.
      By the way, if I want to just identify those observations of the 1st or last non-missing values of a variable in a group (var1nomiss_1st=1 or 0) instead of assigning its values to the whole group, what codes can I use. Many thanks.

      Comment


      • #4
        Code:
        generate var1_first = .
        generate var1_last  = .
        sort child wave
        by child (wave): replace var1_first = cond(var1_first[_n-1]!=.,var1_first[_n-1],var1)
        by child (wave): replace var1_last  = cond(var1!=.,var1,var1_last[_n-1])
        by child (wave): replace var1_first = var1_first[_N]
        by child (wave): replace var1_last  = var1_last[_N]
        by child (wave): generate var1_obsf = var1==var1_first & var1[_n-1]==.
        by child (wave): generate var1_obsl = var1==var1_last  & var1[_n+1]==.
        list, sepby(child) abbreviate(12)
        Code:
        . list, sepby(child) abbreviate(12)
        
             +----------------------------------------------------------------------+
             | child   wave   var1   var1_first   var1_last   var1_obsf   var1_obsl |
             |----------------------------------------------------------------------|
          1. |     1      1      .           20          30           0           0 |
          2. |     1      2     20           20          30           1           0 |
          3. |     1      3     30           20          30           0           1 |
          4. |     1      4      .           20          30           0           0 |
             |----------------------------------------------------------------------|
          5. |     2      1      .           34          32           0           0 |
          6. |     2      3     34           34          32           1           0 |
          7. |     2      4     32           34          32           0           1 |
             |----------------------------------------------------------------------|
          8. |     3      1     41           41          43           1           0 |
          9. |     3      2      .           41          43           0           0 |
         10. |     3      3     43           41          43           0           1 |
         11. |     3      4      .           41          43           0           0 |
             +----------------------------------------------------------------------+

        Comment


        • #5
          Hi William.

          Thanks for your quick reply. The codes works well.

          Comment


          • #6
            Hi William.

            Thanks for your quick reply. The codes works well.

            Comment


            • #7
              For your information, egenmore from SSC contains first() and lastnm() functions. Like @William Lisowski's helpful code they hinge on sort order and use of by:.

              Code:
              . ssc type _gfirst.ado
              *! 1.0.0 NJC 31 May 2000 
              program define _gfirst 
                      version 6.0
                      gettoken type 0 : 0
                      gettoken g 0 : 0
                      gettoken eqs 0 : 0
                      syntax varname [if] [in] [, BY(varlist) ] 
                      marksample touse, strok
                      tempvar order 
                      gen long `order' = _n 
                      sort `touse' `by' `order' 
                      * ignore user-supplied `type' 
                      local type : type `varlist' 
                      qui by `touse' `by' : gen `type' `g' = `varlist'[1] if `touse'
              end
              
              . ssc type _glastnm.ado
              *! 1.0.0 NJC 31 May 2000 
              program define _glastnm 
                      version 6.0
                      gettoken type 0 : 0
                      gettoken g 0 : 0
                      gettoken eqs 0 : 0
                      syntax varname [if] [in] [, BY(varlist) ] 
                      marksample touse, strok
                      tempvar order 
                      gen long `order' = _n 
                      sort `touse' `by' `order' 
                      * ignore user-supplied `type' 
                      local type : type `varlist' 
                      qui by `touse' `by' : gen `type' `g' = `varlist'[_N] if `touse'
              end

              Comment


              • #8
                Many thanks, Nick.

                The is very helpful. I tried out egen first() function for assigning first non-missing value of a variable to the group. It works well except that the first non-missing value of the variable is assigned only to observations with non-missing values for that variable. So it requires another step (such as replace mean) to assign the 1st non-missing value to the other observations of the group. I found out about this in one of your posts in statalist: https://www.stata.com/statalist/arch.../msg01028.html

                Comment

                Working...
                X