Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • first non-missing value within a variable

    Dear Statalists,

    I have the following

    clear
    input byte(id x)
    1 .
    1 18
    1 12
    1 20
    2 .
    2 .
    2 15
    2 22
    end


    And I would like to generate a variable -- call it frstnmx --- that takes the first non-missing value of x by id:

    clear
    input byte(id x) float var3
    1 . 18
    1 18 18
    1 12 18
    1 20 18
    2 . 15
    2 . 15
    2 15 15
    2 22 15
    end


    I found several solutions to similar questions in the archive.

    The solution from Scott Merryman (http://www.stata.com/statalist/archi.../msg00509.html)
    --
    by id: egen frstnmx=min(x)
    --
    would not work as the first non-missing value is not necessarily the smallest one.


    The solution from Sebastian F. Buechte (http://www.stata.com/statalist/archi.../msg00090.html)
    ---
    gen x=.;
    by id: replace x=cond(var3[_n]==.,var3[_n+1],.);
    ---
    works with id==1 but not id==2.


    I also tried solutions from Nick Cox and David Airey (http://www.stata.com/statalist/archi.../msg00540.html)
    ---
    by id: gen obs = _n
    by id: summarize obs if !missing(x), meanonly
    by id: replace x = x[r(min)] if _n < r(min)
    ---
    but it did not work (no changes made to the data).


    I found a discussion from Nick Cox and Eric A. Booth on generating variable taking first non-zero value in a row (http://www.stata.com/statalist/archi.../msg00953.html) useful; but still I wasn't able to come up with a solution to my question.

    Please help; and many thanks!
    Lucy




  • #2
    There is some inconsistency here over variable names, what you have and what you want.

    http://www.stata.com/support/faqs/da...t-occurrences/ is a systematic discussion you missed.

    Of the "solutions" you cite

    Merryman: a different problem.

    Buechte: the code attributed to Sebastian is not that given in his post. You have changed variable names between examples. This could probably be puzzled out with infinite time and patience.

    Cox and Airey: the code is not what is suggested in the link, but your own attempt to generalise the code, which won't work. The only accessible value of r(min) will be after the last summarize in any batch.

    Cox and Booth: a different problem.

    A serious difficulty with your example is the lack of a time or other order variable, indicating what is first, second, etc. That means that the order is fragile to sorts of the data.

    Here is one approach:

    Code:
    clear
    input byte(id x)
    1 .
    1 18
    1 12
    1 20
    2 .
    2 .
    2 15
    2 22
    end
    
    sort id, stable
    by id: gen long obsno = _n
    by id : gen countnonmissing = sum(!missing(x)) if !missing(x)
    bysort id (countnonmissing) : gen firstnonmissing = x[1]
    
    sort id obsno
    list, sepby(id)
    
    
         +---------------------------------------+
         | id    x   obsno   countn~g   firstn~g |
         |---------------------------------------|
      1. |  1    .       1          .         18 |
      2. |  1   18       2          1         18 |
      3. |  1   12       3          2         18 |
      4. |  1   20       4          3         18 |
         |---------------------------------------|
      5. |  2    .       1          .         15 |
      6. |  2    .       2          .         15 |
      7. |  2   15       3          1         15 |
      8. |  2   22       4          2         15 |
         +---------------------------------------+
    Naturally you can call the variables anything you like that is legal. More crucially, the FAQ cited above explains the logic here.

    Comment


    • #3
      A slightly different version of Nick's code; honestly, I think it came from him originally in an earlier post. But any problems with it are entirely the fault of the current author.
      Code:
      clear
      input byte(id x)
      1 .
      1 18
      1 12
      1 20
      2 .
      2 .
      2 15
      2 22
      3 33
      3 .
      3 31
      end
      
      sort id, stable
      gen long obsno = _n
      gen missing = missing(x)
      bysort id (missing obsno) : gen firstnonmissing = x[1]
      
      sort id obsno
      list, sepby(id)
      Code:
           +--------------------------------------+
           | id    x   obsno   missing   firstn~g |
           |--------------------------------------|
        1. |  1    .       1         1         18 |
        2. |  1   18       2         0         18 |
        3. |  1   12       3         0         18 |
        4. |  1   20       4         0         18 |
           |--------------------------------------|
        5. |  2    .       5         1         15 |
        6. |  2    .       6         1         15 |
        7. |  2   15       7         0         15 |
        8. |  2   22       8         0         15 |
           |--------------------------------------|
        9. |  3   33       9         0         33 |
       10. |  3    .      10         1         33 |
       11. |  3   31      11         0         33 |
           +--------------------------------------+
      Last edited by William Lisowski; 04 Jan 2017, 12:43.

      Comment


      • #4
        Thank you so much Nick and William! I had been struggling on this for several hours before I posted the question, and your approach works great and I am most appreciative!
        Last edited by Lucy Zhao; 04 Jan 2017, 14:33.

        Comment


        • #5
          To obtain the last non missing value of a variable (per id) instead of the first non missing value:
          Is this code doing the job correctly? ( Referring to Nick Cox's solution )

          Code:
          by id: egen lastobs = max(countnonmissing)
          bysort id (countnonmissing): egen lastnonmissing = mean(x/(obsno==lastobs))
          Ps. I tried this first but it only worked out if the maximum of obsno = maximum of countnonmissing.
          Code:
          by id: egen lastobs = max(countnonmissing)
          bysort id (countnonmissing): gen lastnonmissing = x[_N]

          Comment


          • #6
            Rebecca Water There are many ways to get at last non-missing. This is a current favourite of mine.

            Code:
            clear
            input byte(id x t)
            1 .   1 
            1 18  2
            1 12  3
            1 20  4
            1 .   5
            2 .   1
            2 .   2
            2 15  3
            2 22  4
            end 
            
            gen byte OK = !missing(x) 
            bysort id (OK t) : gen last_nm = x[_N] 
            sort id t 
            
            list, sepby(id) 
            
                 +----------------------------+
                 | id    x   t   OK   last_nm |
                 |----------------------------|
              1. |  1    .   1    0        20 |
              2. |  1   18   2    1        20 |
              3. |  1   12   3    1        20 |
              4. |  1   20   4    1        20 |
              5. |  1    .   5    0        20 |
                 |----------------------------|
              6. |  2    .   1    0        22 |
              7. |  2    .   2    0        22 |
              8. |  2   15   3    1        22 |
              9. |  2   22   4    1        22 |
                 +----------------------------+

            Comment

            Working...
            X