
  • Carrying forward values

    Dear all,

    I have a data set with multiple observations per person.
    The variable "episode" has several missing values that I'm trying to "fill."
    I want to carry forward a given number if the first number to reoccur after the missing value(s)
    is the same number that is being carried forward, within the same person (indicated by "ID").
    So basically I want to fill in missing values if they are sandwiched by the same number above and below.
    There can be up to 30 consecutive rows with missing values within any given level of episode.

    I have attached a sample of the data. The variable “new_var” shows what I am trying to do with “episode.”



    It seems that carryforward (ssc install carryforward) might be the right thing to use here, though despite reading the help guide I cannot figure out how.

    Does anyone see a quick way of doing this?

    Thanks.

    Karina

  • #2
    I haven't used carryforward (SSC), because I always go back to the "first principles" as outlined here:
    http://www.stata.com/support/faqs/da...issing-values/

    One slightly awkward method is to carry forward, then reverse time, again carry forward (really backward), and accept a solution if and only if the two methods give the same results. For that, you need a time variable, which you may have off-stage, but in any case I create one first.

    Code:
    * time counter within id; assumes the data are already in the intended order, grouped by id
    gen time = id != id[_n-1]
    replace time = time[_n-1] + 1 if time == 0
    
    * carry forward 
    gen forward = episode
    bysort id (time) : replace forward = forward[_n-1] if missing(forward)
    
    * carry backward 
    gen backward = episode
    gsort id -time
    by id : replace backward = backward[_n-1] if missing(backward)
    
    * if backward == forward, we use it to replace 
    gen episode2 = cond(missing(episode) & backward == forward, backward, episode)
    But there's a cuter way. Just interpolate linearly. If interpolated values have fractional parts, the non-missing values either side must differ, so we must reject such values. You can use floor(), ceil() or int() to test for integer values. Otherwise, interpolations are fine. (This assumes that your numbering convention follows your example.)

    Code:
     
    bysort id  : ipolate episode time, gen(episode3)
    replace episode3 = . if floor(episode3) != episode3
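
    As a quick check, both methods can be run on a small made-up data set and compared. This is only a sketch: the values below are invented for illustration (they are not the attached sample), and time is typed in directly rather than generated. Note that the integer test only works when different values either side of a gap cannot interpolate to an integer; values such as 1 and 3 around a single missing observation would give 2 and slip through.

    ```stata
    clear
    * hypothetical toy data: two persons, gaps inside and at the end of episodes
    input id time episode
    1 1 1
    1 2 .
    1 3 1
    1 4 .
    1 5 2
    2 1 3
    2 2 .
    2 3 .
    2 4 3
    2 5 .
    end

    * method 1: carry forward, carry backward, keep a fill only if both agree
    gen forward = episode
    bysort id (time) : replace forward = forward[_n-1] if missing(forward)

    gen backward = episode
    gsort id -time
    by id : replace backward = backward[_n-1] if missing(backward)

    gen episode2 = cond(missing(episode) & backward == forward, backward, episode)

    * method 2: linear interpolation, rejecting fractional results
    bysort id : ipolate episode time, gen(episode3)
    replace episode3 = . if floor(episode3) != episode3

    * the two results should agree on these data; assert is silent when they do
    assert episode2 == episode3

    sort id time
    list id time episode episode2 episode3, sepby(id)
    ```

    In this toy example the gap at time 2 for id 1 and the gaps at times 2 and 3 for id 2 should be filled, while the gap between 1 and 2 (id 1, time 4) and the trailing gap (id 2, time 5) should stay missing.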



    • #3
      Thanks, I really like the simplicity of the second method!
      I was unaware of -ipolate-, though that does exactly the job I want.

      I normally use the "first principles" you use in your first example, though being rather new to this programming business I was unable to work out the solution of going forward and backward and then comparing results. I see that that works too.

      Thanks again!

      Karina




      • #4
        The reversing time trick is a trick and its details are very Stataish. It does depend on fluency with by: and how to use subscripts within blocks of observations.

        I wrote a tutorial on by: at http://www.stata-journal.com/sjpdf.h...iclenum=pr0004 that may help.

        Other kinds of interpolation are covered by nnipolate, cipolate, csipolate, pchipolate (all SSC).



        • #5
          If you come here then please note that mipolate (SSC) unites the commands mentioned in #4. See e.g. https://www.statalist.org/forums/for...-interpolation
