Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Substitute rows with average of row above and below

    Dear all,

    I have the following problem. In a data set similar to that below, I have some records that are repeat a couple of times (see the case for year 2007). In my case, those records are likely to contain mistakes and I would like to remove and substitute them using an average between the year before and the year after. This would equal to throw away two lines, and susbtitute them with the corresponding average of the year above and below. So far, I couldn't find help on the collapse documentation, nor did thinking in terms of spells help me.
    ID year var1 var2 var3
    1 2006 34 45 65
    1 2007 45 43 41
    1 2007 3 56 59
    1 2008 39 54 76
    1 2009 41 57 68
    Any help is greatly appreaciated.

    Riccardo

  • #2
    You could use tssmooth ma.

    For example:

    Code:
    clear
    input ID    year    var1    var2    var3
    1    2006    34    45    65
    1    2007    45    43    41
    1    2007    3    56    59
    1    2008    39    54    76
    1    2009    41    57    68
    end
    
    drop if year == 2007
    tsset ID year
    tsfill
    
    forv i = 1/3 {
        tssmooth ma v`i' = var`i' , w(1, 0, 1)
        replace var`i' = v`i' if var`i' == .
    }
    drop v?
    l

    Comment


    • #3
      Here's another approach.

      Code:
      clear
      input id year var1 var2 var3
      1 2006 34 45 65
      1 2007 45 43 41
      1 2007 3 56 59
      1 2008 39 54 76
      1 2009 41 57 68
      end
      //
      duplicates tag id year, gen(mistake)  
      bysort id year: keep if (_n == 1)
      foreach v of varlist var1 var2 var3 {  
         replace `v' =  (`v'[_n-1] + `v'[_n+1])/2 if mistake
      }
      Regards, Mike

      Comment


      • #4
        Many thanks! They are both great ideas!

        Riccardo

        Comment

        Working...
        X