Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Command to ignore missing values

    Hi all,

    I'm working with a panel data set, and I need to calculate the average value of a variable for the first three non-missing years for each individual. I can't do this by just identifying the first three years of observation, since for some individuals the first three years might have missing values. I need a command that would tell Stata to ignore any missing values, and create the average of the first three non-missing years.

    Is there any command to do this? I've included the code for what I have done already, which doesn't work since I haven't ignored missing value years.

    Code:
     bys xwaveid: gen entry3=_n==1|_n==2|_n==3
    egen mh9_avg=mean(mh9) if entry3==1, by(xwaveid)
    I have also included a sample below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str7 xwaveid float(year entry3 mh9 mh9_avg)
    "0100003" 2001 1  86.66666       80
    "0100003" 2002 1  77.77778       80
    "0100003" 2003 1  75.55556       80
    "0100003" 2004 0  75.55556        .
    "0100003" 2005 0  77.77778        .
    "0100003" 2006 0  75.55556        .
    "0100003" 2008 0 68.888885        .
    "0100003" 2009 0  62.22222        .
    "0100003" 2011 0  75.55556        .
    "0100003" 2012 0 66.666664        .
    "0100003" 2013 0         .        .
    "0100003" 2014 0  46.66667        .
    "0100003" 2015 0  57.77778        .
    "0100003" 2016 0         .        .
    "0100003" 2017 0         .        .
    "0100003" 2018 0 71.111115        .
    "0100005" 2010 1 71.111115 76.66667
    "0100005" 2011 1         . 76.66667
    "0100005" 2012 1  82.22222 76.66667
    "0100005" 2013 0         .        .
    "0100005" 2014 0  62.22222        .
    "0100005" 2019 0  77.77778        .
    "0100010" 2002 1 66.666664 74.81481
    "0100010" 2003 1  84.44444 74.81481
    "0100010" 2004 1 73.333336 74.81481
    "0100010" 2005 0  64.44444        .
    "0100010" 2006 0 68.888885        .
    "0100010" 2007 0        60        .
    "0100010" 2008 0 68.888885        .
    "0100010" 2009 0        60        .
    "0100010" 2010 0  55.55556        .
    "0100010" 2011 0 71.111115        .
    "0100010" 2012 0 71.111115        .
    "0100010" 2013 0  64.44444        .
    "0100010" 2014 0  75.55556        .
    "0100010" 2015 0  75.55556        .
    "0100010" 2016 0        80        .
    "0100010" 2017 0  77.77778        .
    "0100010" 2018 0  75.55556        .
    "0100010" 2019 0 71.111115        .
    "0100014" 2001 1  55.55556 54.81482
    "0100014" 2002 1  48.88889 54.81482
    "0100014" 2003 1        60 54.81482
    "0100014" 2004 0        60        .
    "0100014" 2005 0 66.666664        .
    "0100014" 2006 0  55.55556        .
    "0100014" 2007 0  57.77778        .
    "0100014" 2008 0  53.33333        .
    "0100014" 2009 0 66.666664        .
    "0100014" 2010 0  62.22222        .
    end
    In the case of ID 0100005, for example, I would want to ignore the 2nd (and 4th) year(s) for which the values are missing, and create the average based on the 1st, 3rd and 5th years.

  • #2
    Here are two different methods. The first one is longer and appears more complicated, but it is a direct approach to the stated problem in that it removes the missing observations and then averages the first surviving three. The second one is shorter, but a bit more opaque because it relies on sorting the observations with missing values out of the way. That said, in a large data set, method 2 will also be faster.

    Code:
    //    METHOD 1
    frame put xwaveid year mh9, into(working)
    frame working {
        drop if missing(mh9)
        by xwaveid (year), sort: egen wanted1 = mean(cond(_n <= 3, mh9, .))
    }
    frlink 1:1 xwaveid year, frame(working)
    frget wanted1, from(working)
    
    //    METHOD 2
    gen byte missing_mh9 = missing(mh9)
    by xwaveid (missing_mh9 year), sort: gen include = _n <= 3
    by xwaveid (missing_mh9 year): egen wanted2 = mean(cond(include, mh9, .))

    Comment


    • #3
      Thank you very much!

      Comment

      Working...
      X