  • How to identify spells of consecutive missings

    I would like Stata to linearly interpolate missing values only when the gap is relatively short (i.e., 3 years or fewer). I was thinking of using the ipolate command, but I am a bit lost on how to tell Stata to recognize the short gaps.

    For example, I have a panel data set that looks similar to the following:
    country year x
    36 1990 1.88E+09
    36 1991 2.05E+09
    36 1992 2.09E+09
    36 1993 2.10E+09
    36 1994 2.02E+09
    36 1995 1.96E+09
    36 1996
    36 1997
    36 1998
    36 1999 1.97E+09
    36 2000 1.88E+09
    36 2001 1.99E+09
    36 2002 1.98E+09
    36 2003 1.95E+09
    36 2005 1.85E+09
    36 2006
    36 2007
    36 2008
    36 2009
    36 2010
    36 2011
    36 2012 1.79E+09
    36 2013 1.61E+09
    40 1990
    40 1991 ….
    I want Stata to interpolate x for 1996-1998, but not for 2006-2011. Could anyone help me with how to proceed? This seems like a really basic question, but I am still having trouble even after going over the help file for the tsspell command as well as the following threads:
    https://www.statalist.org/forums/for...-in-panel-data
    https://www.stata.com/support/faqs/d...-observations/

    Thanks so much!

  • #2
    This should do it:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte country int year long x
    36 1990 1880000000
    36 1991 2050000000
    36 1992 2090000000
    36 1993 2100000000
    36 1994 2020000000
    36 1995 1960000000
    36 1996          .
    36 1997          .
    36 1998          .
    36 1999 1970000000
    36 2000 1880000000
    36 2001 1990000000
    36 2002 1980000000
    36 2003 1950000000
    36 2005 1850000000
    36 2006          .
    36 2007          .
    36 2008          .
    36 2009          .
    36 2010          .
    36 2011          .
    36 2012 1790000000
    36 2013 1610000000
    40 1990          .
    40 1991          .
    end
    
    
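    * Number the runs of consecutive missing or nonmissing x within each country
    by country (year), sort: gen spell = sum(missing(x) != missing(x[_n-1]))
    * Length of each run, counted in observations
    by country spell (year), sort: gen spell_length = _N
    * Interpolate within country so values are never borrowed from another panel
    by country: ipolate x year if !missing(x) | spell_length <= 3, gen(x2)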
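    Before relying on the interpolated values, it may be worth checking what the two helper variables contain. The following is only a sketch: it assumes the variables spell, spell_length, and x2 created above, and the name first is a hypothetical tag introduced here for illustration.

    ```stata
    * Show the runs for one country; a new spell starts whenever x
    * switches between missing and nonmissing
    list country year x spell spell_length x2 if country == 36, sepby(spell) noobs

    * Tag one row per spell, then tabulate the lengths of the missing spells
    by country spell (year), sort: gen byte first = _n == 1
    tabulate spell_length if first & missing(x)
    ```

    Note that spell_length counts observations rather than calendar years, so the "3 years or fewer" rule only holds if every year in a gap has its own row in the data.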
    In the future, please use the -dataex- command to post example data, as I have done here. Your example was not exceptionally difficult to import to Stata to test this code, but layouts like that often are. To get the -dataex- command, run -ssc install dataex- and then run -help dataex- to read the very simple instructions for using it. When you use -dataex- you make it possible for those who want to help you to easily create a complete and faithful replica of your Stata example with a simple copy/paste operation.

    As an aside, moving forward with this data, you may encounter numerical instability or convergence problems in later analyses. This can arise when variables have markedly different scales, as year and x do here. It probably would make sense to scale x down by a factor of 10^6 or even 10^9. (If x is, for example, some amount of dollars, you would simply be giving x new units of millions or billions of dollars.)
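
    A minimal sketch of that rescaling, assuming billions are the preferred unit; the variable name x_bn and its label are hypothetical choices, not something from the original data:

    ```stata
    * Rescale x to billions; storing as double avoids losing precision
    gen double x_bn = x / 1e9
    label variable x_bn "x (billions)"

    * Compare the two scales side by side
    summarize x x_bn
    ```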


    • #3
      Clyde, thanks so much!
      I will also use the dataex command in the future and look into the scale issue.


      • #4
        -ipolate- fills in gaps without needing to be told where they are.


        • #5
          Originally posted by Nick Cox
          -ipolate- fills in gaps without needing to be told where they are.
          I guess it is because Jaehee only wants to interpolate x for 1996-1998, but not for 2006-2011.
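
          To make the difference concrete, here is a rough sketch of what unrestricted interpolation would do with the example data from #2. The variable name x_all is hypothetical, used only for this illustration.

          ```stata
          * With no if condition, every interior gap is filled, long or short
          by country (year), sort: ipolate x year, gen(x_all)

          * The long 2006-2011 gap is filled as well, which is what Jaehee wants to avoid
          list country year x x_all if country == 36 & inrange(year, 2005, 2012), noobs
          ```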

          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)


          • #6
            See also -tsspell- from SSC, which yields information on the length of spells.
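
            As a sketch of how that might look with the example data from #2, assuming -tsspell- has been installed with -ssc install tsspell- and that the data can be declared as a panel with -xtset-. The names gap_length and x3 are hypothetical, introduced here for illustration.

            ```stata
            * tsspell needs the panel and time variables declared first
            xtset country year

            * Mark spells of missing x; by default this creates _spell, _seq, and _end
            tsspell, cond(missing(x))

            * The largest _seq within a spell is its length (0 where x is observed)
            egen gap_length = max(_seq), by(country _spell)

            * Interpolate within country, skipping gaps longer than 3 observations
            by country (year), sort: ipolate x year if !missing(x) | gap_length <= 3, gen(x3)
            ```

            Because -tsspell- works from the declared time variable, a spell is also broken wherever a year has no row at all.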
