Interpolate (extrapolate) values (mipolate)

Ali Taleb

Join Date: Feb 2018
Posts: 70

21 May 2019, 04:54

Hi Everyone,

As my data set has some missing values, I have used the "mipolate" command found in Stata. With the Stata code below, all the missing values are replaced by estimated values. However, some of the replaced values, as shown for the cities in the example below, are negative. I tried "nearest" instead of "spline", but then each missing value is simply replaced by the nearest known value rather than being estimated.

Would anyone please help with this issue? I need estimated values rather than copies of neighbouring values, and they must also be non-negative.

Code:

sort city year

by city: mipolate graduates year, gen(Grad) spline epolate

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double graduates float year str21 city double Grad
   . 2005 "americancanyon"   8.89
   . 2006 "americancanyon"   8.49
   . 2007 "americancanyon"   8.09
   . 2008 "americancanyon"   7.69
 7.3 2009 "americancanyon"    7.3
 6.9 2010 "americancanyon"    6.9
10.5 2011 "americancanyon"   10.5
11.4 2012 "americancanyon"   11.4
10.8 2013 "americancanyon"   10.8
 9.2 2014 "americancanyon"    9.2
 9.4 2015 "americancanyon"    9.4
 7.7 2016 "americancanyon"    7.7
 7.2 2017 "americancanyon"    7.2
   . 2018 "americancanyon"   6.70
   . 2005 "arvin"            1.90
   . 2006 "arvin"            1.69
   . 2007 "arvin"             1.5
   . 2008 "arvin"           1.300
 1.1 2009 "arvin"             1.1
  .9 2010 "arvin"              .9
  .9 2011 "arvin"              .9
  .9 2012 "arvin"              .9
  .5 2013 "arvin"              .5
  .4 2014 "arvin"              .4
  .3 2015 "arvin"              .3
  .2 2016 "arvin"              .2
  .5 2017 "arvin"              .5
   . 2018 "arvin"            .800
   . 2005 "kerman"         -5.099
   . 2006 "kerman"           -3.4
   . 2007 "kerman"           -1.8
   . 2008 "kerman"           -.29
 1.3 2009 "kerman"            1.3
 2.9 2010 "kerman"            2.9
 2.2 2011 "kerman"            2.2
   2 2012 "kerman"              2
 2.8 2013 "kerman"            2.8
 3.4 2014 "kerman"            3.4
 1.8 2015 "kerman"            1.8
 1.9 2016 "kerman"            1.9
   2 2017 "kerman"              2
   . 2018 "kerman"           2.09
end

With "nearest" instead of "spline":

Code:

sort city year

by city: mipolate graduates year, gen(Grad) nearest epolate

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double graduates float year str21 city double Grad1
   . 2005 "americancanyon"  7.3
   . 2006 "americancanyon"  7.3
   . 2007 "americancanyon"  7.3
   . 2008 "americancanyon"  7.3
 7.3 2009 "americancanyon"  7.3
 6.9 2010 "americancanyon"  6.9
10.5 2011 "americancanyon" 10.5
11.4 2012 "americancanyon" 11.4
10.8 2013 "americancanyon" 10.8
 9.2 2014 "americancanyon"  9.2
 9.4 2015 "americancanyon"  9.4
 7.7 2016 "americancanyon"  7.7
 7.2 2017 "americancanyon"  7.2
   . 2018 "americancanyon"  7.2
   . 2005 "arvin"           1.1
   . 2006 "arvin"           1.1
   . 2007 "arvin"           1.1
   . 2008 "arvin"           1.1
 1.1 2009 "arvin"           1.1
  .9 2010 "arvin"            .9
  .9 2011 "arvin"            .9
  .9 2012 "arvin"            .9
  .5 2013 "arvin"            .5
  .4 2014 "arvin"            .4
  .3 2015 "arvin"            .3
  .2 2016 "arvin"            .2
  .5 2017 "arvin"            .5
   . 2018 "arvin"            .5
   . 2005 "kerman"          1.3
   . 2006 "kerman"          1.3
   . 2007 "kerman"          1.3
   . 2008 "kerman"          1.3
 1.3 2009 "kerman"          1.3
 2.9 2010 "kerman"          2.9
 2.2 2011 "kerman"          2.2
   2 2012 "kerman"            2
 2.8 2013 "kerman"          2.8
 3.4 2014 "kerman"          3.4
 1.8 2015 "kerman"          1.8
 1.9 2016 "kerman"          1.9
   2 2017 "kerman"            2
   . 2018 "kerman"            2
end

Thank you
Ali


Nick Cox

Join Date: Mar 2014

Posts: 35697
#2

21 May 2019, 05:14

You got what you asked for: it's just not what you wanted. Extrapolation with cubic splines isn't much less hazardous than with straight lines, and the problem with straight lines is that they can go anywhere.

Plotting your data and results should make that seem both obvious and unsurprising. Here's one of your cities:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double graduates int year str8 City double Grad
  . 2005 `""kerman""' -5.099
  . 2006 `""kerman""'   -3.4
  . 2007 `""kerman""'   -1.8
  . 2008 `""kerman""'   -.29
1.3 2009 `""kerman""'    1.3
2.9 2010 `""kerman""'    2.9
2.2 2011 `""kerman""'    2.2
  2 2012 `""kerman""'      2
2.8 2013 `""kerman""'    2.8
3.4 2014 `""kerman""'    3.4
1.8 2015 `""kerman""'    1.8
1.9 2016 `""kerman""'    1.9
  2 2017 `""kerman""'      2
  . 2018 `""kerman""'   2.09
end

scatter Grad year if graduates == ., ms(+) || scatter grad year , legend(order(1 "extrapolated" 2 "data")) yla(, ang(h)) ///
ytitle(data and extrapolated) xtitle("") yli(0, lc(gs12) lw(vthin))

Also, there is no sense in which mipolate (which isn't "found" in Stata; it has to be installed from SSC) knows what values make substantive sense.

For a measured variable that is positive percents (?) I would take logits, interpolate on the logit scale, and then reverse the transformation. Zeros are both harder and easier to deal with, because then the only plausible extrapolation is just zero.
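
The logit suggestion above might be sketched as follows. This is an illustration only, not code from the thread: it assumes that graduates is a percent lying strictly between 0 and 100, that mipolate is installed (ssc install mipolate), and it reuses the poster's own spline epolate options. The names logit_grad, logit_ipol and Grad_logit are made up for the example.

```stata
* Sketch only: assumes graduates is a percent strictly between 0 and 100
* and that mipolate has been installed with -ssc install mipolate-
clear
input double graduates int year str6 city
  . 2005 "kerman"
  . 2006 "kerman"
  . 2007 "kerman"
  . 2008 "kerman"
1.3 2009 "kerman"
2.9 2010 "kerman"
2.2 2011 "kerman"
  2 2012 "kerman"
2.8 2013 "kerman"
3.4 2014 "kerman"
1.8 2015 "kerman"
1.9 2016 "kerman"
  2 2017 "kerman"
  . 2018 "kerman"
end

* move to the logit scale; missing values stay missing
generate double logit_grad = logit(graduates / 100)

* interpolate and extrapolate on the logit scale, city by city
sort city year
by city: mipolate logit_grad year, generate(logit_ipol) spline epolate

* back-transform: invlogit() returns values in (0, 1), so the result stays in (0, 100)
generate double Grad_logit = 100 * invlogit(logit_ipol)

list year graduates Grad_logit, sepby(city)
```

Whatever happens on the logit scale, the back-transformed values cannot fall below 0 or exceed 100. That does not make the extrapolation any more trustworthy, so plotting the results remains a good idea.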
Ali Taleb

Join Date: Feb 2018

Posts: 70
#3

21 May 2019, 06:17

Thank you Nick for your reply. It helps very much!
Ali Taleb

Join Date: Feb 2018

Posts: 70
#4

21 May 2019, 06:58

Excuse me, Nick, I've got one more thing if you could help, please.

Some of my variables include zeros, or are dummy variables taking the values zero or one. Taking logarithms generates missings for these variables, so interpolation is harder in this case. I also tried taking square roots before interpolating, but this does not ensure positive values (some of the results are in fact negative).

Best
Ali
Nick Cox

Join Date: Mar 2014

Posts: 35697
#5

21 May 2019, 07:52

If a response is a dummy -- I prefer "indicator" (*) -- with values 0 or 1, what is a reasonable interpolation? My suggestion with such variables is to interpolate linearly and ignore fractions. Thus 0 missing 0 interpolates to 0 0 0 and 1 missing 1 interpolates to 1 1 1, which are both likely to seem reasonable, but 0 missing 1 interpolates to 0 0.5 1 which is unlikely to be acceptable, but what's acceptable is also a researcher's decision. Those examples are for gaps of length 1 but the idea is more general. Note that nowhere do I urge taking logarithms of indicators.
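
A small sketch of that idea, for illustration only. It uses official Stata's ipolate for the linear interpolation and reads "ignore fractions" as "leave fractional results missing", which is one possible reading and remains the researcher's decision. The data and the names indicator, ind_lin and ind_filled are invented for the example.

```stata
* Sketch only: made-up indicator series with gaps of length 1
clear
input byte indicator int year
0 2009
. 2010
0 2011
1 2012
. 2013
1 2014
0 2015
. 2016
1 2017
end

* linear interpolation with official Stata's -ipolate-
ipolate indicator year, generate(ind_lin)

* keep interpolated values only when they are exactly 0 or 1;
* fractional results (here 0.5 in 2016) are left missing (an assumption)
generate byte ind_filled = ind_lin if inlist(ind_lin, 0, 1)

list, sep(0)
```

With panel data the same call would take by(), for example ipolate indicator year, generate(ind_lin) by(city).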

I didn't suggest taking square roots, which as you say won't ensure positive extrapolated values.

(*) https://journals.sagepub.com/doi/pdf...36867X19830921 is as I write visible as an entire .pdf. See Section 2 on names.
Ali Taleb

Join Date: Feb 2018

Posts: 70
#6

21 May 2019, 08:03

Much appreciated Nick.