
  • Interpolate (extrapolate) values (mipolate)

    Hi Everyone,

    As my data set has some missing values, I have used the "mipolate" command found in Stata. Using the Stata code below, all the missing values are replaced by estimated values. However, some of the replaced values, as shown in the example below, are negative. I tried "nearest" instead of "spline", but then each missing value is simply replaced by the nearest known value rather than estimated.

    Would anyone please help with this issue? I need estimated values (not copies of the nearest known value) that are also non-negative.

    Code:
    sort city year
    
    by city: mipolate graduates year, gen(Grad) spline epolate
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double graduates float year str21 city double Grad
       . 2005 "americancanyon"   8.89
       . 2006 "americancanyon"   8.49
       . 2007 "americancanyon"   8.09
       . 2008 "americancanyon"   7.69
     7.3 2009 "americancanyon"    7.3
     6.9 2010 "americancanyon"    6.9
    10.5 2011 "americancanyon"   10.5
    11.4 2012 "americancanyon"   11.4
    10.8 2013 "americancanyon"   10.8
     9.2 2014 "americancanyon"    9.2
     9.4 2015 "americancanyon"    9.4
     7.7 2016 "americancanyon"    7.7
     7.2 2017 "americancanyon"    7.2
       . 2018 "americancanyon"   6.70
       . 2005 "arvin"            1.90
       . 2006 "arvin"            1.69
       . 2007 "arvin"             1.5
       . 2008 "arvin"           1.300
     1.1 2009 "arvin"             1.1
      .9 2010 "arvin"              .9
      .9 2011 "arvin"              .9
      .9 2012 "arvin"              .9
      .5 2013 "arvin"              .5
      .4 2014 "arvin"              .4
      .3 2015 "arvin"              .3
      .2 2016 "arvin"              .2
      .5 2017 "arvin"              .5
       . 2018 "arvin"            .800
       . 2005 "kerman"         -5.099
       . 2006 "kerman"           -3.4
       . 2007 "kerman"           -1.8
       . 2008 "kerman"           -.29
     1.3 2009 "kerman"            1.3
     2.9 2010 "kerman"            2.9
     2.2 2011 "kerman"            2.2
       2 2012 "kerman"              2
     2.8 2013 "kerman"            2.8
     3.4 2014 "kerman"            3.4
     1.8 2015 "kerman"            1.8
     1.9 2016 "kerman"            1.9
       2 2017 "kerman"              2
       . 2018 "kerman"           2.09
    end
    With "nearest" instead of "spline":
    Code:
    sort city year
    
    by city: mipolate graduates year, gen(Grad1) nearest epolate
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double graduates float year str21 city double Grad1
       . 2005 "americancanyon"  7.3
       . 2006 "americancanyon"  7.3
       . 2007 "americancanyon"  7.3
       . 2008 "americancanyon"  7.3
     7.3 2009 "americancanyon"  7.3
     6.9 2010 "americancanyon"  6.9
    10.5 2011 "americancanyon" 10.5
    11.4 2012 "americancanyon" 11.4
    10.8 2013 "americancanyon" 10.8
     9.2 2014 "americancanyon"  9.2
     9.4 2015 "americancanyon"  9.4
     7.7 2016 "americancanyon"  7.7
     7.2 2017 "americancanyon"  7.2
       . 2018 "americancanyon"  7.2
       . 2005 "arvin"           1.1
       . 2006 "arvin"           1.1
       . 2007 "arvin"           1.1
       . 2008 "arvin"           1.1
     1.1 2009 "arvin"           1.1
      .9 2010 "arvin"            .9
      .9 2011 "arvin"            .9
      .9 2012 "arvin"            .9
      .5 2013 "arvin"            .5
      .4 2014 "arvin"            .4
      .3 2015 "arvin"            .3
      .2 2016 "arvin"            .2
      .5 2017 "arvin"            .5
       . 2018 "arvin"            .5
       . 2005 "kerman"          1.3
       . 2006 "kerman"          1.3
       . 2007 "kerman"          1.3
       . 2008 "kerman"          1.3
     1.3 2009 "kerman"          1.3
     2.9 2010 "kerman"          2.9
     2.2 2011 "kerman"          2.2
       2 2012 "kerman"            2
     2.8 2013 "kerman"          2.8
     3.4 2014 "kerman"          3.4
     1.8 2015 "kerman"          1.8
     1.9 2016 "kerman"          1.9
       2 2017 "kerman"            2
       . 2018 "kerman"            2
    end

    Thank you
    Ali

  • #2
    You got what you asked for: it's just not what you wanted. Extrapolation with cubic splines isn't much less hazardous than with straight lines, and the problem with straight lines is that they can go anywhere.

    Plotting your data and results should make that seem both obvious and unsurprising. Here's one of your cities:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double graduates int year str8 City double Grad
      . 2005 `""kerman""' -5.099
      . 2006 `""kerman""'   -3.4
      . 2007 `""kerman""'   -1.8
      . 2008 `""kerman""'   -.29
    1.3 2009 `""kerman""'    1.3
    2.9 2010 `""kerman""'    2.9
    2.2 2011 `""kerman""'    2.2
      2 2012 `""kerman""'      2
    2.8 2013 `""kerman""'    2.8
    3.4 2014 `""kerman""'    3.4
    1.8 2015 `""kerman""'    1.8
    1.9 2016 `""kerman""'    1.9
      2 2017 `""kerman""'      2
      . 2018 `""kerman""'   2.09
    end
    
    * Extrapolated values are marked with +; the second plot shows the observed data
    scatter Grad year if graduates == ., ms(+) || scatter graduates year, ///
    legend(order(1 "extrapolated" 2 "data")) yla(, ang(h)) ///
    ytitle(data and extrapolated) xtitle("") yli(0, lc(gs12) lw(vthin))
    [Figure epolate.png: observed data and extrapolated values plotted against year for kerman, with a reference line at zero]

    Also, there is no sense in which mipolate (which isn't "found" in Stata; it has to be installed from SSC) knows what values make substantive sense.

    For a measured variable that is positive percents (?) I would take logits, interpolate on logit scale, and then reverse the transformation. Zeros are harder -- and easier -- to deal with, because then the only plausible extrapolation is just zero.
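
    A minimal sketch of the logit approach described above, not a definitive recipe. It assumes that graduates is a percent strictly between 0 and 100 (the thread does not confirm the units) and that mipolate has been installed from SSC. The variable names lgrad, lgrad_ip and Grad_logit are made up for illustration, and linear extrapolation on the logit scale is one choice among several.

```stata
* Requires mipolate from SSC: ssc install mipolate
clear
input double graduates int year str8 city
  . 2005 "kerman"
  . 2006 "kerman"
  . 2007 "kerman"
  . 2008 "kerman"
1.3 2009 "kerman"
2.9 2010 "kerman"
2.2 2011 "kerman"
  2 2012 "kerman"
2.8 2013 "kerman"
3.4 2014 "kerman"
1.8 2015 "kerman"
1.9 2016 "kerman"
  2 2017 "kerman"
  . 2018 "kerman"
end

sort city year

* ASSUMPTION: graduates is a percent in (0, 100); rescale to a proportion and take logits
gen double lgrad = logit(graduates / 100)

* Interpolate and extrapolate linearly on the logit scale
by city: mipolate lgrad year, gen(lgrad_ip) linear epolate

* Reverse the transformation; invlogit() always returns a value in (0, 1)
gen double Grad_logit = 100 * invlogit(lgrad_ip)

list year graduates Grad_logit, sepby(city) noobs
```

    Because invlogit() maps any real number into (0, 1), the back-transformed values cannot be negative or exceed 100, however far the extrapolation runs. Whether they are substantively plausible is still a separate judgement.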



    • #3
      Thank you Nick for your reply. It helps very much!



      • #4
        Excuse me, Nick, I've got one more thing if you could help, please.

        Some of my variables include zeros, or are dummy variables with values of zero or one. Taking logarithms generates missing values for these variables, so interpolation is harder in this case. I also tried taking square roots before interpolating, but this does not ensure positive values either (some of the results are actually negative).

        Best
        Ali



        • #5
          If a response is a dummy -- I prefer "indicator" (*) -- with values 0 or 1, what is a reasonable interpolation? My suggestion with such variables is to interpolate linearly and ignore fractions. Thus 0 missing 0 interpolates to 0 0 0 and 1 missing 1 interpolates to 1 1 1, which are both likely to seem reasonable, but 0 missing 1 interpolates to 0 0.5 1 which is unlikely to be acceptable, but what's acceptable is also a researcher's decision. Those examples are for gaps of length 1 but the idea is more general. Note that nowhere do I urge taking logarithms of indicators.
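
          A minimal sketch of that suggestion, assuming mipolate is installed from SSC. It reads "ignore fractions" as leaving fractional results missing, which is one interpretation and not necessarily what was meant. The toy data and the names flag, flag_lin and flag_ip are hypothetical.

```stata
* Requires mipolate from SSC: ssc install mipolate
clear
input byte flag int year str1 city
0 2009 "a"
. 2010 "a"
0 2011 "a"
1 2009 "b"
. 2010 "b"
1 2011 "b"
0 2009 "c"
. 2010 "c"
1 2011 "c"
end

sort city year

* Linear interpolation within each city (no extrapolation requested)
by city: mipolate flag year, gen(flag_lin) linear

* ASSUMPTION: "ignore fractions" means keep only results that are whole numbers;
* a fractional result such as 0.5 stays missing
gen byte flag_ip = round(flag_lin) if abs(flag_lin - round(flag_lin)) < 1e-6

list city year flag flag_lin flag_ip, sepby(city) noobs
```

          Gaps bracketed by equal values are filled with that value, while a gap between 0 and 1 is left for the researcher to decide.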

          I didn't suggest taking square roots, which as you say won't ensure positive extrapolated values.

          (*) https://journals.sagepub.com/doi/pdf...36867X19830921 is as I write visible as an entire .pdf. See Section 2 on names.



          • #6
            Much appreciated Nick.
