Thanks to Kit Baum as usual, a new program mipolate is now available from SSC.
Use
Code:
ssc inst mipolate
to install.
Stata version 12 is required (but see below for a note for any people on
version 10 or 11 who may be interested).
mipolate is for interpolation, and extrapolation too, for
one-dimensional series, replacing missing values with interpolated
values in a copy of a variable. It is in effect an unofficial
generalisation of the official command ipolate. It builds in the content
of, and thus supersedes, the previously issued cipolate, csipolate,
pchipolate and nnipolate (all SSC), and adds yet more methods.
The by: prefix is allowed, as with ipolate. In particular, that means
support for panel or longitudinal data.
mipolate uses one of the following methods: linear, cubic, cubic spline,
pchip (piecewise cubic Hermite interpolation), idw (inverse distance
weighted), forward, backward, nearest neighbour, groupwise. The default
method is linear.
mipolate does not require tsset or xtset data and makes no check for, or
use of, any such settings.
linear specifies linear interpolation using known values
before and after any missing values. This is the default method.
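
To make the idea concrete, here is a minimal sketch in Python of linear interpolation between the nearest known values on either side. It is not the code inside mipolate; the function name `linear_fill` and the use of `None` for missing are assumptions for illustration, and the sketch assumes distinct x values. Gaps before the first or after the last known value are left missing here.

```python
def linear_fill(x, y):
    """Return a copy of y with interior gaps filled by linear interpolation in x."""
    # Known (x, y) pairs, sorted by x; None stands in for a missing value.
    known = sorted((xi, yi) for xi, yi in zip(x, y) if yi is not None)
    out = list(y)
    for i, (xi, yi) in enumerate(zip(x, y)):
        if yi is not None:
            continue
        before = [p for p in known if p[0] < xi]
        after = [p for p in known if p[0] > xi]
        if before and after:
            (x0, y0), (x1, y1) = before[-1], after[0]
            out[i] = y0 + (y1 - y0) * (xi - x0) / (x1 - x0)
    return out


print(linear_fill([1, 2, 3, 4, 5], [None, 10.0, None, None, 16.0]))
```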
cubic specifies cubic interpolation, using exact fitting of a cubic
curve to two data points before and two data points after each
observation for which there is a missing value. Missing values are thus produced
whenever fewer than two data points are present on either side. Note
that this is not a spline method.
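
As an illustration only, the rule just described can be sketched in Python by fitting the unique cubic through four points in Lagrange form. This is an assumed way of writing it down, not necessarily how mipolate computes it; `cubic_fill` is a hypothetical name, `None` stands for missing, and x values are assumed distinct.

```python
def cubic_fill(x, y):
    """Fill gaps using the cubic through two known points before and two after."""
    known = sorted((xi, yi) for xi, yi in zip(x, y) if yi is not None)
    out = list(y)
    for i, (xi, yi) in enumerate(zip(x, y)):
        if yi is not None:
            continue
        before = [p for p in known if p[0] < xi][-2:]
        after = [p for p in known if p[0] > xi][:2]
        if len(before) < 2 or len(after) < 2:
            continue  # fewer than two points on one side: stays missing
        pts = before + after
        total = 0.0
        # Lagrange form of the cubic through the four points, evaluated at xi.
        for j, (xj, yj) in enumerate(pts):
            term = yj
            for k, (xk, _) in enumerate(pts):
                if k != j:
                    term *= (xi - xk) / (xj - xk)
            total += term
        out[i] = total
    return out


# Known values follow y = x^3, so the gap at x = 2 should come back as about 8.
print(cubic_fill([0, 1, 2, 3, 4], [0.0, 1.0, None, 27.0, 64.0]))
```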
spline specifies natural cubic spline interpolation. The method uses official
Mata functions spline3() and spline3eval().
pchip specifies piecewise cubic Hermite interpolation. This method uses
piecewise cubics that join smoothly, so that both the interpolated
function and its first derivative are continuous. In addition, the
interpolant is shape-preserving in the sense that it cannot overshoot
locally; sections in which the observed series is increasing, decreasing or
constant remain so after interpolation, and local extremes
(maxima, minima) also remain so. This interpolation method also
extrapolates.
idw[(power)] specifies inverse distance weighted interpolation. This
method uses a weighted average of non-missing values, the weights being
reciprocals of the powered distance between values, the power being zero
or positive. The default power is 2; any other choice must be specified.
Thus with power 2, values at distance 1 from a point with an unknown value
have weight 1, values at distance 2 from that point have weight 1/4,
distance 3 weight 1/9, and so forth. If the power is 0, all known
points have equal weight and the interpolant reduces to the average of
all values. As the power becomes large, only those values that are
nearest have appreciable weight. This interpolation method also
extrapolates.
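
The weighting arithmetic can be sketched in a few lines of Python. This is a sketch under stated assumptions, not mipolate's own code: `idw_fill` is a hypothetical name, `None` stands for missing, and x values are assumed distinct so that no distance is zero.

```python
def idw_fill(x, y, power=2.0):
    """Fill every gap with an inverse distance weighted average of all known values."""
    known = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    out = list(y)
    for i, (xi, yi) in enumerate(zip(x, y)):
        if yi is None and known:
            # Weight is the reciprocal of distance raised to the chosen power.
            weights = [1.0 / abs(xi - xk) ** power for xk, _ in known]
            out[i] = sum(w * yk for w, (_, yk) in zip(weights, known)) / sum(weights)
    return out


x = [1, 2, 3, 4]
y = [10.0, None, None, 40.0]
print(idw_fill(x, y))           # power 2: nearer values dominate
print(idw_fill(x, y, power=0))  # power 0: plain average of known values
```

With power 2, the gap at x = 2 gives weight 1 to the value at distance 1 and weight 1/4 to the value at distance 2, so the result is (10 + 40/4) / 1.25 = 16.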
forward specifies forward interpolation, so that any known value just
before one or more missing values is copied in cascade to provide
interpolated values, constant within any such block.
backward specifies backward interpolation, so that any known value just
after one or more missing values is copied in cascade to provide
interpolated values, constant within any such block.
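
A minimal Python sketch of the two cascades follows, assuming the values are already sorted on the x variable. The names `forward_fill` and `backward_fill` are hypothetical and `None` stands for missing; this is an illustration, not the code inside mipolate.

```python
def forward_fill(y):
    """Copy each known value forward over the missing values that follow it."""
    out, last = [], None
    for v in y:
        if v is not None:
            last = v
        out.append(last)
    return out


def backward_fill(y):
    """Copy each known value backward: a forward fill on the reversed series."""
    return forward_fill(y[::-1])[::-1]


series = [None, 1, None, None, 5, None]
print(forward_fill(series))   # leading gap has nothing before it to copy
print(backward_fill(series))  # trailing gap has nothing after it to copy
```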
nearest specifies nearest neighbour interpolation, which means using
known values either before or after missing values, depending on
which is nearer. When known values before and after are
equally distant from a missing value, there is a choice of rules that may
be applied. The default rule uses the mean of the two values. The
ties() option provides alternative rules. This method also
extrapolates, as unknown values before the first known value and unknown
values after the last known value are replaced by those respective known
values.
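
Here is a sketch in Python of nearest neighbour interpolation with the default tie rule described above (the mean of the two values). It does not attempt the alternative rules of ties(). The name `nearest_fill` is hypothetical, `None` stands for missing, and x values are assumed distinct.

```python
def nearest_fill(x, y):
    """Fill gaps with the nearest known value; ties take the mean of the two."""
    known = sorted((xi, yi) for xi, yi in zip(x, y) if yi is not None)
    out = list(y)
    for i, (xi, yi) in enumerate(zip(x, y)):
        if yi is not None or not known:
            continue
        before = [p for p in known if p[0] < xi]
        after = [p for p in known if p[0] > xi]
        if not before:
            out[i] = after[0][1]    # extrapolate from the first known value
        elif not after:
            out[i] = before[-1][1]  # extrapolate from the last known value
        else:
            (x0, y0), (x1, y1) = before[-1], after[0]
            d0, d1 = xi - x0, x1 - xi
            out[i] = y0 if d0 < d1 else y1 if d1 < d0 else (y0 + y1) / 2
    return out


print(nearest_fill([1, 2, 3, 4, 5, 6], [None, 2.0, None, None, None, 10.0]))
```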
groupwise specifies that non-missing values be copied to missing values
if, and only if, just one distinct non-missing value occurs in each
group. Thus a group of values ., 42, ., . qualifies, as 42 is not missing
and is the only non-missing value in the group. Hence the missing values
in the group will be replaced with 42 in the new variable. By the same
rules 42, ., 42, . qualifies but 42, ., 43, . does not. Normally, but
not necessarily, this option is used in conjunction with by:, which is
how groups are specified; otherwise the (single) group is the entire set
of observations being used.
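
The groupwise rule can be sketched in Python using the three example groups above. This is only an illustration of the rule, not mipolate's code; `groupwise_fill` and the group labels are hypothetical, and `None` stands for missing.

```python
def groupwise_fill(groups, y):
    """Fill gaps within a group only if the group has one distinct known value."""
    distinct = {}
    for g, v in zip(groups, y):
        distinct.setdefault(g, set())
        if v is not None:
            distinct[g].add(v)
    return [
        next(iter(distinct[g])) if v is None and len(distinct[g]) == 1 else v
        for g, v in zip(groups, y)
    ]


groups = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
values = [None, 42, None, None, 42, None, 42, None, 42, None, 43, None]
# Groups a and b qualify; group c has two distinct values and is left alone.
print(groupwise_fill(groups, values))
```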
(So what about users of version 10 or 11? The code works fine in those
versions. The problem is that some SMCL directives that work in Stata 12
and up will not work in 10 or 11. Anyone who downloaded the files from SSC,
edited the version statement in the ado file and edited the help files
would get a serviceable variant on mipolate if they did that correctly,
but that would be their own responsibility.)