Interpolation : mipolate, stripolate, ... ?

Morad Bali

Join Date: Apr 2016

Posts: 101
#1

08 Apr 2018, 10:53

Dear Members,

I would like to have a better understanding of the different tools that can be used in order to do an interpolation.

In this topic:
https://www.statalist.org/forums/for...-interpolation

Nic Cox put forward two new programs: mipolate and stripolate.
What is the difference between these two?

Working mostly on economic data, what criteria will help me to decide if I have to either use mipolate or stripolate?

In my case I would like to transform these quarterly data into monthly. It is a GDP.

y date
16070.9 2014q1
16136.9 2014q2
16022.4 2014q3
15989.1 2014q4
15790 2015q1
15572.7 2015q2
15586.8 2015q3
15542.1 2015q4
15570.3 2016q1
15511.1 2016q2
15510.6 2016q3
15598.8 2016q4
15718.4 2017q1
15880.5 2017q2
15894.4 2017q3
15894.4 2017q4

Thank you for your help!
Clyde Schechter

Join Date: Apr 2014

Posts: 29937
#2

08 Apr 2018, 12:29

Nic [sic] Cox put forward two new programs: mipolate and stripolate.
1. What is the difference between these two?

-mipolate- is for use with numeric outcome variables, -stripolate- is for use with string outcome variables.

2. Working mostly on economic data, what criteria will help me to decide if I have to either use mipolate or stripolate?

It depends on whether the variable you are trying to interpolate is numeric or string. In economics, it would surprise me to see a situation where you need to interpolate string variables, but I am not an economist, so perhaps my perspective is too limited here.

The value of -mipolate- is that it offers more ways of filling in the gaps than just simple linear interpolation, which was already available with official Stata's -ipolate- (although, if memory serves, -ipolate-, too, was originally written by Nick Cox before StataCorp took it over.) The real question you need to ask is which type of interpolation function, if any, is most appropriate for use with GDP data over the timeframe you are dealing with. There are numerous economists active on this Forum and perhaps one will give you some guidance here. But basically, you need help from an economist about this; it's not a statistical question.
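To make the contrast concrete, here is a minimal sketch comparing official -ipolate- with a few -mipolate- methods. The toy values and the variable names (x, y, y_lin, and so on) are invented for illustration, and it assumes -mipolate- has been installed from SSC.

```stata
* Hypothetical toy data; not from the thread.
* mipolate is user-written: run "ssc install mipolate" once beforehand.
clear
input x y
1 10
2 .
3 .
4 16
5 .
6 13
end

ipolate  y x, gen(y_lin)            // official Stata: linear interpolation only
mipolate y x, gen(y_lin2) linear    // the same idea through mipolate
mipolate y x, gen(y_pchip) pchip    // piecewise cubic Hermite interpolation
mipolate y x, gen(y_fwd) forward    // carry the last known value forward

list, sep(0)
```

Listing the results side by side shows how much the answer depends on the method chosen, which is the real decision to be made for GDP data.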
Nick Cox

Join Date: Mar 2014

Posts: 35402
#3

09 Apr 2018, 02:12

Clyde's helpful advice just needs one small qualification. ipolate was introduced as part of official Stata a long while ago. I had no part in it. FWIW, I do not answer to Nic.

On the specific problem here the simplest option is just to replicate each quarterly value 3 times. You can do that with mipolate.
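As a sketch of that simplest option, the following replicates each quarterly value across the three months of its quarter. It uses official -expand- rather than mipolate, so it is one possible route under that assumption and not necessarily the one meant above; only the first three quarters of the data are used, and qdate is an illustrative variable name.

```stata
* Sketch: replicate each quarterly value over its three months.
clear
input GDP str6 sdate
16070.9 2014q1
16136.9 2014q2
16022.4 2014q3
end

gen qdate = quarterly(sdate, "YQ")
format qdate %tq

expand 3                                   // three copies of every quarter
bysort qdate: gen mdate = mofd(dofq(qdate)) + _n - 1
format mdate %tm

list mdate GDP, sepby(qdate)
```

Whether the replicated values should also be divided by 3 depends on the measurement units of the series.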

Large principles and small practicalities here:

1. There is no white magic in interpolation. There is more than one way to interpolate and no obvious one correct way. mipolate sticks to deterministic methods, and doesn't extend to smarter ones such as Gaussian processes. That's not an expression of disdain, just a matter of work to be done (by someone).

2. There are no estimates of error without extra machinery being introduced. Interpolated values in most cases will be smoothed compared with the real unknown values. This uncertainty about uncertainty may affect (infect!) everything done with the interpolated data.

3. For quarterly data, I would carry the quarterly values over to the middle month of each quarter and then interpolate between.

4. For economic data, consider interpolation on logarithmic scale where it may make more sense, followed by exponentiation back. (For other measures, consider logit scale, etc.)

5. Always plot results. (Never use dopey redundant titles for the time axis.)

6. Always compare two or more methods. (Not done here by me, but I am laying down some guidance, not doing all the work.) My using pchip here is nothing but drawing attention to a method that works well for me in quite different problems. Whatever you do should be explained, which may dictate use of a very simple method.

7. I have no idea here about measurement units, that is whether monthly estimates should be divided by 3. That is an important triviality.

8. A constraint that monthly values sum to, or average to, quarterly values would be a separate matter.

9. None of the methods in mipolate know or care about seasonality. They won't impute December or August according to anything special about December or August. (Economists usually regard seasonality as a confounded nuisance any way.)

Code:

clear
input GDP str6 sdate
16070.9 2014q1
16136.9 2014q2
16022.4 2014q3
15989.1 2014q4
15790 2015q1
15572.7 2015q2
15586.8 2015q3
15542.1 2015q4
15570.3 2016q1
15511.1 2016q2
15510.6 2016q3
15598.8 2016q4
15718.4 2017q1
15880.5 2017q2
15894.4 2017q3
15894.4 2017q4
end

gen date = quarterly(sdate, "YQ")
gen mdate = mofd(dofq(date)) + 1
tsset mdate
format mdate %tm
tsfill
sort mdate
gen logGDP = log(GDP)
mipolate logGDP mdate, gen(logGDP2) pchip
gen iGDP = exp(logGDP2)
set scheme s1color
twoway connected iGDP mdate, ms(+) || scatter GDP mdate, ///
legend(order(1 "guessed" 2 "known")) xtitle("") yla(, ang(h)) ytitle(GDP, orient(horiz))
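Point 6 (always compare two or more methods) can be sketched as follows. This is an illustration only: it uses the first six quarters of the data, assumes mipolate is installed from SSC, and the names l_lin, l_pchip, GDP_lin, GDP_pchip and diff are made up for the example.

```stata
* Sketch: linear versus pchip interpolation on log scale, side by side.
clear
input GDP str6 sdate
16070.9 2014q1
16136.9 2014q2
16022.4 2014q3
15989.1 2014q4
15790   2015q1
15572.7 2015q2
end

gen mdate = mofd(dofq(quarterly(sdate, "YQ"))) + 1   // middle month of quarter
format mdate %tm
tsset mdate
tsfill

gen logGDP = log(GDP)
mipolate logGDP mdate, gen(l_lin) linear
mipolate logGDP mdate, gen(l_pchip) pchip

gen GDP_lin   = exp(l_lin)
gen GDP_pchip = exp(l_pchip)
gen diff      = GDP_pchip - GDP_lin       // zero at the known quarterly values

list mdate GDP GDP_lin GDP_pchip diff, sep(0)
twoway connected GDP_lin GDP_pchip mdate || scatter GDP mdate, xtitle("")
```

Both methods pass through the known values, so any difference appears only in the filled-in months; how large that difference is gives a rough sense of how much the choice of method matters.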
Morad Bali

Join Date: Apr 2016

Posts: 101
#4

09 Apr 2018, 08:23

Hello,

Thank you both for your answers. They are very useful!

Nick, thank you for the code and the figure. I have a way better understanding of it all.

2. There are no estimates of error without extra machinery being introduced. Interpolated values in most cases will be smoothed compared with the real unknown values. This uncertainty about uncertainty may affect (infect!) everything done with the interpolated data.

At first I didn't really understand what you meant by "smoothed" (maybe because I am not a native speaker), or how it would affect further calculations. To get a better understanding, I decided to compare the mipolate result (iGDP) with two other interpolations provided by a colleague (Prof. Rapelanoro):
- the first from the Denton average match method;
- the second from the Quadratic average method.
He used EViews to run both of them. The values are here:

Code:

quadra    denton    iGDP
16065.9   16077.6   16070.91
16096.1   16119.6   16111.15
16141.4   16141.6   16131.37
16149.4   16143.6   16136.91
16119.9   16125.5   16111
16053.1   16048.6   16059.7
16013.4   16019.4   16022.42
16000.8   15999.3   16010.36
16015.5   16024.7   16002.36
15999.3   15995.2   15989.07
15952.4   15947.3   15944.32
15874.8   15859.1   15868.09
15791.8   15790.7   15790
15703.5   15720.3   15702.72
15609.8   15610.8   15613.5
15558.5   15564.1   15572.67
15549.7   15543.1   15576.32
15583.2   15590.8   15583.1
15594.3   15588.9   15586.75
15582.8   15580.6   15575.16
15548.7   15546.2   15553.66
15535.2   15539.4   15542.09
15542.3   15540.7   15549.4
15570     15573.9   15562.99
15577.1   15573.6   15570.31
15563.8   15563.5   15555.02
15530.1   15522.2   15526.6
15507.4   15509     15511.11
15495.9   15502.3   15510.81
15495.4   15497.6   15510.63
15506.7   15507.3   15510.57
15529.6   15526.8   15525.91
15564.3   15564.8   15560.87
15598.9   15597.7   15598.81
15633.3   15634     15634.68
15667.6   15672.3   15674.56
15714.3   15716.9   15718.44
15773.4   15766.2   15778.89
15844.8   15848.5   15844.86
15889.5   15886     15880.54
15907.4   15907.1   15887.91
15898.5   15891.8   15892.69
15893.2   15894.9   15894.39
15891.5   15896.4   15894.39
15893.3   15896.4   15894.39
15894.6   15894.9   15894.39

And the graph is here:

It seems to me that the Denton average match and Quadratic average methods are somehow more precise than mipolate.
It is nonetheless very hard for me to choose between Denton and Quadratic.

i) What do you think about these results? Do you agree?
ii) Are these two methods (Denton average match and Quadratic average) available in Stata?

There are numerous economists active on this Forum and perhaps one will give you some guidance here.

I would be glad to have their opinion too; let's hope one of them checks this post.
Nick Cox

Join Date: Mar 2014

Posts: 35402
#5

09 Apr 2018, 09:11

Smoothed means that an interpolated series will be smoother than the true series, which you necessarily don't have. So, even in the simplest case of linear interpolation, it seems likely that the true series wiggles and waggles more than a linearly interpolated series.

I don't know what you mean by precise.

Precision means to me replicability of repeated measurements or estimates, in contrast to accuracy, which refers to lack of bias in attempts at a true value. I don't know how you assess interpolation under either heading as you have one outcome for given data only and no variability (and no true series).

Precise is not a flexible synonym for, let's say, appropriate to my data and my problem.

That aside, not only am I happy that you choose a method on those criteria, I encourage you to do that.

As for Denton, that's an easy keyword to look for

Code:

--------------------------------------------------------------------------------
search for Denton                                            (manual: [R] search)
--------------------------------------------------------------------------------

Search of official help files, FAQs, Examples, SJs, and STBs

Web resources from Stata and other users

(contacting http://www.stata.com)

1 package found (Stata Journal and STB listed first)
----------------------------------------------------

denton from http://fmwww.bc.edu/RePEc/bocode/d
    'DENTON': module to interpolate a flow or stock series from low-frequency
    totals via proportional Denton method / denton computes the proportional
    Denton method of interpolation / of a low-frequency flow time series by use
    of an associated / high-frequency "indicator series", imposing the

Like Clyde I am not an economist (except for A-level Economics Grade A; only British people are expected to understand that), but I understand this Denton method to be close to what you want, but you need other information in order to apply it. That's quite different from any method covered by mipolate.
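Since the package found by the search is hosted in the SSC archive (the bocode directory), it can presumably be inspected and installed as sketched below. The syntax of denton itself is not reproduced here; its help file is the place to check what indicator series and options it needs.

```stata
* Sketch: inspect and install the user-written denton package from SSC.
ssc describe denton     // show the package description and files
ssc install denton      // install it (needs internet access)
help denton             // read the syntax before applying it
```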

I have never used Eviews and don't know what quadratic average interpolation is there, but would welcome accessible references.
Janys Ung

Join Date: Dec 2016

Posts: 35
#6

17 Apr 2020, 16:24

Dear Nick,

Many thanks for sharing the code in post #3. May I ask why you add 1 in the following line?

Code:

gen mdate = mofd(dofq(date)) + 1

My data starts in Q1. If I have that value of 1, my 'mdate' starts from m2 instead of m1. I am just thinking shouldn't it start at m1? My apologies if I've misunderstood the interpolation stuff.

Best wishes,
Janys

Last edited by Janys Ung; 17 Apr 2020, 16:30.
Nick Cox

Join Date: Mar 2014

Posts: 35402
#7

18 Apr 2020, 02:27

#6 Each quarter is three months, 1,2,3 to 10, 11, 12. Note that code like

Code:

. display %tm mofd(dofq(yq(2020, 2)))
2020m4

returns the first month of each quarter. It seems to me more logical to regard each quarter as centred [centered] on its middle or second month, one month later.

Code:

. display %tm mofd(dofq(yq(2020, 2))) + 1
2020m5

Note that this was explained in #3:

3. For quarterly data, I would carry the quarterly values over to the middle month of each quarter and then interpolate between.
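The same check can be run for all four quarters of a year. This is a small sketch; the loop and the wording of the displayed text are illustrative only.

```stata
* Sketch: first month versus middle month for each quarter of 2020.
forvalues q = 1/4 {
    display "2020q`q': first month " %tm mofd(dofq(yq(2020, `q'))) ///
        ", middle month " %tm (mofd(dofq(yq(2020, `q'))) + 1)
}
```

The first months are 2020m1, 2020m4, 2020m7 and 2020m10; adding 1 moves each to 2020m2, 2020m5, 2020m8 and 2020m11, the middle month of its quarter.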

Last edited by Nick Cox; 18 Apr 2020, 02:30.
Janys Ung

Join Date: Dec 2016

Posts: 35
#8

03 Jul 2020, 03:52

Dear Nick, Many thanks for your reply! Best wishes, Janys
Patrick Shin

Join Date: Mar 2018

Posts: 12
#9

12 Aug 2022, 00:44

Nick Cox First of all, thanks for sharing this information. I am curious whether it works for going from monthly to daily. If so, how do we create a daily date from a monthly one? I know how to code the monthly date, generate mdate = monthly(sdate, "YM"), but I am somewhat confused about how to create a daily date from it. Thank you, Nick.
Nick Cox

Join Date: Mar 2014

Posts: 35402
#10

12 Aug 2022, 03:16

Patrick Shin Please start a different thread and show a data example with examples of your mdate and sdate.
Patrick Shin

Join Date: Mar 2018

Posts: 12
#11

12 Aug 2022, 13:26

Nick Cox Thank you. I followed your method to regenerate the date variable (monthly -> daily). As you see below, the initial date looks daily, but the data are monthly. When I run the code, the daily and monthly variables are created, but they are empty and no rows are added, unlike in your original instructions. I may have misunderstood your guidelines, and I would really appreciate any advice on my code.

clear all

// Import Daily data

input str11 sdate GNP(%)
date TNA
2019-12-31 0.743
2020-01-31 1.037
2020-02-29 1.37
2020-03-31 1.923
2020-04-30 3.132
2020-05-31 3.593
2020-06-30 4.401
2020-07-31 6.694
2020-08-31 7.213
2020-09-30 7.952
2020-10-31 11.893
2020-11-30 16.556
2020-12-31 37.326
2021-01-31 56.148
2021-02-28 71.492
2021-03-31 74.326
2021-04-30 85.461
2021-05-31 98.696
2021-06-30 94.388
2021-07-31 106.678
2021-08-31 126.547
2021-09-30 126.002
2021-10-31 151.083
2021-11-30 153.071
2021-12-31 130.322
2022-01-31 98.111
end

generate mdate = monthly(sdate, "YMD")
format mdate %tm
generate ddate = day(dofc(mdate))
tsset ddate
format ddate %td
tsfill
sort ddate
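One possible reading of why the variables come out empty: monthly() accepts only year-month masks such as "YM", so monthly(sdate, "YMD") returns missing, and day(dofc()) is then applied to a missing value. A minimal sketch of an alternative, assuming the strings are daily dates in year-month-day order and that linear interpolation is acceptable, is below; only the first four rows are used and TNA_daily is an illustrative name.

```stata
* Sketch: read the strings as daily dates, fill in the days, interpolate.
clear
input str10 sdate TNA
"2019-12-31" 0.743
"2020-01-31" 1.037
"2020-02-29" 1.37
"2020-03-31" 1.923
end

gen ddate = daily(sdate, "YMD")     // daily date from the string
format ddate %td

tsset ddate
tsfill                              // add one row per missing day

gen mdate = mofd(ddate)             // monthly date for every row, if needed
format mdate %tm

ipolate TNA ddate, gen(TNA_daily)   // official linear interpolation
list ddate mdate TNA TNA_daily in 1/5
```

Any other method, for example pchip through mipolate, could be substituted for ipolate in the last step.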
Patrick Shin

Join Date: Mar 2018

Posts: 12
#12

12 Aug 2022, 13:27

Nick Cox I will repost this work. Thanks for your advice.
amy farzana

Join Date: Jul 2024

Posts: 1
#13

12 Nov 2024, 19:51

Can anybody help me convert annual data for 71 countries into quarterly data? I tried using EViews and Stata, but they give values for only one country, not for all countries.
Nick Cox

Join Date: Mar 2014

Posts: 35402
#14

13 Nov 2024, 02:35

amy farzana I doubt that anyone can guess what you did without seeing your code and a data example. Otherwise the best answer from previous posts in this thread is to consider whether the Denton method can do what you want.
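Without the code and data, the following is only a guess at the setup: a sketch of interpolating annual values to quarterly dates separately within each country. The toy values and the names country, year, y, id, qdate and y_q are all invented, and it uses official linear ipolate rather than the Denton method.

```stata
* Sketch with hypothetical toy data: annual to quarterly, country by country.
clear
input str3 country int year double y
"AAA" 2019 100
"AAA" 2020 104
"AAA" 2021 110
"BBB" 2019 50
"BBB" 2020 49
"BBB" 2021 53
end

encode country, gen(id)             // numeric panel identifier
gen qdate = yq(year, 1)             // assumption: annual value placed in Q1
format qdate %tq

xtset id qdate
tsfill                              // add the missing quarters in each panel

bysort id (qdate): ipolate y qdate, gen(y_q)   // interpolate within country
list id qdate y y_q, sepby(id)
```

The by prefix is what makes the interpolation run for every country rather than across the pooled data; omitting it is one way to end up with sensible results for a single series only. Whether the quarterly values should also be divided by 4 depends on the measurement units.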