  • Interpolation : mipolate, stripolate, ... ?

    Dear Members,

    I would like to have a better understanding of the different tools that can be used in order to do an interpolation.

    In this topic:
    https://www.statalist.org/forums/for...-interpolation

    Nic Cox put forward two new programs: mipolate and stripolate.
    1. What is the difference between these two?
    2. Working mostly on economic data, what criteria will help me to decide if I have to either use mipolate or stripolate?
    In my case, I would like to convert these quarterly data to monthly data. The variable is GDP.

    y date
    16070.9 2014q1
    16136.9 2014q2
    16022.4 2014q3
    15989.1 2014q4
    15790 2015q1
    15572.7 2015q2
    15586.8 2015q3
    15542.1 2015q4
    15570.3 2016q1
    15511.1 2016q2
    15510.6 2016q3
    15598.8 2016q4
    15718.4 2017q1
    15880.5 2017q2
    15894.4 2017q3
    15894.4 2017q4

    Thank you for your help!

  • #2
    Nic [sic] Cox put forward two new programs: mipolate and stripolate.
    1. What is the difference between these two?
    -mipolate- is for use with numeric outcome variables, -stripolate- is for use with string outcome variables.

    2. Working mostly on economic data, what criteria will help me to decide if I have to either use mipolate or stripolate?
    It depends on whether the variable you are trying to interpolate is numeric or string. In economics, it would surprise me to see a situation where you need to interpolate string variables, but I am not an economist, so perhaps my perspective is too limited here.

    The value of -mipolate- is that it offers more ways of filling in the gaps than just simple linear interpolation, which was already available with official Stata's -ipolate- (although, if memory serves, -ipolate-, too, was originally written by Nick Cox before StataCorp took it over). The real question you need to ask is which type of interpolation function, if any, is most appropriate for use with GDP data over the timeframe you are dealing with. There are numerous economists active on this Forum and perhaps one will give you some guidance here. But basically, you need help from an economist about this; it's not a statistical question.



    • #3
      Clyde's helpful advice just needs one small qualification. ipolate was introduced as part of official Stata a long while ago. I had no part in it. FWIW, I do not answer to Nic.

      On the specific problem here the simplest option is just to replicate each quarterly value 3 times. You can do that with mipolate.
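A minimal sketch of that simplest option, replicating each quarterly value three times. It uses the official commands expand and bysort rather than mipolate, a swap made here only to keep the example free of add-ons. The variable names qdate and mdate and the three-row subset of the data are illustrative assumptions.

```stata
clear
input GDP str6 sdate
16070.9 2014q1
16136.9 2014q2
16022.4 2014q3
end

* quarterly date from the string, e.g. "2014q1"
gen qdate = quarterly(sdate, "YQ")

* three copies of each quarter, one per month of that quarter
expand 3
bysort qdate: gen mdate = mofd(dofq(qdate)) + _n - 1
format mdate %tm

sort mdate
list mdate GDP, sepby(qdate)
```

Each quarter then appears as three monthly observations carrying the same value, which is a step function rather than a smooth curve.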

      Large principles and small practicalities here:

      1. There is no white magic in interpolation. There is more than one way to interpolate and no obvious one correct way. mipolate sticks to deterministic methods, and doesn't extend to smarter ones such as Gaussian processes. That's not an expression of disdain, just a matter of work to be done (by someone).

      2. There are no estimates of error without extra machinery being introduced. Interpolated values in most cases will be smoothed compared with the real unknown values. This uncertainty about uncertainty may affect (infect!) everything done with the interpolated data.

      3. For quarterly data, I would carry the quarterly values over to the middle month of each quarter and then interpolate between.

      4. For economic data, consider interpolation on logarithmic scale where it may make more sense, followed by exponentiation back. (For other measures, consider logit scale, etc.)

      5. Always plot results. (Never use dopey redundant titles for the time axis.)

      6. Always compare two or more methods. (Not done here by me, but I am laying down some guidance, not doing all the work.) My using pchip here is nothing but drawing attention to a method that works well for me in quite different problems. Whatever you do should be explained, which may dictate use of a very simple method.

      7. I have no idea here about measurement units, that is whether monthly estimates should be divided by 3. That is an important triviality.

      8. A constraint that monthly values sum to, or average to, quarterly values would be a separate matter.

      9. None of the methods in mipolate know or care about seasonality. They won't impute December or August according to anything special about December or August. (Economists usually regard seasonality as a confounded nuisance anyway.)

      Code:
      clear
      input GDP str6 sdate
      16070.9 2014q1
      16136.9 2014q2
      16022.4 2014q3
      15989.1 2014q4
      15790 2015q1
      15572.7 2015q2
      15586.8 2015q3
      15542.1 2015q4
      15570.3 2016q1
      15511.1 2016q2
      15510.6 2016q3
      15598.8 2016q4
      15718.4 2017q1
      15880.5 2017q2
      15894.4 2017q3
      15894.4 2017q4
      end
      
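      * convert strings such as "2014q1" to Stata quarterly dates
      gen date = quarterly(sdate, "YQ")
      * first month of each quarter, plus 1 to reach the middle month
      gen mdate = mofd(dofq(date)) + 1
      tsset mdate
      format mdate %tm
      * add the months in between as new observations
      tsfill
      sort mdate
      
      * interpolate on logarithmic scale, then exponentiate back
      gen logGDP = log(GDP)
      
      * mipolate is user-written; install it once with: ssc install mipolate
      mipolate logGDP mdate, gen(logGDP2) pchip
      gen iGDP = exp(logGDP2)
      
      set scheme s1color
      
      * interpolated values as a connected line, known values as points
      twoway connected iGDP mdate, ms(+) || scatter GDP mdate, ///
      legend(order(1 "guessed" 2 "known"))  xtitle("") yla(, ang(h)) ytitle(GDP, orient(horiz))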
      [Figure: pchipolate.png. Interpolated ("guessed") monthly GDP and known quarterly GDP plotted against monthly date.]
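As a sketch of point 6 (compare two or more methods), two very simple methods can be set side by side using only official ipolate: linear interpolation on the raw scale, and linear interpolation on the logarithmic scale as in point 4. This is not the pchip method used above. The names lin, loglin and diff are illustrative, and only the first four quarters are used.

```stata
clear
input GDP str6 sdate
16070.9 2014q1
16136.9 2014q2
16022.4 2014q3
15989.1 2014q4
end

* quarterly values placed at the middle month of each quarter
gen mdate = mofd(dofq(quarterly(sdate, "YQ"))) + 1
format mdate %tm
tsset mdate
tsfill

* method 1: linear interpolation on the raw scale
ipolate GDP mdate, gen(lin)

* method 2: linear interpolation on the log scale, then back
gen double logGDP = log(GDP)
ipolate logGDP mdate, gen(loglin)
replace loglin = exp(loglin)

* how far apart are the two answers?
gen diff = lin - loglin
summarize diff
list mdate GDP lin loglin diff
```

When adjacent quarterly values are close, as here, the two methods are likely to differ very little. Summarizing or plotting diff makes the size of the difference explicit.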



      • #4
        Hello,

        Thank you both for your answers. They are very useful!

        Nick, thank you for the code and the figure. I have a much better understanding of it all.

        2. There are no estimates of error without extra machinery being introduced. Interpolated values in most cases will be smoothed compared with the real unknown values. This uncertainty about uncertainty may affect (infect!) everything done with the interpolated data.
        At first I didn't really understand what you meant by "smoothed" (maybe because I am not a native speaker), or how it would affect further calculations. To understand it better, I decided to compare the mipolate method (iGDP) with two other interpolations provided by a colleague (Prof. Rapelanoro):
        - the first from the Denton average match method;
        - the second from the Quadratic average method.
        He used EViews to run both of them. The values are here:

        Code:
        quadra    denton    iGDP
        16065.9    16077.6    16070.91
        16096.1    16119.6    16111.15
        16141.4    16141.6    16131.37
        16149.4    16143.6    16136.91
        16119.9    16125.5    16111
        16053.1    16048.6    16059.7
        16013.4    16019.4    16022.42
        16000.8    15999.3    16010.36
        16015.5    16024.7    16002.36
        15999.3    15995.2    15989.07
        15952.4    15947.3    15944.32
        15874.8    15859.1    15868.09
        15791.8    15790.7    15790
        15703.5    15720.3    15702.72
        15609.8    15610.8    15613.5
        15558.5    15564.1    15572.67
        15549.7    15543.1    15576.32
        15583.2    15590.8    15583.1
        15594.3    15588.9    15586.75
        15582.8    15580.6    15575.16
        15548.7    15546.2    15553.66
        15535.2    15539.4    15542.09
        15542.3    15540.7    15549.4
        15570    15573.9    15562.99
        15577.1    15573.6    15570.31
        15563.8    15563.5    15555.02
        15530.1    15522.2    15526.6
        15507.4    15509    15511.11
        15495.9    15502.3    15510.81
        15495.4    15497.6    15510.63
        15506.7    15507.3    15510.57
        15529.6    15526.8    15525.91
        15564.3    15564.8    15560.87
        15598.9    15597.7    15598.81
        15633.3    15634    15634.68
        15667.6    15672.3    15674.56
        15714.3    15716.9    15718.44
        15773.4    15766.2    15778.89
        15844.8    15848.5    15844.86
        15889.5    15886    15880.54
        15907.4    15907.1    15887.91
        15898.5    15891.8    15892.69
        15893.2    15894.9    15894.39
        15891.5    15896.4    15894.39
        15893.3    15896.4    15894.39
        15894.6    15894.9    15894.39
        And the graph is here:

        [Figure: graph.png]


        It seems to me that the Denton average match and Quadratic average methods are somehow more precise than mipolate.
        It is nonetheless very hard for me to choose between Denton and Quadratic.

        i) What do you think about these results? Do you agree?
        ii) Are these two methods (Denton average match and Quadratic average) available in Stata?

        There are numerous economists active on this Forum and perhaps one will give you some guidance here.
        I would be glad to have their opinion too. Let's hope one of them checks this post.
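One concrete way to see how the three series differ is to check the constraint mentioned in point 8 of #3: do the three monthly values in each quarter average back to the quarterly figure? The sketch below uses rows 3 to 8 of the table above. It assumes that the first row of the table is 2014m2 (as in the code in #3) and that the colleague's two series share that alignment, so these rows cover 2014q2 and 2014q3. The names mdate and qdate are illustrative.

```stata
clear
input quadra denton iGDP
16141.4 16141.6 16131.37
16149.4 16143.6 16136.91
16119.9 16125.5 16111
16053.1 16048.6 16059.7
16013.4 16019.4 16022.42
16000.8 15999.3 16010.36
end

* assumption: these rows are 2014m4 to 2014m9
gen mdate = ym(2014, 4) + _n - 1
format mdate %tm

* quarter to which each month belongs
gen qdate = qofd(dofm(mdate))
format qdate %tq

* quarterly means of the monthly values
collapse (mean) quadra denton iGDP, by(qdate)
list
```

Under those assumptions, the quadra and denton means come out within rounding of the quarterly values 16136.9 and 16022.4, whereas the iGDP means do not. That reflects a different constraint (average matching versus passing through the known values at the middle month), not by itself greater precision.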



        • #5
          Smoothed means that an interpolated series will be smoother than the true series, which you necessarily don't have. So, even in the simplest case of linear interpolation, it seems likely that the true series wiggles and waggles more than a linearly interpolated series.
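A small simulation can make this concrete. The sketch below invents a "true" monthly series (a random walk, purely an assumption for illustration), keeps only the middle month of each quarter, interpolates linearly with official ipolate, and compares the month-to-month changes. All variable names and the seed are made up for this example.

```stata
clear
set seed 2803
set obs 48
gen t = _n

* a made-up "true" monthly series: a random walk
gen double truth = 100 + sum(rnormal())

* pretend only the middle month of each quarter is observed
gen double known = truth if mod(t, 3) == 2

* linear interpolation between the known values
ipolate known t, gen(interp)

* month-to-month changes, over the months where both are defined
gen double dtruth = truth - truth[_n-1] if !missing(interp, interp[_n-1])
gen double dinterp = interp - interp[_n-1]
summarize dtruth dinterp
```

Within each gap the interpolated changes are just the average of the true changes, so their spread cannot exceed that of the true changes. In real problems the true series is not available, which is exactly why the loss of wiggle goes unseen.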

          I don't know what you mean by precise.

          Precision means to me replicability of repeated measurements or estimates, in contrast to accuracy, which refers to lack of bias in attempts at a true value. I don't know how you assess interpolation under either heading as you have one outcome for given data only and no variability (and no true series).

          Precise is not a flexible synonym for, let's say, appropriate to my data and my problem.

          That aside, not only am I happy that you choose a method on those criteria, I encourage you to do that.

          As for Denton, that's an easy keyword to look for:

          Code:
          ----------------------------------------------------------------------------------------------------------------
          search for Denton                                                                          (manual:  [R] search)
          ----------------------------------------------------------------------------------------------------------------
          
          Search of official help files, FAQs, Examples, SJs, and STBs
          
          Web resources from Stata and other users
          
          (contacting http://www.stata.com)
          
          1 package found (Stata Journal and STB listed first)
          ----------------------------------------------------
          
          denton from http://fmwww.bc.edu/RePEc/bocode/d
              'DENTON': module to interpolate a flow or stock series from low-frequency
              totals via proportional Denton method / denton computes the proportional
              Denton method of interpolation / of a low-frequency flow time series by
              use of an associated / high-frequency "indicator series", imposing the
          Like Clyde I am not an economist (except for A-level Economics Grade A; only British people are expected to understand that), but I understand this Denton method to be close to what you want, but you need other information in order to apply it. That's quite different from any method covered by mipolate.

          I have never used EViews and don't know what quadratic average interpolation is there, but would welcome accessible references.



          • #6
            Dear Nick,

            Many thanks for sharing the code in post #3. May I ask why you add 1 in the following code?

            Code:
             gen mdate = mofd(dofq(date)) + 1
            My data start in Q1. With that added 1, my 'mdate' starts at m2 instead of m1. Shouldn't it start at m1? My apologies if I have misunderstood the interpolation.

            Best wishes,
            Janys
            Last edited by Janys Ung; 17 Apr 2020, 17:30.



            • #7
              #6 Each quarter is three months, 1, 2, 3 to 10, 11, 12. Note that code like

              Code:
              . display %tm mofd(dofq(yq(2020, 2)))
               2020m4
              returns the first month of each quarter. It seems to me more logical to regard each quarter as centred on its middle or second month, one month later.

              Code:
              . display %tm mofd(dofq(yq(2020, 2))) + 1
               2020m5

              Note that this was explained in #3:

              3. For quarterly data, I would carry the quarterly values over to the middle month of each quarter and then interpolate between.
              Last edited by Nick Cox; 18 Apr 2020, 03:30.
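The same point can be checked for every quarter of a year at once. This is a minimal sketch; the variable names first and middle are illustrative only.

```stata
clear
set obs 4

* the four quarters of 2020
gen qdate = yq(2020, _n)
format qdate %tq

* first month of each quarter, and the middle month one later
gen first = mofd(dofq(qdate))
gen middle = first + 1
format first middle %tm

list
```

The first months are months 1, 4, 7 and 10 of the year, and the middle months are 2, 5, 8 and 11, which is where the code in #3 places the quarterly values.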



              • #8
                Dear Nick, Many thanks for your reply! Best wishes, Janys



                • #9
                  Nick Cox First of all, thanks for sharing this information. I am curious whether it works for monthly to daily. If so, how do we create a daily date from a monthly one? I know how to code the monthly date, generate mdate = monthly(sdate, "YM"), but I am somewhat confused about how to create a daily date from it. Thank you, Nick.



                  • #10
                    Patrick Shin Please start a different thread and show a data example with examples of your mdate and sdate.



                    • #11
                      Nick Cox Thank you. I followed your method to regenerate the date variable (monthly -> daily). As you see below, the initial date looks daily but is really monthly. When I run the code, the daily and monthly variables are created, but they are empty and no rows are added, unlike in your instructions. I may have misunderstood your guidelines, but I would really appreciate your advice on my code.


                      clear all

                      // Import Daily data

                      input str11 sdate GNP(%)
                      date TNA
                      2019-12-31 0.743
                      2020-01-31 1.037
                      2020-02-29 1.37
                      2020-03-31 1.923
                      2020-04-30 3.132
                      2020-05-31 3.593
                      2020-06-30 4.401
                      2020-07-31 6.694
                      2020-08-31 7.213
                      2020-09-30 7.952
                      2020-10-31 11.893
                      2020-11-30 16.556
                      2020-12-31 37.326
                      2021-01-31 56.148
                      2021-02-28 71.492
                      2021-03-31 74.326
                      2021-04-30 85.461
                      2021-05-31 98.696
                      2021-06-30 94.388
                      2021-07-31 106.678
                      2021-08-31 126.547
                      2021-09-30 126.002
                      2021-10-31 151.083
                      2021-11-30 153.071
                      2021-12-31 130.322
                      2022-01-31 98.111
                      end

                      generate mdate = monthly(sdate, "YMD")
                      format mdate %tm
                      generate ddate = day(dofc(mdate))
                      tsset ddate
                      format ddate %td
                      tsfill
                      sort ddate
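A possible way forward, offered as a sketch rather than a definitive fix: the strings in sdate are daily dates (month ends), so daily() with the mask "YMD" seems the natural conversion, after which tsfill can add the missing days. The variable name TNA is an assumption taken from the stray header line in the post, since GNP(%) is not a legal Stata variable name. Linear interpolation with official ipolate stands in for whichever method is finally chosen, and only the first four rows are used.

```stata
clear
input str10 sdate TNA
2019-12-31 0.743
2020-01-31 1.037
2020-02-29 1.37
2020-03-31 1.923
end

* the strings are daily dates, so convert them with daily()
generate ddate = daily(sdate, "YMD")
format ddate %td

* declare daily time series and add the days in between
tsset ddate
tsfill

* linear interpolation between the month-end values
ipolate TNA ddate, generate(iTNA)

list ddate TNA iTNA in 1/5
```

In the posted code, one likely source of the empty variables is that monthly() is being asked to read strings that contain a day, and dofc() is meant for date-time (clock) values rather than monthly dates.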



                      • #12
                        Nick Cox I will repost this work. Thanks for your advice.



                        • #13
                          Can anybody help me convert annual data for 71 countries into quarterly data? I tried using EViews and Stata, but they give values for only 1 country, not for all countries.



                          • #14
                            amy farzana I doubt that anyone can guess what you did without seeing your code and a data example. Otherwise the best answer from previous posts in this thread is to consider whether the Denton method can do what you want.
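Without the original code this can only be a guess, but with panel data the interpolation usually has to be done separately within each country. The sketch below uses two made-up panels and official ipolate with by. The data, the variable names id, year, y and y_q, and the choice to place each annual value at the first quarter of its year are all assumptions for illustration.

```stata
clear
input byte id int year double y
1 2018 100
1 2019 104
1 2020 110
2 2018 50
2 2019 48
2 2020 53
end

* assumption: each annual value belongs to quarter 1 of its year
gen qdate = yq(year, 1)
format qdate %tq

* declare the panel structure and add the missing quarters
xtset id qdate
tsfill

* interpolate separately within each panel
bysort id: ipolate y qdate, gen(y_q)

list id qdate y y_q, sepby(id)
```

The by prefix is what keeps values from one country from being used to fill gaps in another. Whether linear interpolation is appropriate, and where in the year annual values should sit, are separate questions along the lines of the points in #3.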

