
  • Extrapolate values from previous years

    Good morning,

    I have panel data in which the variable “patent count” is collected yearly from 2003 to 2014 for a sample of 12,850 firms, but the year 2015 is not included. If I generate the patent count for 2015 using "ipolate", is it recommended to use all the years available in the panel?

    Also, in the same database I have a variable named “organizational innovation”, in which the organization is asked “who developed the organizational innovation in the period t-2 to t?” This variable was collected every year from 2008 to 2013, but the information for 2014 is not included. If I want to use “ipolate” to generate the data for 2014, should I use all the years available (from 2008 to 2013) or only the last two years (2012 and 2013)?

    Thank you very much in advance.

  • #2
    Both questions are really the same question. If you use ipolate then the method used is linear interpolation -- or in this case extrapolation. That means (always) that the last two known values are being used to extrapolate forward and/or that the first two known values are being used to extrapolate backward. There is no choice of which values to use -- unless exceptionally you ignore part of the data using if and/or in.

    If your expertise leads you to think that you should use more information than the two previous values, then do that either by using a different method of interpolation (check out mipolate from SSC) or by fitting a time series model and forecasting forward.
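
The two-point rule described above can be illustrated with a minimal sketch. This is plain Python showing the arithmetic of linear extrapolation, not Stata code and not the internals of ipolate; the function name `extrapolate_next` and both example series are hypothetical and were made up for illustration.

```python
def extrapolate_next(series):
    """Linearly extrapolate one time unit past the last known point.

    `series` maps year -> value. Only the last two known points enter
    the calculation; earlier values are ignored.
    """
    (x1, y1), (x2, y2) = sorted(series.items())[-2:]
    slope = (y2 - y1) / (x2 - x1)
    return y2 + slope  # value one time unit beyond x2


# Hypothetical series (not from the thread) that differ only before 2013.
series_a = {2011: 50, 2012: 9, 2013: 4, 2014: 6}
series_b = {2011: 0, 2012: 0, 2013: 4, 2014: 6}

next_a = extrapolate_next(series_a)
next_b = extrapolate_next(series_b)
print(next_a, next_b)
```

Both series give the same extrapolated value because they share their last two points, which is why supplying more years does not change a linear extrapolation.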



    • #3
      Thank you very much for the advice!



      • #4
        Originally posted by Nick Cox
        Both questions are really the same question. If you use ipolate then the method used is linear interpolation -- or in this case extrapolation. That means (always) that the last two known values are being used to extrapolate forward and/or that the first two known values are being used to extrapolate backward. There is no choice of which values to use -- unless exceptionally you ignore part of the data using if and/or in.

        If your expertise leads you to think that you should use more information than the two previous values, then do that either by using a different method of interpolation (check out mipolate from SSC) or by fitting a time series model and forecasting forward.

        If I want to use mipolate to extrapolate the missing data in my panel, how can I know which mipolate option to use? When I use linear (the default method), it generates negative values for some of the missing observations in the new variable. As the variables in the database record the number of requests for different types of patents, their values cannot be negative.

        My panel data looks like this:

        ident year patnum patoepm patepo patuspto patpct
        20 2011 1 1 0 0 0
        20 2012 1 1 1 0 1
        20 2013 1 1 1 0 1
        20 2014 1 1 0 0 0
        20 2015 . . . . .
        22 2011 8 4 4 0 0
        22 2012 7 3 0 0 4
        22 2013 6 3 0 0 3
        22 2014 7 7 0 2 0
        22 2015 . . . . .
        30 2011 0 0 0 0 0
        30 2012 0 0 0 0 0
        30 2013 1 0 0 0 1
        30 2014 1 1 0 0 0
        30 2015 . . . . .
        48 2011 2 0 0 0 2
        48 2012 2 0 0 0 2
        48 2013 0 0 0 0 0
        48 2014 1 1 0 0 0
        48 2015 . . . . .
        57 2011 1 1 0 0 0
        57 2012 12 12 0 0 0
        57 2013 3 3 0 0 0
        57 2014 1 1 0 0 0
        57 2015 . . . . .
        82 2011 0 0 0 0 0
        82 2012 0 0 0 0 0
        82 2013 0 0 0 0 0
        82 2014 0 0 0 0 0
        82 2015 . . . . .
        88 2011 0 0 0 0 0
        88 2012 1 1 0 0 0
        88 2013 1 1 0 0 0
        88 2014 1 1 0 0 0
        88 2015 . . . . .
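
To see where negative values can come from, here is a minimal sketch that applies two-point linear extrapolation by hand to the 2013 and 2014 rows of the listing above, for patnum and patpct only. It is plain Python illustrating the arithmetic, not the code of mipolate or ipolate, and the helper name `extrapolate_2015` is hypothetical.

```python
# 2013 and 2014 values copied from the listing: ident -> (value_2013, value_2014)
patnum = {20: (1, 1), 22: (6, 7), 30: (1, 1), 48: (0, 1),
          57: (3, 1), 82: (0, 0), 88: (1, 1)}
patpct = {20: (1, 0), 22: (3, 0), 30: (1, 0), 48: (0, 0),
          57: (0, 0), 82: (0, 0), 88: (0, 0)}


def extrapolate_2015(last_two):
    """One year ahead on a straight line: y2015 = y2014 + (y2014 - y2013)."""
    return {ident: 2 * y2014 - y2013
            for ident, (y2013, y2014) in last_two.items()}


patnum_2015 = extrapolate_2015(patnum)
patpct_2015 = extrapolate_2015(patpct)

# Firms whose extrapolated count falls below zero
negative_patnum = sorted(i for i, v in patnum_2015.items() if v < 0)
negative_patpct = sorted(i for i, v in patpct_2015.items() if v < 0)
print(negative_patnum, negative_patpct)
```

Any firm whose count fell between 2013 and 2014 by more than its 2014 value is pushed below zero, since the straight line simply continues the last observed change.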

        As you can see, I have missing data in the observations for 2015. Therefore, I would like to use mipolate to replace the missing data using information from previous years. Which option of mipolate should I use?

        Thank you very much in advance for the help.



        • #5
          There isn't a flip answer. Indeed, some of your variables look to be categorical or counted. The simplest conclusion I draw from your data example is that you have missings for 2015 in all observations and, without an obvious rationale, extrapolation by interpolation looks like a bad idea.

          The original leading motivation for interpolation was to get a finer mesh when calculating functions known to be smooth. Then a leading application became filling gaps, especially but not only for time series. But extrapolation has always been risky and extrapolation to extend time series is no kind of white magic. Increasingly, good methods tolerate gaps and make it doubtful whether interpolation is the tool of choice at all: there is no statistical check or model involved, which is a downside for many.

          Now, I wrote mipolate and do use it from time to time, but that doesn't mean that I think using it is always a good idea.

          Re-reading #1 I am led to the view that your panel data stop in 2014 and that's the data you have. If you're interested in forecasting later years that's a different problem.
          Last edited by Nick Cox; 09 Aug 2017, 02:20.
