
  • Extrapolating values within a specific timeframe (epolate)

    Hi,
    I am working on a dataset with the Gini coefficient for different country-years. Since the original dataset has some missing values, I have used ipolate to generate interpolated values with the following command:

    Code:
    by countryname: ipolate gini year, gen(gini_int)
    The dataset then looks like this:

    Code:
    countryname year gini gini_int
    Argentina 1975 . .
    Argentina 1976 . .
    Argentina 1977 . .
    Argentina 1978 . .
    Argentina 1979 . .
    Argentina 1980 40.8 40.799999
    Argentina 1981 . 41.133333
    Argentina 1982 . 41.466666
    Argentina 1983 . 41.799999
    Argentina 1984 . 42.133333
    Argentina 1985 . 42.466666
    Argentina 1986 42.8 42.799999
    Argentina 1987 45.3 45.299999
    Argentina 1988 . 45.674999
    Argentina 1989 . 46.049999
    Argentina 1990 . 46.424999
    Argentina 1991 46.8 46.799999

    Now I would like to estimate values for years outside the observed data range, such as 1978 or 1979. The epolate option, however, extrapolates values for all years present in the data. I would like to restrict the extrapolation to a three-year window: three years before the first non-missing value and three years after the last non-missing value. In this case, for example, I would like to extrapolate values for 1977–1979 but not for 1976 or 1975.

    Thanks a lot!

  • #2
    You can do the interpolation/extrapolation in one step, with a little bit of preparation:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 countryname int year float(gini gini_int)
    "Argentina" 1975    .        .
    "Argentina" 1976    .        .
    "Argentina" 1977    .        .
    "Argentina" 1978    .        .
    "Argentina" 1979    .        .
    "Argentina" 1980 40.8     40.8
    "Argentina" 1981    . 41.13333
    "Argentina" 1982    . 41.46667
    "Argentina" 1983    .     41.8
    "Argentina" 1984    . 42.13333
    "Argentina" 1985    . 42.46667
    "Argentina" 1986 42.8     42.8
    "Argentina" 1987 45.3     45.3
    "Argentina" 1988    .   45.675
    "Argentina" 1989    .    46.05
    "Argentina" 1990    .   46.425
    "Argentina" 1991 46.8     46.8
    end
    
    by country (year), sort: egen first_year = min(cond(!missing(gini), year, .))
    by country (year): egen last_year = max(cond(!missing(gini), year, .))
    ipolate gini year if inrange(year, first_year - 3, last_year + 3), epolate gen(wanted)
    In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
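As a minimal sketch of what that looks like in practice, the lines below run -dataex- on a few observations of the auto dataset that ships with Stata. The choice of variables and the `in 1/5` range are arbitrary, picked only for illustration, and the example assumes -dataex- is installed as described above.

```stata
* Load a dataset that ships with Stata (used here only for illustration)
sysuse auto, clear

* Print a copy-and-paste-ready example of three variables, first five observations
dataex make price mpg in 1/5
```

The output is an -input- block like the one shown above, which can be pasted directly into a post.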



    • #3
      The code worked, but I also needed to add the by country prefix to the last line. Otherwise, it looked as though the interpolation was not done within each country but by pooling all countries together. In the end, it looked like this:

      Code:
      by country (year), sort: egen first_year = min(cond(!missing(gini), year, .))
      by country (year): egen last_year = max(cond(!missing(gini), year, .))
      by country: ipolate gini year if inrange(year, first_year - 3, last_year + 3), epolate gen(wanted)

      Thanks a lot for your help and for the clarification about dataex, too!



      • #4
        Yes, you are right, the -ipolate- command should have been prefixed with -by country:-. I apologize for the error. I tested the code on your example data before posting it, but because the example data had only one country, I didn't notice the problem.
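To see the effect with more than one country, here is a minimal sketch on a toy dataset. The "Brazil" rows and their values are invented purely for illustration and are not from the original data; the variable name `wanted` follows the code earlier in the thread. The checks at the end confirm only that nothing is filled in outside the three-year window and that observed values are left unchanged.

```stata
* Toy data: two countries; the Brazil values are hypothetical
clear
input str9 countryname int year float gini
"Argentina" 1976    .
"Argentina" 1977    .
"Argentina" 1978    .
"Argentina" 1979    .
"Argentina" 1980 40.8
"Argentina" 1981    .
"Argentina" 1982 42.8
"Brazil"    1976    .
"Brazil"    1977    .
"Brazil"    1978    .
"Brazil"    1979    .
"Brazil"    1980 58.0
"Brazil"    1981    .
"Brazil"    1982 60.0
end

* First and last year with an observed gini, computed within each country
bysort countryname (year): egen first_year = min(cond(!missing(gini), year, .))
by countryname (year): egen last_year = max(cond(!missing(gini), year, .))

* Interpolate/extrapolate within each country, restricted to the 3-year window
by countryname: ipolate gini year if inrange(year, first_year - 3, last_year + 3), epolate gen(wanted)

* Sanity checks: nothing outside the window, observed values preserved
assert missing(wanted) if !inrange(year, first_year - 3, last_year + 3)
assert reldif(wanted, gini) < 1e-6 if !missing(gini)

list, sepby(countryname)
```

Dropping the `by countryname:` prefix from the -ipolate- line and rerunning should make the difference visible in the listing, since the two countries share the same years.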
