Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Interpolation of variable y between a inconsistent range of variable x

    Hello,

    I’m working with longitudinal data, for which I have quite a few variables matched by country (variable name countryname) and year (variable name year). I’m looking at data between 2014 and 2020 for each country, and need to interpolate the missing observations for my covariates.

    The tricky part is that I should not interpolate any missing values for years before the earliest year for which there is data, or after the last year with data for any given country. For example, if there is data for Botswana in 2016, 2018, and 2019, I would only want to interpolate between 2016 and 2019.

    The problem is that the range of years that I would need to interpolate between varies country to country. Is there a way to interpolate missing values while indicating that the interpolation needs to be EXCLUSIVELY between whatever the first and last year of observations happen to be for each country? I’m ideally looking for code that would generate one new variable (e.g. covariate_new) with all the interpolated data for all the countries instead of having to go through country by country and merge them later.

    I hope this was a clear enough description. Thank you in advance for any advice you might have!
    Last edited by Liv Miscall; 16 Feb 2023, 17:25.

  • #2
    not exactly sure why ipolate doesn't work. But if not, then you might try this.

    xtset your data
    bys id year: g t = _n
    ipolate t xvar, gen(newxvar)



    Comment


    • #3
      Generating values outside the data range is referred to as extrapolation, which is not the default in ipolate. The way you interpolate with panel data is by using the bysort prefix. See mipolate from SSC for other interpolation methods other than linear interpolation.

      Code:
      bysort countryname (year): ipolate varx year, g(wanted)

      Comment


      • #4
        I've created a demonstration data set that I think is similar to your situation. The following code interpolates for all non-initial and non-terminal gaps in the data, without extrapolating beyond the earliest and latest years originally available. (Ignore the variable u in this data set: it was involved in creating the example, but is not relevant to the question posed.)

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int country float(u year x)
         1  9.206752 2014 8.6189785
         1  9.206752 2016  9.609908
         1  9.206752 2018  14.18375
         1  9.206752 2019 16.201302
         1  9.206752 2020 16.744442
         2  3.615079 2014  3.981203
         2  3.615079 2015  3.952614
         2  3.615079 2017  9.799328
         2  3.615079 2020 10.793972
         3  9.283744 2014  9.415925
         3  9.283744 2015 12.621593
         3  9.283744 2016 13.048636
         3  9.283744 2019 16.512445
         3  9.283744 2020  17.89896
         4  9.051945 2014   9.93477
         4  9.051945 2015 10.330553
         4  9.051945 2016 12.611857
         4  9.051945 2017 13.557355
         4  9.051945 2019  16.34025
         4  9.051945 2020  14.77269
         5 4.2060065 2016  8.759401
         5 4.2060065 2018  9.277672
         5 4.2060065 2019 11.138432
         5 4.2060065 2020  9.672301
         6 1.9526895 2016 4.7398934
         6 1.9526895 2017  5.538103
         6 1.9526895 2018  7.620443
         6 1.9526895 2019  9.262527
         7   3.09179 2014 1.7140416
         7   3.09179 2018  8.292501
         7   3.09179 2019  8.903159
         8  7.719056 2014  7.671502
         8  7.719056 2015  9.479094
         8  7.719056 2016 11.013272
         8  7.719056 2017 11.015412
         8  7.719056 2018 14.348335
         8  7.719056 2020 15.885188
         9   3.82849 2014  3.634507
         9   3.82849 2016  6.075161
         9   3.82849 2018  8.301526
        10 13.044621 2014 13.278474
        10 13.044621 2015 13.708678
        10 13.044621 2017 17.986542
        10 13.044621 2018 17.213554
        10 13.044621 2020 17.626713
        end
        
        xtset country year
        tsfill
        by country (year), sort: ipolate x year, gen(x_interpolated)
        In the future, when asking for help with code, please show example data, and use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

        When asking for help with code, always show example data. When showing example data, always use -dataex-.

        Added: Crossed with #2 and #3. My solution is different from theirs in that I am assuming that O.P.'s data set contains no observations for the years where the observations are said to be "missing" and creates them. Solutions in #2 and #3 assume that there are observations for these years, and the variables in question contain missing values in those years.
        Last edited by Clyde Schechter; 16 Feb 2023, 17:36.

        Comment


        • #5
          The term missing in a Stata context generally implies missing values on one or more variables in observations present in the dataset. Here observations again is taken in a Stata sense to mean entire cases, records or rows in a dataset.

          Observations that might have been present in a dataset but are not might fairly be described as absent.

          Comment


          • #6
            Thank you for your replies, here is an example of my data. If you look at Algeria, that is an example of a country which I would only want interpolation between 2016 and 2020. However, thus far I've only figured out how to interpolate within a prespecified range

            For instance:
            bysort countryname (year): ipolate pricediff year if !missing(pricediff) | (year>=2015 & year<=2019), gen(pricediff_new) e


            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str52 countryname int year float pricediff
            "Afghanistan" 2014 .849711
            "Afghanistan" 2015 .
            "Afghanistan" 2016 .
            "Afghanistan" 2017 .
            "Afghanistan" 2018 .8689321
            "Afghanistan" 2019 .
            "Afghanistan" 2020 .8823529
            "Albania" 2014 .5538462
            "Albania" 2015 .
            "Albania" 2016 .4326531
            "Albania" 2017 .
            "Albania" 2018 .3758389
            "Albania" 2019 .
            "Albania" 2020 .3939394
            "Algeria" 2014
            "Algeria" 2015 .
            "Algeria" 2016 .6421053
            "Algeria" 2017 .
            "Algeria" 2018 .8509804
            "Algeria" 2019 .
            "Algeria" 2020 .5
            "Andorra" 2014 .27136752
            "Andorra" 2015 .
            "Andorra" 2016 .27249357
            "Andorra" 2017 .
            "Andorra" 2018 .22037914
            "Andorra" 2019 .
            "Andorra" 2020 .3155007
            "Angola" 2014 .2524272
            "Angola" 2015 .
            "Angola" 2016 .
            "Angola" 2017 .
            "Angola" 2018 .
            "Angola" 2019 .
            "Angola" 2020 .75
            "Antigua and Barbuda" 2014 .25
            "Antigua and Barbuda" 2015 .
            "Antigua and Barbuda" 2016 .5
            "Antigua and Barbuda" 2017 .
            "Antigua and Barbuda" 2018 .3310811
            "Antigua and Barbuda" 2019 .
            "Antigua and Barbuda" 2020 .3794326
            "Argentina" 2014 .57894737
            "Argentina" 2015 .
            "Argentina" 2016 .6217228
            "Argentina" 2017 .
            "Argentina" 2018 .5458515
            "Argentina" 2019 .
            "Argentina" 2020 .8127388
            "Armenia" 2014 .7027027
            "Armenia" 2015 .
            "Armenia" 2016 .7301587
            "Armenia" 2017 .
            "Armenia" 2018 .5753425
            "Armenia" 2019 .
            "Armenia" 2020 .525
            "Australia" 2014 .21086077
            "Australia" 2015 .
            "Australia" 2016 .2613838
            "Australia" 2017 .
            "Australia" 2018 .17672184
            "Australia" 2019 .
            "Australia" 2020 .28978103
            "Austria" 2014 .19512194
            "Austria" 2015 .
            "Austria" 2016 .1996403
            "Austria" 2017 .
            "Austria" 2018 .2
            "Austria" 2019 .
            "Austria" 2020 .1754386
            "Azerbaijan" 2014 .7601246
            "Azerbaijan" 2015 .
            "Azerbaijan" 2016 .43564355
            "Azerbaijan" 2017 .
            "Azerbaijan" 2018 .7712418
            "Azerbaijan" 2019 .
            "Azerbaijan" 2020 .5294118
            "Bahamas, The" 2014 .5134146
            "Bahamas, The" 2015 .
            "Bahamas, The" 2016 .5977654
            "Bahamas, The" 2017 .
            "Bahamas, The" 2018 .
            "Bahamas, The" 2019 .
            "Bahamas, The" 2020 .38910505
            "Bahrain" 2014 .5
            "Bahrain" 2015 .
            "Bahrain" 2016 .6666667
            "Bahrain" 2017 .
            "Bahrain" 2018 .599624
            "Bahrain" 2019 .
            "Bahrain" 2020 .6521739
            "Bangladesh" 2014 .8408163
            "Bangladesh" 2015 .
            "Bangladesh" 2016 .772242
            "Bangladesh" 2017 .
            "Bangladesh" 2018 .6655052
            "Bangladesh" 2019 .
            "Bangladesh" 2020 .6785714
            "Barbados" 2014 .20657894
            "Barbados" 2015 .
            end
            [/CODE]

            I'm sorry for the data and code not appearing nicely in a gray box. I'm very new here

            Comment


            • #7
              For instance:
              bysort countryname (year): ipolate pricediff year if !missing(pricediff) | (year>=2015 & year<=2019), gen(pricediff_new) e
              As the replies above suggest, the command without including the -if- qualifier gives you the desired outcome. You are undercutting yourself by including extraneous conditions and adding an -extrapolate- option.

              Code:
              bysort countryname (year): ipolate pricediff year, gen(pricediff_new)

              Comment

              Working...
              X