  • Linear Interpolation using specific IDs with full coverage to other IDs

    Hello,

    Since I am new to both Stata and this forum, I apologize in advance if my question has already been answered many times. I have read the related threads but still cannot implement what I want.

    I have a panel dataset in long format, with different IDs (townships, cities, counties) as rows and hundreds of variables as columns, covering 1972 to 2011. Some IDs (the most important cities, etc.) have full coverage for all 40 years, but the majority do not. What I want to do is to use the IDs with frequency 40 (years) as a base to interpolate the other IDs with missing years for all the variables.

    I attach an example with just one variable (C101). As you can see, ID 2 has 40 observations while IDs 1 and 3 do not. So I want to fill the missing values of IDs 1 and 3 using ID 2 as a base, and then apply the same to dozens of variables. The only thing I have managed to do is fill the missing values using each ID's own observations, with this command:

    "by id, sort : ipolate C101 year4, generate(C101_) epolate", but that is not what I want.

    Thanks in advance for any help.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id int year4 long C101
    1 1972   1269
    1 1973   1284
    1 1974      .
    1 1975      .
    1 1976      .
    1 1977   1804
    1 1978      .
    1 1979   2374
    1 1980   2583
    1 1981   2742
    1 1982   3037
    1 1983   3557
    1 1984   3541
    1 1985   4546
    1 1986   4305
    1 1987   4921
    1 1988      .
    1 1989      .
    1 1990      .
    1 1991      .
    1 1992   4709
    1 1993      .
    1 1994      .
    1 1995      .
    1 1996      .
    1 1997   6951
    1 1998      .
    1 1999      .
    1 2000   8302
    1 2001      .
    1 2002  10579
    1 2003      .
    1 2004  11386
    1 2005  11843
    1 2006  13065
    1 2007  16235
    1 2008  16263
    1 2009  15076
    1 2010  15619
    1 2011  15278
    2 1972   3430
    2 1973   2871
    2 1974   3794
    2 1975   4435
    2 1976   4656
    2 1977   6301
    2 1978   7310
    2 1979   7272
    2 1980   7905
    2 1981   8683
    2 1982   5497
    2 1983   7251
    2 1984   8461
    2 1985  13501
    2 1986  14412
    2 1987  13954
    2 1988  12789
    2 1989  13300
    2 1990  13781
    2 1991  18423
    2 1992  18846
    2 1993  19713
    2 1994  23874
    2 1995  23723
    2 1996  29323
    2 1997  35062
    2 1998  49166
    2 1999  45204
    2 2000  50603
    2 2001  50907
    2 2002  56031
    2 2003  59660
    2 2004  68591
    2 2005  80774
    2 2006 141199
    2 2007 112384
    2 2008 102728
    2 2009 109787
    2 2010 115887
    2 2011 112060
    3 1972   1198
    3 1973      .
    3 1974      .
    3 1975      .
    3 1976      .
    3 1977   1595
    3 1978      .
    3 1979      .
    3 1980      .
    3 1981      .
    3 1982   2930
    3 1983      .
    3 1984      .
    3 1985   3252
    3 1986      .
    3 1987   3010
    3 1988      .
    3 1989   3256
    3 1990   3374
    3 1991   3458
    end
    format %ty year4
    label values id id_
    label def id_ 1 "011001001", modify
    label def id_ 2 "011002002", modify
    label def id_ 3 "011003003", modify
    Last edited by Lefteris Andreadis; 17 Apr 2022, 02:24.

  • #2
    Your question has gone unanswered for over 24 hours now. While I can only speak for myself, I suspect that others have passed it over for the same reason I have: it is not at all clear what you want to do. Interpolation is exactly what -ipolate- does. And interpolation works within individual IDs and does not use information from other IDs. So, whatever you hope to accomplish, it is evidently not interpolation.

    But what is it? What does it mean to "use the IDs with frequency 40 (years) as a base to interpolate the other IDs with missing years for all the variables"? I think you should provide a lengthier explanation of the process you have in mind, and perhaps illustrate how you would do the calculation by hand if you had a much smaller data set to work with. Perhaps even create a toy data set with a couple of "complete" ids and one or two with missing data and show what the result would look like.
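
    To make the idea of a toy hand calculation concrete, here is a minimal sketch on three years of the C101 data from #1. Part (a) is what -ipolate- already does within each id. Part (b) is only one hypothetical reading of "using ID 2 as a base", namely interpolating the ratio to a fully observed reference id and scaling it back; nothing in #1 confirms that this is the intended method. The choice of id 2 as the reference and the names ref_val, ratio, ratio_ip, C101_within and C101_ratio are assumptions made for illustration.

```stata
* Toy data: three years of C101 for ids 1 and 2, copied from the example in #1
clear
input long id int year4 long C101
1 1977 1804
1 1978    .
1 1979 2374
2 1977 6301
2 1978 7310
2 1979 7272
end

* (a) What -ipolate- does: a straight line between observed points, within each id
bysort id (year4): ipolate C101 year4, generate(C101_within)

* (b) Hypothetical reading (an assumption, not the poster's stated method):
*     interpolate the ratio to a fully observed reference id, then scale back
local ref 2                                   // assumed reference id
generate double ref_val = C101 if id == `ref'
* within each year the nonmissing reference value sorts first; copy it to all rows
bysort year4 (ref_val): replace ref_val = ref_val[1]
generate double ratio = C101 / ref_val
bysort id (year4): ipolate ratio year4, generate(ratio_ip)
* fill gaps only; observed values are kept unchanged
generate double C101_ratio = cond(missing(C101), ratio_ip * ref_val, C101)

list id year4 C101 C101_within C101_ratio, sepby(id)
```

    For id 1 in 1978, (a) gives 2089, the midpoint of 1804 and 2374, while (b) gives roughly 2240 because it follows the 1978 bump in id 2. The two rules give different answers, which is why the intended calculation needs to be spelled out.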
