Hello,
Since I am new to both Stata and this forum, I apologize in advance if this question has been answered many times before. I have read all the related threads I could find and still can't implement what I want.
I have a panel dataset in long format, with different IDs (townships, cities, counties) as rows and hundreds of variables as columns, covering 1972 to 2011. Some IDs (the most important cities, etc.) have full coverage for all 40 years, but the majority do not. What I want to do is use the IDs with all 40 yearly observations as a base to interpolate the missing years of the other IDs, for all variables.
I attach an example with just one variable (C101). As you can see, ID 2 has 40 non-missing observations, while IDs 1 and 3 do not. So I want to fill in the missing values of IDs 1 and 3 using ID 2 as a base, and then apply the same to dozens of variables. The only thing I have managed so far is to fill the missing values within each ID separately, using this code:
"by id, sort : ipolate C101 year4, generate(C101_) epolate", but that is not what I want.
Thanks in advance for any help.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long id int year4 long C101
1 1972 1269
1 1973 1284
1 1974 .
1 1975 .
1 1976 .
1 1977 1804
1 1978 .
1 1979 2374
1 1980 2583
1 1981 2742
1 1982 3037
1 1983 3557
1 1984 3541
1 1985 4546
1 1986 4305
1 1987 4921
1 1988 .
1 1989 .
1 1990 .
1 1991 .
1 1992 4709
1 1993 .
1 1994 .
1 1995 .
1 1996 .
1 1997 6951
1 1998 .
1 1999 .
1 2000 8302
1 2001 .
1 2002 10579
1 2003 .
1 2004 11386
1 2005 11843
1 2006 13065
1 2007 16235
1 2008 16263
1 2009 15076
1 2010 15619
1 2011 15278
2 1972 3430
2 1973 2871
2 1974 3794
2 1975 4435
2 1976 4656
2 1977 6301
2 1978 7310
2 1979 7272
2 1980 7905
2 1981 8683
2 1982 5497
2 1983 7251
2 1984 8461
2 1985 13501
2 1986 14412
2 1987 13954
2 1988 12789
2 1989 13300
2 1990 13781
2 1991 18423
2 1992 18846
2 1993 19713
2 1994 23874
2 1995 23723
2 1996 29323
2 1997 35062
2 1998 49166
2 1999 45204
2 2000 50603
2 2001 50907
2 2002 56031
2 2003 59660
2 2004 68591
2 2005 80774
2 2006 141199
2 2007 112384
2 2008 102728
2 2009 109787
2 2010 115887
2 2011 112060
3 1972 1198
3 1973 .
3 1974 .
3 1975 .
3 1976 .
3 1977 1595
3 1978 .
3 1979 .
3 1980 .
3 1981 .
3 1982 2930
3 1983 .
3 1984 .
3 1985 3252
3 1986 .
3 1987 3010
3 1988 .
3 1989 3256
3 1990 3374
3 1991 3458
end
format %ty year4
label values id id_
label def id_ 1 "011001001", modify
label def id_ 2 "011002002", modify
label def id_ 3 "011003003", modify