Hi all,
I have panel data on a continuous variable where every individual is missing data for different time periods. I would like to use OLS predictions to impute the missing values using a set of fixed effects and the nearest geographic neighbor's values. However, as all individuals are missing some data, using only the nearest geographic neighbor will not result in a balanced panel. For example, consider a situation where the data for all individuals are in wide format, where var1 is the data for individual 1, var2 is the data for individual 2, and so on.
Code:
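    clear
    set obs 5
    gen period = _n
    * runiform() is the current name for the older uniform()
    gen var1 = runiform()
    gen var2 = runiform()
    gen var3 = runiform()
    gen imp_var1 = .
    * knock out different periods for each individual
    replace var1 = . if period == 2
    replace var1 = . if period == 4
    replace var2 = . if period == 2
    replace var2 = . if period == 3
    replace var3 = . if period == 4
    * impute var1 from its nearest neighbor, var2
    reg var1 var2
    predict yhat
    replace imp_var1 = yhat if missing(var1)
Regressing var1 on var2 does not result in a balanced panel, as var1 and var2 are both missing data in period 2. Likewise, using var3 as the independent variable suffers from a similar problem.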
Ideally, what I would like to do (but I am struggling to accomplish) is construct a model similar to
var1 = B0 + A1 * B1 * var2 + A2 * B2 * var3 + ...
where A1 = 1 if var2 is the nearest neighbor and var2 != ., else A1 = 0
A2 = 1 if A1 = 0 and var3 != ., else A2 = 0
This continues over individuals that grow more geographically distant as the `i' in var`i' increases, until the closest individual with data for that time period is found.
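A rough sketch of the selection rule described above, not a tested solution. It assumes var2, var3, ... are already ordered from nearest to farthest neighbor of individual 1. The names K, nn_val, nn_id and yhat_nn are hypothetical, and the toy data repeat the missingness pattern of the example over 20 periods so the regression has some degrees of freedom.

```stata
* Sketch under stated assumptions: neighbors var2..varK are ordered
* nearest to farthest; K, nn_val, nn_id, yhat_nn are illustrative names.
clear
set seed 12345
set obs 20
gen period = _n
gen var1 = runiform()
gen var2 = runiform()
gen var3 = runiform()
gen imp_var1 = .

* same missingness pattern as the example, repeated every 5 periods
replace var1 = . if inlist(mod(period, 5), 2, 4)
replace var2 = . if inlist(mod(period, 5), 2, 3)
replace var3 = . if mod(period, 5) == 4

local K 3
gen nn_val = .   // value of the nearest neighbor with data this period
gen nn_id  = .   // index of the neighbor that supplied nn_val

* walk outward; only fill periods that no closer neighbor has filled
forvalues i = 2/`K' {
    * nn_id must be set before nn_val, since both test missing(nn_val)
    replace nn_id  = `i'    if missing(nn_val) & !missing(var`i')
    replace nn_val = var`i' if missing(nn_val) & !missing(var`i')
}

* common intercept (B0), separate slope per neighbor used (B1, B2, ...)
regress var1 c.nn_val#i.nn_id
predict yhat_nn
replace imp_var1 = yhat_nn if missing(var1)
```

Periods in which no neighbor has data keep nn_val missing, so they would still not be imputed. The fixed effects mentioned earlier are left out of this sketch.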
Any help or pointers to coding this would be greatly appreciated.
Alternatively, I have looked into using the user-written command -ice-, available on SSC, to impute the missing values using all neighbors via switching regressions, but I fear this may be inefficient from a statistical standpoint, although I am still new to the technique.
Thanks,
Neil