Hi, I'm using COLLAPSE to compute sums of variables by persons (who have unique ids) and by year. The data set is a network panel, in which some variables refer to 'neighbors' of persons in a given year. For this reason there can be multiple observations by person and year in the input file, in which each neighbor is a person with their own unique id. The observations are uniquely identified by ids for persons and their neighbors and by year. The observations can assume missing values (.) because data on persons and neighbors occur in spells of varying lengths over time.
I'm assuming that COLLAPSE excludes missing values from the sums, though I'm not certain. But here is another question: what happens when all the observations by person and year for a given variable are missing? Does COLLAPSE drop the observation completely, or does it insert a zero? If it inserts a zero, which my results suggest it does, then I have a problem, because sometimes the sum is really zero and at other times the sum is truly missing.
The crux of the identification problem is a requirement that missing values be distinguished from zeroes in the final file that is aggregated by person and year.
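To make the question concrete, here is a minimal sketch with made-up toy data. The variable names (id, nid, year, x, sum_x, n_x) are placeholders for illustration, not the names in my actual file. It also shows one possible way to keep the distinction, assuming that (count) tallies only nonmissing values: collapse a count alongside the sum and reset the sum to missing wherever the count is zero.

```stata
clear
* toy data: id = person, nid = neighbor, x = variable to be summed
input id nid year x
1 2 2001 3
1 3 2001 .
1 2 2002 .
1 3 2002 .
2 1 2001 0
2 3 2001 0
end

* sum x and count its nonmissing values within each person-year
collapse (sum) sum_x = x (count) n_x = x, by(id year)

* person 1 in 2002 has only missing x; person 2 in 2001 has a true zero
* restore missing where no nonmissing value went into the sum
replace sum_x = . if n_x == 0

list, sepby(id)
```

If this works as I expect, person 1 in 2002 ends up with a missing sum while person 2 in 2001 keeps a sum of zero, which is the distinction I need in the aggregated file.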
I'm sorry to issue this post, but the COLLAPSE documentation does not answer this question.