Hello 
Sorry for this long title, I do not know if it's very clear.
I am cleaning the World Value Survey database. It's a panel dataset, with a lot of variables. For some variables, there are a lot of missing values.
I want to compute the average of these variables, to merge them with a macroeconomic database at the country year level.
However, for my averages to be relevant, for each variables I need to have a certain threshold of non missing data per country per year (let's say less than 75% missings).
Therefore I was thinking about replacing by missing values all observations in each country year if the number of missings exceeds 75%.
I read and used this post, which is quite close to what I would like.
https://www.statalist.org/forums/for...issing-in-data
I tried a lot of things, but nothing works.
For example I tried:
* Generate some data
input x y
1 1
1 1
1 .
. .
end
* Percent missing above which we'll replace observations by missings
glo p=0.75
* Loop over variables
foreach var of varlist A* C* E* F* G* H* X* Y* {
count if `var'==., by (Origin year)
if (r(N)/_N) >= $p replace `var'==., by (Origin year)
}
But it does not work. Same for the other option proposed by Nick Cox in the post I linked.
I hope my problem is clear, it's quite difficult to describe it by text.
Best,
Killian Foubert

Sorry for this long title, I do not know if it's very clear.
I am cleaning the World Value Survey database. It's a panel dataset, with a lot of variables. For some variables, there are a lot of missing values.
I want to compute the average of these variables, to merge them with a macroeconomic database at the country year level.
However, for my averages to be relevant, for each variables I need to have a certain threshold of non missing data per country per year (let's say less than 75% missings).
Therefore I was thinking about replacing by missing values all observations in each country year if the number of missings exceeds 75%.
I read and used this post, which is quite close to what I would like.
https://www.statalist.org/forums/for...issing-in-data
I tried a lot of things, but nothing works.
For example I tried:
* Generate some data
input x y
1 1
1 1
1 .
. .
end
* Percent missing above which we'll replace observations by missings
glo p=0.75
* Loop over variables
foreach var of varlist A* C* E* F* G* H* X* Y* {
count if `var'==., by (Origin year)
if (r(N)/_N) >= $p replace `var'==., by (Origin year)
}
But it does not work. Same for the other option proposed by Nick Cox in the post I linked.
I hope my problem is clear, it's quite difficult to describe it by text.
Best,
Killian Foubert
Comment