Hi all,
I would like to compute, for each variable in a varlist, the share of IDs whose number of missing values reaches a certain threshold, and export the result to an Excel file.
For instance, say I want to generate a dummy variable for each of var1-var3 that flags an ID when its number of missing values is at or above 50% of the number of observations for that ID (rounded). Each missing_var is then either 0 or 1 and constant within ID, and the threshold is 50% * (number of observations of the ID), rounded.
Here is what I have, followed by what I would like to obtain:
ID | year | var1 | var2 | var3 |
1 | 2001 | . | 4 | 7 |
1 | 2002 | . | 4 | 7 |
1 | 2003 | . | 4 | . |
1 | 2004 | 333 | 4 | . |
1 | 2005 | 333 | 4 | . |
1 | 2006 | 333 | 4 | . |
2 | 2004 | 55 | 4 | . |
2 | 2005 | 6 | . | 8 |
2 | 2006 | . | . | 8 |
3 | 2003 | 7 | . | . |
3 | 2004 | 7 | . | 5 |
What I would like to obtain (threshold and dummies added):
ID | year | var1 | var2 | var3 | threshold | missing_var1 | missing_var2 | missing_var3 |
1 | 2001 | . | 4 | 7 | 3 | 1 | 0 | 1 |
1 | 2002 | . | 4 | 7 | 3 | 1 | 0 | 1 |
1 | 2003 | . | 4 | . | 3 | 1 | 0 | 1 |
1 | 2004 | 333 | 4 | . | 3 | 1 | 0 | 1 |
1 | 2005 | 333 | 4 | . | 3 | 1 | 0 | 1 |
1 | 2006 | 333 | 4 | . | 3 | 1 | 0 | 1 |
2 | 2004 | 55 | 4 | . | 2 | 0 | 1 | 0 |
2 | 2005 | 6 | . | 8 | 2 | 0 | 1 | 0 |
2 | 2006 | . | . | 8 | 2 | 0 | 1 | 0 |
3 | 2003 | 7 | . | . | 1 | 0 | 1 | 1 |
3 | 2004 | 7 | . | 5 | 1 | 0 | 1 | 1 |
Based on this, I would like to compute, for each variable, the share of IDs flagged as missing. In this example, the share would be 1/3 for var1, 2/3 for var2, and 2/3 for var3.
Could someone help me implement this in Stata, please?
Thank you!
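One possible way to approach the steps described above is sketched here. It is not a definitive solution. It assumes the data are in memory with the variables named exactly as in the tables (ID, year, var1-var3), that a missing count equal to the threshold also counts as flagged (as the example tables suggest), and that the output file name "missing_shares.xlsx" is only a placeholder.

```stata
* Sketch only. Assumptions: variables are named ID, year, var1-var3 as in the
* example, and "above the threshold" means "at or above" (as in the tables).

* Threshold: 50% of the number of observations per ID, rounded to an integer
bysort ID: gen threshold = round(0.5 * _N)

* ID-level dummy: 1 if the number of missing values reaches the threshold
foreach v of varlist var1-var3 {
    bysort ID: egen nmiss_`v' = total(missing(`v'))
    gen byte missing_`v' = nmiss_`v' >= threshold
    drop nmiss_`v'
}

* Share of flagged IDs: keep one row per ID, then average the dummies
preserve
bysort ID: keep if _n == 1
collapse (mean) missing_var*
* "missing_shares.xlsx" is a hypothetical file name
export excel using "missing_shares.xlsx", firstrow(variables) replace
restore
```

Because each dummy is constant within ID, keeping one row per ID before `collapse` makes the mean a share of IDs rather than a share of observations. On the example data this should give 1/3 for var1 and 2/3 for var2 and var3.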