Dear Stata Experts,
I have the following dataset where id (1 & 2) are company identifiers. Other variables are year, score, rank_score and amount. The original dataset has more companies and years with different scores.
There are repeated observations for each year (5 to 6 observations per year). I would like to drop year and id1 wise repeated observations; however, there is a problem(s).
For a given year and id1 if all observations in amount are missing I need to keep the one with the highest value in score.
For year 2000, id1 1004, there are 3 observations in variable amount that have values and 2 are missing. I would like to keep the observation that has non-missing observation with highest value in score i.e., row 1 with score of .0617.
For year 2000, id1 2004, I will need to keep observation that has the score .021 as that is the only one with a value for amount.
So I need help on how to code this conditional command on Stata.
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(year id1) long id2 float score byte rank_score1 long amount
2000 1004 5567 .0617 1 20000
2000 1004 9698 .0502 2 .
2000 1004 2033 .0447 3 5000
2000 1004 64433 .042 4 .
2000 1004 10983 .0309 5 5000
2001 1004 5567 .0822 1 .
2001 1004 9698 .0588 2 .
2001 1004 63147 .0423 3 .
2001 1004 2033 .0374 4 .
2001 1004 14638 .0276 5 .
2002 1004 5567 .0784 1 .
2002 1004 9698 .0767 2 .
2002 1004 64433 .0395 3 .
2002 1004 5229 .0365 4 .
2002 1004 2033 .0311 5 .
2000 2004 2033 .0975 1 .
2000 2004 9698 .0469 2 .
2000 2004 14638 .042 3 .
2000 2004 63876 .0254 4 .
2000 2004 5229 .021 5 6000
2001 2004 9698 .0435 1 .
2001 2004 5229 .0311 2 9000
2001 2004 2033 .0208 3 .
2001 2004 64433 .0155 4 .
2001 2004 11580 .0149 5 6000
2001 2004 63876 .0149 5 83125
2002 2004 9698 .0345 1 .
2002 2004 1914 .0279 2 .
2002 2004 64433 .0254 3 .
2002 2004 2033 .0162 4 .
2002 2004 5229 .0118 5 16609
end
I have the following dataset where id (1 & 2) are company identifiers. Other variables are year, score, rank_score and amount. The original dataset has more companies and years with different scores.
There are repeated observations for each year (5 to 6 observations per year). I would like to drop year and id1 wise repeated observations; however, there is a problem(s).
For a given year and id1 if all observations in amount are missing I need to keep the one with the highest value in score.
For year 2000, id1 1004, there are 3 observations in variable amount that have values and 2 are missing. I would like to keep the observation that has non-missing observation with highest value in score i.e., row 1 with score of .0617.
For year 2000, id1 2004, I will need to keep observation that has the score .021 as that is the only one with a value for amount.
So I need help on how to code this conditional command on Stata.
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(year id1) long id2 float score byte rank_score1 long amount
2000 1004 5567 .0617 1 20000
2000 1004 9698 .0502 2 .
2000 1004 2033 .0447 3 5000
2000 1004 64433 .042 4 .
2000 1004 10983 .0309 5 5000
2001 1004 5567 .0822 1 .
2001 1004 9698 .0588 2 .
2001 1004 63147 .0423 3 .
2001 1004 2033 .0374 4 .
2001 1004 14638 .0276 5 .
2002 1004 5567 .0784 1 .
2002 1004 9698 .0767 2 .
2002 1004 64433 .0395 3 .
2002 1004 5229 .0365 4 .
2002 1004 2033 .0311 5 .
2000 2004 2033 .0975 1 .
2000 2004 9698 .0469 2 .
2000 2004 14638 .042 3 .
2000 2004 63876 .0254 4 .
2000 2004 5229 .021 5 6000
2001 2004 9698 .0435 1 .
2001 2004 5229 .0311 2 9000
2001 2004 2033 .0208 3 .
2001 2004 64433 .0155 4 .
2001 2004 11580 .0149 5 6000
2001 2004 63876 .0149 5 83125
2002 2004 9698 .0345 1 .
2002 2004 1914 .0279 2 .
2002 2004 64433 .0254 3 .
2002 2004 2033 .0162 4 .
2002 2004 5229 .0118 5 16609
end
Comment