Hi. I am trying to replace the missing values of a variable with the non-missing values of the same variable within each combination of psu and hhid. Please help.
Many thanks in advance!
My dataset has 4 variables (psu, hhid, idcode, total_migrants) and around 160,000 observations. Some idcodes have a value for total_migrants, but for others it is missing. Note that total_migrants should be the same within every unique combination of psu and hhid; i.e., the total number of migrants from a household should be the same for all household members (all idcodes in that household). How do I achieve this? My dataset looks something like this:
psu | hhid | idcode | total_migrants |
1001 | 1 | 1 | . |
1001 | 1 | 2 | 1 |
1001 | 1 | 3 | . |
1002 | 2 | 1 | . |
1002 | 2 | 2 | 2 |
1002 | 3 | 3 | . |
1002 | 3 | 4 | 2 |
1003 | 3 | 1 | 0 |
1003 | 3 | 2 | 0 |
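
To make the intended result concrete, here is a minimal sketch of the fill logic in plain Python, applied to the sample rows above. It only illustrates the desired outcome and is not tied to any particular statistics package. `None` stands in for the missing value `.`, the function name `fill_by_household` is hypothetical, and raising an error on conflicting values within a household is my assumption about how such cases should be handled.

```python
from collections import defaultdict

# Sample rows from the table above: (psu, hhid, idcode, total_migrants).
# None stands in for a missing value.
rows = [
    (1001, 1, 1, None),
    (1001, 1, 2, 1),
    (1001, 1, 3, None),
    (1002, 2, 1, None),
    (1002, 2, 2, 2),
    (1002, 3, 3, None),
    (1002, 3, 4, 2),
    (1003, 3, 1, 0),
    (1003, 3, 2, 0),
]


def fill_by_household(rows):
    """Copy each household's non-missing total_migrants to members where it is missing."""
    # Collect the distinct non-missing values seen in each (psu, hhid) group.
    known = defaultdict(set)
    for psu, hhid, _idcode, total in rows:
        if total is not None:
            known[(psu, hhid)].add(total)

    filled = []
    for psu, hhid, idcode, total in rows:
        values = known[(psu, hhid)]
        # Assumption: a household should carry exactly one value, so
        # conflicting values are reported rather than silently resolved.
        if len(values) > 1:
            raise ValueError(f"conflicting values for psu={psu}, hhid={hhid}: {sorted(values)}")
        if total is None and values:
            total = next(iter(values))
        # Households with no non-missing value at all stay missing.
        filled.append((psu, hhid, idcode, total))
    return filled


filled = fill_by_household(rows)
for row in filled:
    print(" | ".join("." if v is None else str(v) for v in row))
```

A household where every member is missing stays missing, since there is nothing to copy from.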