Hi everyone
I have a dataset, shown in the example below, that records income received by individuals from various sources. pid is the individual identifier and hhid is the household identifier. Other variables describe each income source: for example, cod_tipo_ocup is the type of income source and val_renm_bruto is the gross income earned from that particular job. renda_total is total individual income.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(pid hhid) byte cod_tipo_ocup double(val_renm_bruto renda_total)
1 1 2 2000 6547.82
1 1 1 3000 6547.82
2 1 1 620 6547.82
3 2 1 999999 35222.59
4 2 1 7000 35222.59
4 2 2 5000 35222.59
5 2 1 999999 35222.59
6 3 1 999999 3142.23
7 3 2 640 3142.23
7 3 1 488 3142.23
end
Some people have more than one source of income, e.g. pids 1, 4, and 7.
How do I restructure this dataset so that it contains one individual per row (7 rows/observations), without losing any of the information for the second (or third) sources of income, while maintaining the constants like renda_total?
Any advice using loops would also be great, as the full dataset has 30+ variables and about 150,000 observations. I'm using Stata v16.1, Windows 11.
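One possible route, sketched on the example data above, is -reshape wide-. This is a minimal sketch under stated assumptions, not a tested solution for the full dataset: obs and src are hypothetical helper variables that are not in the original data, and the sketch assumes hhid and renda_total are constant within pid (reshape stops with an error if they are not). The -ds- step builds the list of source-specific variables so that 30+ variables need not be typed out or looped over.

```stata
* Sketch only: number each person's income sources, then reshape to wide.
clear
input float(pid hhid) byte cod_tipo_ocup double(val_renm_bruto renda_total)
1 1 2 2000 6547.82
1 1 1 3000 6547.82
2 1 1 620 6547.82
3 2 1 999999 35222.59
4 2 1 7000 35222.59
4 2 2 5000 35222.59
5 2 1 999999 35222.59
6 3 1 999999 3142.23
7 3 2 640 3142.23
7 3 1 488 3142.23
end

* obs (hypothetical helper) records the original row order so that the
* numbering of sources within each person is reproducible.
gen long obs = _n

* src (hypothetical helper) counts income sources within pid: 1, 2, ...
bysort pid (obs): gen byte src = _n
drop obs

* Collect every variable that differs across sources, i.e. everything
* except the identifier, the assumed constants, and the counter.
ds pid hhid renda_total src, not
local varying `r(varlist)'

* One row per pid; each varying variable gets a suffix 1, 2, ...
reshape wide `varying', i(pid) j(src)

list, sepby(hhid)
```

With the example data this should leave 7 observations, with cod_tipo_ocup1, val_renm_bruto1, cod_tipo_ocup2 and val_renm_bruto2 holding the first and second sources, and missing values in the second set for people with only one source. In the full dataset, any other person-level constants would need to be added to the -ds- exclusion list.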
Thanks,
Zoheb