Restructuring data: changing rows to uniquely reflect individuals rather than (one or more) income sources per individual

Zoheb Khan

#1

31 Mar 2024, 18:52

Hi everyone

I have a dataset that looks like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(pid hhid) byte cod_tipo_ocup double(val_renm_bruto renda_total)
1 1 2 2000 6547.82
1 1 1 3000 6547.82
2 1 1 620 6547.82
3 2 1 999999 35222.59
4 2 1 7000 35222.59
4 2 2 5000 35222.59
5 2 1 999999 35222.59
6 3 1 999999 3142.23
7 3 2 640 3142.23
7 3 1 488 3142.23
end

The data record income that individuals receive from various sources. pid is the individual identifier, and hhid is the household identifier. Some variables describe the income received: e.g., cod_tipo_ocup is the type of income source, and val_renm_bruto is the gross income earned from that particular job. renda_total is the individual's total income.

Some people have more than one source of income, e.g., pids 1, 4, and 7.
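To see which individuals hold more than one income source, one option is to count rows per pid. A minimal sketch on the example data; the variable names n_sources and person_tag are made up for illustration:

```stata
* Example data from post #1
clear
input float(pid hhid) byte cod_tipo_ocup double(val_renm_bruto renda_total)
1 1 2 2000 6547.82
1 1 1 3000 6547.82
2 1 1 620 6547.82
3 2 1 999999 35222.59
4 2 1 7000 35222.59
4 2 2 5000 35222.59
5 2 1 999999 35222.59
6 3 1 999999 3142.23
7 3 2 640 3142.23
7 3 1 488 3142.23
end

* Number of income-source rows for each person (hypothetical variable name)
bysort pid: generate byte n_sources = _N

* Flag one row per person so that each pid is listed only once
egen byte person_tag = tag(pid)

* Persons with more than one income source
list pid n_sources if person_tag & n_sources > 1, noobs
```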

How do I restructure this dataset so that it has one row per individual (7 observations), without losing any information on the second (or third) source of income, while keeping person-level constants such as renda_total?
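In Stata, reshape wide carries a variable such as renda_total over unchanged only if it takes a single value within each pid; otherwise it stops with an error. A quick pre-check, sketched here for hhid and renda_total only (which other variables in the full dataset are person-level is not known from the post):

```stata
* Example data from post #1
clear
input float(pid hhid) byte cod_tipo_ocup double(val_renm_bruto renda_total)
1 1 2 2000 6547.82
1 1 1 3000 6547.82
2 1 1 620 6547.82
3 2 1 999999 35222.59
4 2 1 7000 35222.59
4 2 2 5000 35222.59
5 2 1 999999 35222.59
6 3 1 999999 3142.23
7 3 2 640 3142.23
7 3 1 488 3142.23
end

* Within each pid, sort on the variable and compare the first and last values;
* assert stops with an error if any person has more than one distinct value
bysort pid (hhid): assert hhid[1] == hhid[_N]
bysort pid (renda_total): assert renda_total[1] == renda_total[_N]
```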

Any advice on doing this with loops would also be welcome, as the full dataset has 30+ variables and about 150,000 observations. I'm using Stata 16.1 on Windows 11.

Thanks,
Zoheb
Mike Lacy

#2

31 Mar 2024, 19:59

A good rule of thumb in Stata is that if you think you need a loop, you probably don't. For your task, see -help reshape- (reshaping from long to wide). In your example, pid alone uniquely identifies an individual, so hhid is not needed for that purpose. Assuming the same holds in your actual dataset, try this:

Code:

reshape wide val_renm_bruto , i(pid) j(cod_tipo_ocup)
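The command above uses cod_tipo_ocup as j, which requires it to be unique within pid; that holds in the example, where pids 1, 4, and 7 each have one source of type 1 and one of type 2. If a person can hold two sources of the same type, reshape wide will report that cod_tipo_ocup is not unique within pid. One workaround, sketched here under that assumption, is to number each person's sources with a counter (the name source is hypothetical) and use it as j, reshaping cod_tipo_ocup along with the income. For the 30+ variable case, the list of source-level variables is built with ds rather than a loop; treating hhid and renda_total as the only person-level variables is an assumption to check against the full data.

```stata
* Example data from post #1
clear
input float(pid hhid) byte cod_tipo_ocup double(val_renm_bruto renda_total)
1 1 2 2000 6547.82
1 1 1 3000 6547.82
2 1 1 620 6547.82
3 2 1 999999 35222.59
4 2 1 7000 35222.59
4 2 2 5000 35222.59
5 2 1 999999 35222.59
6 3 1 999999 3142.23
7 3 2 640 3142.23
7 3 1 488 3142.23
end

* Number each person's income sources 1, 2, ... (hypothetical variable name);
* within a person, lower type codes (then lower amounts) get lower numbers
bysort pid (cod_tipo_ocup val_renm_bruto): generate int source = _n

* Identifiers plus variables assumed constant within pid (adjust for the full data)
local keep_out pid source hhid renda_total

* All remaining variables vary by income source and become reshape stubs
ds `keep_out', not
local stubs `r(varlist)'
display "`stubs'"

* One row per person: cod_tipo_ocup1, val_renm_bruto1, cod_tipo_ocup2, ...
reshape wide `stubs', i(pid) j(source)
list, noobs
```

In this layout, cod_tipo_ocupk records which type of source the k-th income came from, and persons with fewer sources get missing values in the higher-numbered columns.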
Zoheb Khan

#3

01 Apr 2024, 08:02

Thanks Mike. This works, and it's a much better solution than pursuing loops.