Creating a variable for family income

Giorgio Nocerino

Join Date: Jun 2024

Posts: 11
#1


26 Jun 2024, 03:31

Hello!
I am trying to create a variable that captures family income. The dataset I'm working on has one variable that identifies the family and another that identifies each member's number within the family.
I created a variable with egen id = group(nfamily nmember).
Family income is recorded only for the individual in the family that has nmember = 1; I would like to take this information and assign it to the other members of the family as well. The dataset is a panel.

Can anyone help?
Thanks in advance
Daniel Feenberg

Join Date: Oct 2014

Posts: 323
#2

26 Jun 2024, 06:50

I don't quite understand your notation, but perhaps this is something like what you want:

Code:

egen family_income=total(income),by(family_id)

-total- works because all but one of the family members will have missing income (or maybe zero?). -min- or -mean- might also work, depending on what the other records look like.

See https://www.stata.com/manuals/degen.pdf and https://www.stata.com/manuals/dmissingvalues.pdf
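To make the difference between -total-, -min- and -mean- concrete, here is a small sketch on made-up data (the values are invented for illustration and are not from the original dataset). Family 1 has missing income for the non-head member records; family 2 has a zero instead.

Code:

```stata
// Toy data: non-head income is missing in family 1 and zero in family 2
clear
input byte(family_id person_id) float income
1 1 2500
1 2 .
1 3 .
2 1 6000
2 2 0
end

// total() treats missing as 0; min() and mean() ignore missing values
egen inc_total = total(income), by(family_id)
egen inc_min   = min(income),   by(family_id)
egen inc_mean  = mean(income),  by(family_id)

list, sepby(family_id) noobs
```

With missing values, all three functions return 2500 for family 1. With a zero on the other record, only -total- still returns 6000 for family 2; -min- gives 0 and -mean- gives 3000. Also note that -total- returns 0 rather than missing for a family whose incomes are all missing, unless the missing option is specified.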

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1400
#3

26 Jun 2024, 07:42

The solution in #2 works if you want to add up the income of all family members. Your question is somewhat unclear; if you are instead looking for the income of one person (e.g. the head of the household, who is always identified as 1) then you might need something different.

Since you have not provided any data extract, I have made up some toy data below and shown the two solutions:

Code:

// CREATE TOY DATA
clear
input byte(family_id person_id) float income
1 1 2500
1 2 1100
1 3 .
1 4 400
2 3 1000
2 2 .
2 1 6000
end

// SOLUTION STARTS HERE
egen family_income = total(income), by(family_id)
egen head_income = max(cond(person_id == 1, income, .)), by(family_id)

which produces the following:

Code:

. list, sepby(family_id) noobs abbrev(13)

  +--------------------------------------------------------------+
  | family_id   person_id   income   family_income   head_income |
  |--------------------------------------------------------------|
  |         1           1     2500            4000          2500 |
  |         1           2     1100            4000          2500 |
  |         1           3        .            4000          2500 |
  |         1           4      400            4000          2500 |
  |--------------------------------------------------------------|
  |         2           3     1000            7000          6000 |
  |         2           2        .            7000          6000 |
  |         2           1     6000            7000          6000 |
  +--------------------------------------------------------------+
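The original question mentions that the dataset is a panel. If the head's income changes from one period to the next, the same -egen- call can be grouped by family and period. The following is only a sketch: the variable name year and all values are assumptions made up for illustration, since no time variable was named in the question.

Code:

```stata
// Toy panel: year is a hypothetical wave identifier
clear
input byte(family_id person_id) int year float income
1 1 2020 2500
1 2 2020 .
1 1 2021 2700
1 2 2021 .
2 1 2020 6000
2 2 2020 .
end

// Spread member 1's income to all members of the same family in the same year
egen head_income = max(cond(person_id == 1, income, .)), by(family_id year)

list, sepby(family_id year) noobs
```

Grouping by family_id alone would instead assign the largest head income across all years (2700 for family 1) to every record of that family.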

In future, it would be easier if you provided a data extract (see the FAQ, especially section 12, on how to ask questions more effectively and how to provide a data extract). It is often also helpful to add a variable to the data extract that shows the values you want your new variable to take for those observations.
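As a rough illustration of producing such an extract, the -dataex- command writes out the data in memory as a ready-to-paste -input- block. It ships with recent Stata releases; on older installations it may need to be installed from SSC first. The variable names and values below are the toy ones from this post, not the original poster's.

Code:

```stata
// ssc install dataex   // only if dataex is not already available

// Toy data standing in for the real dataset
clear
input byte(family_id person_id) float income
1 1 2500
1 2 .
2 1 6000
end

// Print an extract of the named variables that can be pasted into a post
dataex family_id person_id income
```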

Last edited by Hemanshu Kumar; 26 Jun 2024, 07:45.


ericmelse

Join Date: May 2014

Posts: 434
#4

26 Jun 2024, 09:01

Both Daniel and Hemanshu offer sound advice. But my take on Giorgio's question is that he wants to copy the family income that is stored in the data set for member '1' into the (new) income variable for the other member(s) of the same family.
A loop could do this, for example (adapting Hemanshu's toy data):

Code:

// CREATE TOY DATA
clear
input byte(family_id person_id) float income
1 1 2500
1 2 .
1 3 .
1 4 .
2 3 .
2 2 .
2 1 6000
3 1 1100
3 2 .
end

// Get income of each family first member and replicate that data for any other family member
forvalues i = 1(1)3 {
    qui sum income if family_id==`i' & person_id==1
    replace income=r(mean) if family_id==`i' & person_id!=1
}

Note that in the above code you could write the result to a new variable instead of overwriting income, so as to maintain data integrity (something I always do).
Note also that the condition on sum includes person_id==1 to make sure that only that member's data are used, as there is always the chance of some error in the panel (a missing value for member 1, or two incomes present for the same family).
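The two notes above could be sketched as follows. This is one possible variant, not the only way to do it: it writes to a new variable (named family_income here purely for illustration), uses -levelsof- so the family ids need not be hard-coded as 1 to 3, and first counts how many non-missing incomes each family has on member 1 so that problem families can be inspected.

Code:

```stata
// Toy data (invented values)
clear
input byte(family_id person_id) float income
1 1 2500
1 2 .
1 3 .
2 2 .
2 1 6000
3 1 1100
3 2 .
end

// Count non-missing incomes recorded on member 1 within each family
egen n_head_inc = total(person_id == 1 & !missing(income)), by(family_id)

// Show any family that does not have exactly one such record
list family_id person_id income if n_head_inc != 1, noobs

// Copy member 1's income into a NEW variable, leaving income untouched
generate family_income = .
levelsof family_id, local(families)
foreach f of local families {
    quietly summarize income if family_id == `f' & person_id == 1
    replace family_income = r(mean) if family_id == `f'
}
```

In this toy data every family passes the check, so the -list- command prints nothing and family_income holds 2500, 6000 and 1100 for families 1, 2 and 3.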

http://publicationslist.org/eric.melse
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1400
#5

26 Jun 2024, 09:12

ericmelse As far as I can see, the code in #3

Code:

egen head_income = max(cond(person_id == 1, income, .)), by(family_id)

produces a variable with exactly the same values as income in #4. I imagine a difference could arise if there are multiple persons with id 1 in a family, but not otherwise. Or am I missing something?
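A small made-up example of the duplicate case mentioned above: if a family has two records with person_id == 1, the -egen- line picks the larger of the two incomes, whereas the loop in #4 fills the other members with r(mean), the average of the two. The data are invented for illustration.

Code:

```stata
// Toy data: family 1 erroneously has two records with person_id == 1
clear
input byte(family_id person_id) float income
1 1 2500
1 1 1500
1 2 .
end

// Approach from #3: maximum income among records with person_id == 1
egen head_income = max(cond(person_id == 1, income, .)), by(family_id)

// Approach from #4, written to a new variable: mean income of member 1
generate loop_income = income
quietly summarize income if family_id == 1 & person_id == 1
replace loop_income = r(mean) if family_id == 1 & person_id != 1

list, noobs
```

For member 2, head_income is 2500 while loop_income is 2000. With exactly one record for member 1 per family, the two approaches agree.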
Giorgio Nocerino

Join Date: Jun 2024

Posts: 11
#6

27 Jun 2024, 02:13

Originally posted by Hemanshu Kumar

ericmelse As far as I can see, the code in #3

Code:

egen head_income = max(cond(person_id == 1, income, .)), by(family_id)

produces a variable with exactly the same values as income in #4. I imagine a difference could arise if there are multiple persons with id 1 in a family, but not otherwise. Or am I missing something?

thanks!! it worked perfectly