  • Creating a variable for family income

    Hello!
    I am trying to create a variable that captures family income. The dataset I'm working on has a variable that identifies the family (nfamily) and a variable that identifies each member within the family (nmember).
    I created an identifier with egen id = group(nfamily nmember).
    Family income is recorded only for the family member with nmember = 1, and I would like to assign that value to the other members of the same family as well. The dataset is a panel.

    Can anyone help?
    Thanks in advance

  • #2

    I don't quite understand your notation, but perhaps this is something like what you want:

    Code:
    egen family_income=total(income),by(family_id)
    -total- works because all but one of the family members will presumably have missing (or perhaps zero) income. -min- or -mean- might also work, depending on what the other records look like.

    See https://www.stata.com/manuals/degen.pdf and https://www.stata.com/manuals/dmissingvalues.pdf
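    Which of -total-, -min- or -mean- fits depends on how the other records look. A small sketch on made-up data (only member 1 has income, and family 3 has none at all) shows how they differ: -total()- treats missing values as 0, so a family with no recorded income gets 0 unless the -missing- option is given, whereas -min()- and -mean()- ignore missing values and return missing when every value is missing. The variable names inc_total, inc_total_m, inc_min and inc_mean are chosen only for illustration.

```stata
// Made-up toy data: only person 1 has income; family 3 has none
clear
input byte(family_id person_id) float income
1 1 2500
1 2 .
1 3 .
2 1 6000
2 2 .
3 1 .
3 2 .
end

// total() treats missing as 0, so family 3 gets 0
egen inc_total = total(income), by(family_id)
// with -missing-, an all-missing family gets missing instead of 0
egen inc_total_m = total(income), by(family_id) missing
// min() and mean() skip missing values; all-missing gives missing
egen inc_min = min(income), by(family_id)
egen inc_mean = mean(income), by(family_id)

list, sepby(family_id) noobs
```

    With exactly one non-missing income per family all four agree; they only differ for families like 3, or when other members also have income recorded.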



    • #3
      The solution in #2 works if you want to add up the income of all family members. Your question is somewhat unclear; if you are instead looking for the income of one person (e.g. the head of the household, who is always identified as 1) then you might need something different.

      Since you have not provided any data extract, I have made up some toy data below and shown the two solutions:

      Code:
      // CREATE TOY DATA
      clear
      input byte(family_id person_id) float income
      1 1 2500
      1 2 1100
      1 3 .
      1 4 400
      2 3 1000
      2 2 .
      2 1 6000
      end
      
      // SOLUTION STARTS HERE
      egen family_income = total(income), by(family_id)
      egen head_income = max(cond(person_id == 1, income, .)), by(family_id)
      which produces the following:

      Code:
      . list, sepby(family_id) noobs abbrev(13)
      
        +--------------------------------------------------------------+
        | family_id   person_id   income   family_income   head_income |
        |--------------------------------------------------------------|
        |         1           1     2500            4000          2500 |
        |         1           2     1100            4000          2500 |
        |         1           3        .            4000          2500 |
        |         1           4      400            4000          2500 |
        |--------------------------------------------------------------|
        |         2           3     1000            7000          6000 |
        |         2           2        .            7000          6000 |
        |         2           1     6000            7000          6000 |
        +--------------------------------------------------------------+
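      The one-line head_income solution above can be unpacked into two steps, which may make it clearer why it works: -cond()- keeps income only where person_id == 1 and sets everything else to missing, and -max()- then ignores the missing values, so within each family it returns person 1's income. A sketch on the same toy data; head_only is an intermediate name introduced only for illustration.

```stata
// Same toy data as above (family 2 lists person 1 last)
clear
input byte(family_id person_id) float income
1 1 2500
1 2 1100
1 3 .
1 4 400
2 3 1000
2 2 .
2 1 6000
end

// Step 1: income for person 1, missing for everyone else
generate head_only = cond(person_id == 1, income, .)
// Step 2: max() skips missing values, leaving person 1's income
egen head_income = max(head_only), by(family_id)
drop head_only
```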
      In future, it might be easier to provide a data extract (see the FAQ, especially section 12, on how to ask questions more effectively and how to provide a data extract). It is often also helpful to include a variable in the extract showing the values you would want your new variable to take for those observations.
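      The question mentions that the data are a panel. If income is recorded for member 1 in every wave, the grouping probably needs to include the time variable as well; otherwise -max()- returns the largest of member 1's incomes across all waves. A sketch, assuming a time variable called year (hypothetical; substitute the panel's actual time variable):

```stata
// Hypothetical panel: the same families observed in two years
clear
input byte(family_id person_id) int year float income
1 1 2020 2500
1 2 2020 .
1 1 2021 2700
1 2 2021 .
2 1 2020 6000
2 2 2020 .
2 1 2021 6100
2 2 2021 .
end

// Grouping by family and year copies member 1's income within each wave
egen head_income = max(cond(person_id == 1, income, .)), by(family_id year)
```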
      Last edited by Hemanshu Kumar; 26 Jun 2024, 08:45.



      • #4
        Both Daniel and Hemanshu offer sound advice. But my reading of Giorgio's question is that he wants to copy the family income stored for member 1 into a (new) income variable for the other member(s) of the same family.
        An iterative loop could do this, for example (adapting Hemanshu's example data so that only member 1 has an income):
        Code:
        // CREATE TOY DATA
        clear
        input byte(family_id person_id) float income
        1 1 2500
        1 2 .
        1 3 .
        1 4 .
        2 3 .
        2 2 .
        2 1 6000
        3 1 1100
        3 2 .
        end
        
        // Get the income of each family's first member and copy it to every other member
        // (loop over the family ids actually present rather than a fixed 1(1)3)
        levelsof family_id, local(families)
        foreach i of local families {
            qui sum income if family_id==`i' & person_id==1
            replace income=r(mean) if family_id==`i' & person_id!=1
        }
        Note that in the above code you could write the result to a new income variable instead, so as to preserve the original data (something I always do).
        Note also that the -summarize- condition includes person_id==1 to make sure that only member 1's data are used, as there is always a chance of errors in a data panel (missing values for member 1, or two incomes recorded for the same family).
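        The two notes above (writing to a new variable, and guarding against errors such as missing or duplicate member-1 records) can also be handled without a loop. A sketch on the same toy data, swapping the loop for -egen- with -cond()- as in #3; n_member1 and family_income are names chosen only for illustration. -assert- stops with an error if any family does not have exactly one person_id == 1 record.

```stata
// Toy data from above
clear
input byte(family_id person_id) float income
1 1 2500
1 2 .
1 3 .
1 4 .
2 3 .
2 2 .
2 1 6000
3 1 1100
3 2 .
end

// Each family should have exactly one person_id == 1 record
egen n_member1 = total(person_id == 1), by(family_id)
assert n_member1 == 1

// Copy member 1's income into a new variable; income is left untouched
egen family_income = max(cond(person_id == 1, income, .)), by(family_id)

// Families where member 1's income itself is missing
count if missing(family_income)
```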
        http://publicationslist.org/eric.melse



        • #5
          ericmelse As far as I can see, the code in #3

          Code:
          egen head_income = max(cond(person_id == 1, income, .)), by(family_id)
          produces a variable with exactly the same values as income in #4. I imagine a difference could arise if there are multiple persons with id 1 in a family, but not otherwise. Or am I missing something?
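          The claim above can be checked directly. A sketch on the toy data from #4 that computes both versions and uses -assert- to confirm they agree. The loop is the one from #4, but iterating over -levelsof- rather than a fixed 1(1)3 and writing to a copy (income_loop, a name chosen for illustration) so that income stays untouched. If a family had two person_id == 1 records with different incomes, the loop would copy their mean and the -egen- line their maximum, which is where the two could diverge.

```stata
// Toy data from #4
clear
input byte(family_id person_id) float income
1 1 2500
1 2 .
1 3 .
1 4 .
2 3 .
2 2 .
2 1 6000
3 1 1100
3 2 .
end

// #3: egen with cond()
egen head_income = max(cond(person_id == 1, income, .)), by(family_id)

// #4: the loop, applied to a copy of income
generate income_loop = income
levelsof family_id, local(families)
foreach i of local families {
    quietly summarize income if family_id == `i' & person_id == 1
    replace income_loop = r(mean) if family_id == `i' & person_id != 1
}

// Stops with an error if any observation differs
assert head_income == income_loop
```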



          • #6
            Originally posted by Hemanshu Kumar
            ericmelse As far as I can see, the code in #3

            Code:
            egen head_income = max(cond(person_id == 1, income, .)), by(family_id)
            produces a variable with exactly the same values as income in #4. I imagine a difference could arise if there are multiple persons with id 1 in a family, but not otherwise. Or am I missing something?
            Thanks!! It worked perfectly.
