
  • Re-arrange individual observations into a family setting

    Hi,
    I'm struggling to find a way to convert observations at the individual level into one row per family. A picture of the data is attached. Most labels are straightforward: famnr is the individual family number based on city and family, and fampos is the position of the individual within the family (1=child, 2=spouse, 3=household head).

    The final arrangement should include information on the age of the individual person (mother, father or child) so that each family can be displayed in one row. And related to children, there should be variables sex_first child, age_first child, sex_second child, age_second child and so on.

    Does anybody have an idea how to start?

    Thanks in advance!

    Cheers,
    NM

  • #2
    You almost never want to do that. Most tasks in Stata are a lot easier in the so-called long format that you have. So why do you think that you want to make that transformation?
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Welcome to Statalist.

      Let me agree with Maarten and expand on his answer.

      What you have is data organized in a long layout, and what you seek is data organized in a wide layout. The reshape command is the tool for accomplishing this reorganization, and for the reverse as well.

      The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. You should try to achieve what you need with the data organized as it currently is, and seek the help of Statalist in doing so. The sort of problems you will encounter trying to use your reshaped data will almost certainly be solved by reshaping the data back to a long layout. It is much easier, for example, to compare the second observation to the first, the third to the second, and so on, than it is to compare the second variable to the first, the third to the second, etc.

      I'd be inclined to include some sample code here, but my version of Stata cannot import pictures of data, and "I want to retype sample data from a picture" said nobody, ever.

      To improve your future posts, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

      Please be sure to use the dataex command to show example data. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      When asking for help with code, always show example data. When showing example data, always use dataex.
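The point made above about comparing consecutive observations can be sketched as follows. The data are made up for illustration (they are not the poster's data), and the variable name age_gap is hypothetical.

```stata
* Hypothetical data: two families in a long layout, one row per person
clear
input byte(famnr age)
1  5
1 12
1 30
2  6
2 25
end

* One rule covers every family, whatever its size:
* the gap between each person and the next-younger family member
bysort famnr (age): generate age_gap = age - age[_n-1]
list, sepby(famnr)
```

In a wide layout the same comparison would need one statement per pair of variables (age2 - age1, age3 - age2, and so on), and the number of statements would depend on the largest family.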




      • #4
        So I've entered your data (at least for most of the variables). I moved famnr to the left since it is the unique family identifier. Let me echo Maarten and William: for most of your analysis (running regressions, etc.) you will want your data in the long layout, as it is now. Wide format is generally a way data is entered (so you then have to reshape it to long), or people use it to scan their data more easily, because wide is more compact.

        Code:
        * Also, I created a father var, with gen father = (fampos==3 & sex==0), but I assume you already have a variable like that in your data
        
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte(famnr personid age sex nchild fampos mother) float(father new_personid)
        11 4  5 0 2 1 0 0 1
        11 1 12 1 2 1 0 0 2
        11 2 30 1 2 2 1 0 3
        11 3 33 0 2 3 0 1 4
        12 1 30 1 0 3 0 0 1
        13 1  6 1 1 1 0 0 1
        13 2 25 1 1 3 1 0 2
        21 3  7 0 2 1 0 0 1
        21 1 10 0 2 1 0 0 2
        21 2 34 1 2 3 1 0 3
        end
        The easiest way to reshape to wide is below, although, for reasons I will list, this won't give you exactly what you want. Also, you will want to save your dataset as a test_dataset or something, because you will need to drop variables to be able to reshape.
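As an alternative to saving a separate test dataset, a minimal sketch using preserve and restore is shown here. It assumes the example data above (entered without the mother, father, and new_personid columns to keep it short) and is only one possible way to protect the long data while experimenting.

```stata
clear
input byte(famnr personid age sex nchild fampos)
11 4  5 0 2 1
11 1 12 1 2 1
11 2 30 1 2 2
11 3 33 0 2 3
12 1 30 1 0 3
13 1  6 1 1 1
13 2 25 1 1 3
21 3  7 0 2 1
21 1 10 0 2 1
21 2 34 1 2 3
end

preserve                        // take a snapshot of the long data
sort famnr fampos age
by famnr: generate new_personid = _n
drop fampos personid            // not constant within a family
reshape wide age sex, i(famnr) j(new_personid)
list
restore                         // the long data, with all variables, are back
describe, short
```

After restore, the dropped variables and all ten individual observations are available again, so nothing has to be reloaded from disk.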

        Code:
        capture drop new_personid  // the example data above already contain this variable, so remove it before rebuilding it
        sort famnr fampos age  // this essentially is as you have it now. However, this puts youngest child first
        // Also, I sorted like this so that head_of_household is last even if spouse is older
        by famnr: gen new_personid = _n  // I did this so it would increase within the fam. In your 1st obs, the youngest child is person_id==4.
        drop fampos father mother personid // since these are not constant within a family, reshape doesn't know how to handle them
        
        reshape wide age sex, i(famnr) j(new_personid)
        browse
        When done, your data will look like this:

        Code:
        . list  // I also put a screenshot below
        
             +------------------------------------------------------------------------+
             | famnr   age1   sex1   age2   sex2   age3   sex3   age4   sex4   nchild |
             |------------------------------------------------------------------------|
          1. |    11      5      0     12      1     30      1     33      0        2 |
          2. |    12     30      1      .      .      .      .      .      .        0 |
          3. |    13      6      1     25      1      .      .      .      .        1 |
          4. |    21      7      0     10      0     34      1      .      .        2 |
             +------------------------------------------------------------------------+
        [Screenshot of the reshaped data: reshape wide.png]

        This gets the data to wide format. This may be all you need. However, in its current formulation, it goes from youngest child (youngest person in household really) to oldest, so you then have to create new variables if you wanted age_mother or age_father. You would also need to create new variables if you wanted age_child1 to be the age of the oldest child.
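The extra variables mentioned above could be built while the data are still long, before any reshape. The following is a sketch only: it assumes the mother and father indicators from the example data, and the names age_mother, age_father, negage, and child_no are chosen here for illustration.

```stata
clear
input byte(famnr personid age sex nchild fampos mother father)
11 4  5 0 2 1 0 0
11 1 12 1 2 1 0 0
11 2 30 1 2 2 1 0
11 3 33 0 2 3 0 1
12 1 30 1 0 3 0 0
13 1  6 1 1 1 0 0
13 2 25 1 1 3 1 0
21 3  7 0 2 1 0 0
21 1 10 0 2 1 0 0
21 2 34 1 2 3 1 0
end

* Parent ages as family-level constants (missing if no such parent is flagged)
egen age_mother = max(cond(mother == 1, age, .)), by(famnr)
egen age_father = max(cond(father == 1, age, .)), by(famnr)

* Number the children oldest first, so child_no==1 is the oldest child
generate negage = -age
bysort famnr fampos (negage): generate child_no = _n if fampos == 1

list famnr fampos age sex age_mother age_father child_no, sepby(famnr)
```

Using child_no instead of new_personid as the j() variable in a reshape restricted to the children would then give age1 and sex1 for the oldest child.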

        Note that head_of_household (hoh) will always be the person furthest to the right in the family. (And in 1st position if in a solo household with no children, as with famnr==12).

        Also, I wasn't sure how you wanted to classify the person in famnr==12. This is a 30-year-old female HOH with no children. Did you want her classified as age_mother (with mother here meaning "female head of household, regardless of whether she has children") or age_hoh (reserving mother for someone with children)? Either is doable.
        Last edited by David Benson; 23 Nov 2018, 00:00.



        • #5
          Thank you all for your comments!
          The reason to reshape and transform the data is to analyze the impact of the first child's gender on family-related variables (in this example: father living with family or the number of children in total) similar to what Dahl and Moretti (2008) did.

          They arranged their data like this:
          Code:
          clear
          input byte(nchild age marst ksex1) float nodad
          1 37 4 0 0
          3 29 4 1 0
          3 37 4 0 0
          end
          The command would look something like this (without further controls):
          Code:
          regress nodad ksex1
          or
          Code:
          regress nchild ksex1
          Since my data is only available on an individual level (age, family position, gender etc.), I thought it would be useful to re-arrange the data and to group them into families. Relevant variables per family would then be number of children or oldest child's sex.

          Please let me know if there is a more convenient way to perform this analysis!



          • #6
            By taking a look at your screenshot in #1, it seems you have clustered data, hence a multi-level approach could be among the options. Please type - help mixed - in the command window.
            Best regards,

            Marcos



            • #7
              "Since my data is only available on an individual level (age, family position, gender etc.), I thought it would be useful to re-arrange the data and to group them into families"
              With the explanation of your actual goal in post #5, we can see that, as Maarten and I suggested in posts #2 and #3, achieving this goal does not require reshaping the data into the arrangement that was your stated goal in post #1. Your situation is what Maarten and I anticipated it would be: you thought you needed a wide layout, but that's not the appropriate approach to creating family-level observations using Stata.

              The sample code below shows how to create family observations including nchild, nodad, and ksex1 variables as suggested in post #5. My definition of nodad and ksex1 may not be exactly what you seek (what are nodad and ksex1 if there is no child?) but the code indicates a general approach which avoids the problems induced by creating a wide layout.
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input byte(famnr fampos personid age sex)
              11 1 4  5 0
              11 1 1 12 1
              11 2 2 30 1
              11 3 3 33 0
              12 3 1 30 1
              13 1 1  6 1
              13 3 2 25 1
              21 1 3  7 0
              21 1 1 10 0
              21 3 2 34 1
              end
              
              generate child = fampos==1
              generate notdad = fampos==1 | sex==1
              bysort famnr fampos (age): generate ksex1 = cond(fampos==1 & _n==_N, sex, .)
              
              collapse (sum) nchild=child (min) nodad=notdad ksex1, by(famnr)
              list, clean
              Code:
              . list, clean
              
                     famnr   nchild   nodad   ksex1  
                1.      11        2       0       1  
                2.      12        0       1       .  
                3.      13        1       1       1  
                4.      21        2       1       0
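One possible way to handle the question about childless families is sketched here. It assumes, purely for illustration, that the analysis from post #5 should use only families with at least one child; that restriction is not stated anywhere in the thread. The data are the collapsed result shown above, typed in directly.

```stata
clear
input byte(famnr nchild nodad ksex1)
11 2 0 1
12 0 1 .
13 1 1 1
21 2 1 0
end

* Assumption: families without children are outside the analysis sample
keep if nchild > 0

* Three families cannot support any inference; this only shows the mechanics
regress nchild ksex1
```

With real data the same two steps would follow the collapse command directly, with controls added to the regression as needed.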
              Last edited by William Lisowski; 23 Nov 2018, 09:53.



              • #8
                Thanks for your reply! That is exactly what I was looking for!
