  • Rearranging Data

    Hello

    The dataset I'm using is from the Wisconsin Longitudinal Study, which contains data on ~10,300 graduates of Wisconsin high schools in 1957 who are followed over several waves until 2011. A parallel survey is also carried out on one of their siblings, chosen at random. The dataset also provides detailed information on every child of both the graduates and the siblings. I want to carry out regression analysis on the children, but at present the information on the children is stored only as variables attached to the graduate's or sibling's personal ID (e.g. the graduate / sibling was asked 'what is your first child's marital status?').

    The only way I can think to carry out analysis on the children is to rearrange the data by creating an extended family identifier that links the sibling pairs and their children together, a nuclear family identifier that links children to their parents, and a personal ID for the children themselves. I would then also have some variable that identifies whether the individual is part of the sibling pair or a child. I'm not sure how to do this or if there's a better way to achieve the same outcome.

    Any help would be much appreciated.

    Many thanks
    Owen

  • #2
    Welcome to Statalist.

    Very often a simple description like yours really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what has been shared.

    Please help us help you. Show example data. The Statalist FAQ provides advice on effectively posing your questions, posting data, and sharing Stata output.

    To present your data, use the dataex command. By default it outputs all the variables in the first 100 observations. That will not be as helpful as narrowing down the variables to those needed to identify the graduates/siblings plus a few variables with typical values (e.g., child's marital status), and selecting observations for a few graduate/sibling pairs. I will assume there will be a few graduates without siblings, but no siblings without graduates.
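
    As a rough sketch of what such a narrowed-down call might look like (the variable names are the ones that appear later in this thread, and the two family IDs are picked purely for illustration):

    ```stata
    * Sketch only: list a handful of identifying and child variables
    * for two graduate/sibling pairs instead of the default first 100 observations.
    * The familypub values below are illustrative; substitute IDs from your own data.
    dataex familypub personid rtype gssex totchever11 ch1marstat11 child1age11 if inlist(familypub, 1203989, 1209072)
    ```

    The varlist and the if qualifier restrict what dataex writes out, so the posted example stays small enough to read.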

    • #3
      Dear William

      Apologies for not making things clear.

      My data look like this:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long familypub byte personid str1 rtype byte(gssex totchever11 ch1marstat11) float child1age11
      1205260 1 "g" 1 4   2 38
      1203989 1 "g" 1 3   2 49
      1203989 2 "s" 1 8   2 56
      1202617 1 "g" 1 2   2 47
      1206449 1 "g" 1 3   2 41
      1209454 2 "g" 1 2   2 50
      1209454 1 "s" 1 2  -2  .
      1207174 1 "g" 1 3   2 44
      1202207 1 "g" 2 4   2 52
      1203793 1 "g" 1 4  -2 50
      1205982 2 "s" 1 2   2 39
      1206866 1 "g" 1 4   2 50
      1206866 2 "s" 2 0  -2  .
      1206646 1 "g" 2 2 -30  .
      1209076 1 "g" 1 2   2 51
      1204601 2 "s" 1 3   2 40
      1209072 1 "g" 2 2   2 49
      1209072 2 "s" 1 4   6 42
      1204470 1 "g" 1 5   3 53
      1201315 1 "g" 2 1   1 45
      end
      label values gssex QQGENDER
      label def QQGENDER 1 "male", modify
      label def QQGENDER 2 "female", modify
      label values totchever11 QQSINCE75NKIDS
      label def QQSINCE75NKIDS 0 "NO CHILD EVER REPORTED", modify
      label values ch1marstat11 QQMARSTATUS
      label def QQMARSTATUS -30 "NOT PART OF MOSAQ", modify
      label def QQMARSTATUS -2 "inap", modify
      label def QQMARSTATUS 1 "NEVER MARRIED", modify
      label def QQMARSTATUS 2 "married", modify
      label def QQMARSTATUS 3 "divorced", modify
      label def QQMARSTATUS 6 "cohabiting", modify

      In my mind the desired outcome for a given extended family unit (using only the first child in this example) would look something like this:

      extfamilypub  familypub  personid  rtype  gssex   totchever11  ch1marstat11  child1age11
      1             1          1         g      male    3            .             .
      1             1          3         c      .       .            married       30
      1             2          2         s      female  8            .             .
      1             2          3         c      .       .            married       31

      extfamilypub = extended family identifier

      familypub = nuclear family identifier

      personid = 1 for graduate, 2 for sibling, 3 for child

      rtype = g for graduate, s for sibling, c for child

      gssex = graduate / sibling's sex

      totchever11 = graduate / sibling's total number of children

      ch1marstat11 = marital status of first child

      child1age11 = age of first child

      I would then go on to make it so that there is only one variable each for marital status, age, and sex across all individuals, as these are currently held in separate variables for graduates / siblings and for children.

      I hope this helps clarify.

      Many thanks
      Owen

      • #4
        I have created an example with data for 3 children of the graduate and 2 children of the sibling, based on your example data. For simplicity I have omitted value labels. I then suggest that reshape long is what you need to create a useful child-level dataset.
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input long familypub byte personid str1 rtype byte(gssex totchever11 ch1marstat11 child1age11 ch2marstat11 child2age11 ch3marstat11 child3age11)
        7654321 1 "g" 1 3 2 49 1 32 1 33
        7654321 2 "s" 1 2 2 56 1 32 .  .
        end
        reshape long ch@marstat11 child@age11, i(familypub personid) j(childnum)
        drop if childnum>totchever11
        list, noobs abbreviate(20) sepby(personid)
        Code:
        . reshape long ch@marstat11 child@age11, i(familypub personid) j(childnum)
        (j = 1 2 3)
        
        Data                               Wide   ->   Long
        -----------------------------------------------------------------------------
        Number of observations                2   ->   6           
        Number of variables                  11   ->   8           
        j variable (3 values)                     ->   childnum
        xij variables:
         ch1marstat11 ch2marstat11 ch3marstat11   ->   chmarstat11
            child1age11 child2age11 child3age11   ->   childage11
        -----------------------------------------------------------------------------
        
        . drop if childnum>totchever11
        (1 observation deleted)
        
        . list, noobs abbreviate(20) sepby(personid)
        
          +------------------------------------------------------------------------------------------+
          | familypub   personid   childnum   rtype   gssex   totchever11   chmarstat11   childage11 |
          |------------------------------------------------------------------------------------------|
          |   7654321          1          1       g       1             3             2           49 |
          |   7654321          1          2       g       1             3             1           32 |
          |   7654321          1          3       g       1             3             1           33 |
          |------------------------------------------------------------------------------------------|
          |   7654321          2          1       s       1             2             2           56 |
          |   7654321          2          2       s       1             2             1           32 |
          +------------------------------------------------------------------------------------------+
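
        If explicit identifiers like those described in the original question are still wanted, they could be derived from the long dataset along the following lines. This is only a sketch: extfamid, nucfamid, childid and parenttype are hypothetical names rather than WLS variables, and it assumes that familypub identifies the graduate/sibling pair and that personid distinguishes the graduate from the sibling within it.

        ```stata
        * Sketch only: continues from the long (child-level) dataset created above.
        * All new variable names are hypothetical.

        * Extended family: the graduate, the sibling and all of their children
        generate long extfamid = familypub

        * Nuclear family: one per parent (graduate or sibling) within an extended family
        egen long nucfamid = group(familypub personid)

        * Child identifier: one per child slot within a nuclear family
        egen long childid = group(familypub personid childnum)

        * rtype now describes the parent of each child record, so rename it accordingly
        rename rtype parenttype

        * Confirm that each observation is a distinct child
        isid nucfamid childnum
        ```

        Keeping the parent-level variables (gssex, totchever11) on each child row means they can be used directly as covariates in a child-level regression, with extfamid or nucfamid available for clustering.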

        • #5
          Many thanks for this, William. Worked like a charm.

          Best
          Owen
