Rearranging Data

Owen Wallbanks

Join Date: Jan 2022

Posts: 22
#1

17 Jan 2022, 03:56

Hello

The dataset I'm using is from the Wisconsin Longitudinal Study, which contains data on ~10,300 graduates of Wisconsin high schools in 1957 who are followed over several waves until 2011. A parallel survey is also carried out on one of their siblings, chosen at random. The dataset also provides detailed information on every child of both the graduates and the siblings. I want to carry out regression analysis on the children, but at present the information about each child is simply a variable attached to the graduate's / sibling's personal ID (e.g. the graduate / sibling was asked 'what is your first child's marital status?').

The only way I can think to carry out analysis on the children is to rearrange the data by creating an extended family identifier that links the sibling pairs and their children together, a nuclear family identifier that links children to their parents, and a personal ID for the children themselves. I would then also have some variable that identifies whether the individual is part of the sibling pair or a child. I'm not sure how to do this or if there's a better way to achieve the same outcome.

Any help would be much appreciated.

Many thanks
Owen
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

17 Jan 2022, 09:15

Welcome to Statalist.

Very often a simple description like yours really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what has been shared.

Please help us help you. Show example data. The Statalist FAQ provides advice on effectively posing your questions, posting data, and sharing Stata output.

To present your data, use the dataex command. By default it outputs all the variables for the first 100 observations. That will not be as helpful as narrowing down the variables to those needed to identify the graduates/siblings plus a few variables with typical values (e.g., child's marital status), and selecting observations for a few graduate/sibling pairs. I will assume there will be a few graduates without siblings, but no siblings without graduates.
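For instance, a narrowed-down dataex call might look like the sketch below. The variable names and the three observations are copied from the example data posted later in this thread and are typed in here only so that the commands run on their own; substitute your own variables and observation range. If dataex is not already available in your copy of Stata, ssc install dataex should install it.

Code:

* Stand-in data so the sketch runs by itself (rows copied from the example later in the thread)
clear
input long familypub byte personid str1 rtype byte(gssex totchever11 ch1marstat11) float child1age11
1203989 1 "g" 1 3 2 49
1203989 2 "s" 1 8 2 56
1202617 1 "g" 1 2 2 47
end
* Show only the identifying variables plus a few typical ones, for the first 3 observations
dataex familypub personid rtype gssex totchever11 ch1marstat11 child1age11 in 1/3

An if condition can be used in place of in 1/3 to pick out particular graduate/sibling pairs.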

Owen Wallbanks

Join Date: Jan 2022
Posts: 22

18 Jan 2022, 06:38

Dear William

Apologies for not making things clear.

My data look like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long familypub byte personid str1 rtype byte(gssex totchever11 ch1marstat11) float child1age11
1205260 1 "g" 1 4   2 38
1203989 1 "g" 1 3   2 49
1203989 2 "s" 1 8   2 56
1202617 1 "g" 1 2   2 47
1206449 1 "g" 1 3   2 41
1209454 2 "g" 1 2   2 50
1209454 1 "s" 1 2  -2  .
1207174 1 "g" 1 3   2 44
1202207 1 "g" 2 4   2 52
1203793 1 "g" 1 4  -2 50
1205982 2 "s" 1 2   2 39
1206866 1 "g" 1 4   2 50
1206866 2 "s" 2 0  -2  .
1206646 1 "g" 2 2 -30  .
1209076 1 "g" 1 2   2 51
1204601 2 "s" 1 3   2 40
1209072 1 "g" 2 2   2 49
1209072 2 "s" 1 4   6 42
1204470 1 "g" 1 5   3 53
1201315 1 "g" 2 1   1 45
end
label values gssex QQGENDER
label def QQGENDER 1 "male", modify
label def QQGENDER 2 "female", modify
label values totchever11 QQSINCE75NKIDS
label def QQSINCE75NKIDS 0 "NO CHILD EVER REPORTED", modify
label values ch1marstat11 QQMARSTATUS
label def QQMARSTATUS -30 "NOT PART OF MOSAQ", modify
label def QQMARSTATUS -2 "inap", modify
label def QQMARSTATUS 1 "NEVER MARRIED", modify
label def QQMARSTATUS 2 "married", modify
label def QQMARSTATUS 3 "divorced", modify
label def QQMARSTATUS 6 "cohabiting", modify

In my mind the desired outcome for a given extended family unit (using only the first child in this example) would look something like this:

extfamilypub	familypub	personid	rtype	gssex	totchever11	ch1marstat11	child1age11
1	1	1	g	male	3	.	.
1	1	3	c	.	.	married	30
1	2	2	s	female	8	.	.
1	2	3	c	.	.	married	31

extfamilypub = extended family identifier

familypub = nuclear family identifier

personid = 1 for graduate, 2 for sibling, 3 for child

rtype = g for graduate, s for sibling, c for child

gssex = graduate / sibling's sex

totchever11 = graduate / sibling's total number of children

ch1marstat11 = marital status of first child

child1age11 = age of first child

I would then go on to create a single variable each for the marital status, age, and sex of all individuals, as these are currently stored in separate variables for graduates / siblings and for children.

I hope this helps clarify.

Many thanks
Owen

William Lisowski

Join Date: Dec 2014
Posts: 10150

18 Jan 2022, 10:39

I have created an example with data for 3 children of the graduate and 2 children of the sibling, based on your example data. For simplicity I have omitted value labels. I then suggest that reshape long is what you need to create a useful child-level dataset.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long familypub byte personid str1 rtype byte(gssex totchever11 ch1marstat11 child1age11 ch2marstat11 child2age11 ch3marstat11 child3age11)
7654321 1 "g" 1 3 2 49 1 32 1 33
7654321 2 "s" 1 2 2 56 1 32 .  .
end
reshape long ch@marstat11 child@age11, i(familypub personid) j(childnum)
drop if childnum>totchever11
list, noobs abbreviate(20) sepby(personid)

Code:

. reshape long ch@marstat11 child@age11, i(familypub personid) j(childnum)
(j = 1 2 3)

Data                               Wide   ->   Long
-----------------------------------------------------------------------------
Number of observations                2   ->   6           
Number of variables                  11   ->   8           
j variable (3 values)                     ->   childnum
xij variables:
 ch1marstat11 ch2marstat11 ch3marstat11   ->   chmarstat11
    child1age11 child2age11 child3age11   ->   childage11
-----------------------------------------------------------------------------

. drop if childnum>totchever11
(1 observation deleted)

. list, noobs abbreviate(20) sepby(personid)

  +------------------------------------------------------------------------------------------+
  | familypub   personid   childnum   rtype   gssex   totchever11   chmarstat11   childage11 |
  |------------------------------------------------------------------------------------------|
  |   7654321          1          1       g       1             3             2           49 |
  |   7654321          1          2       g       1             3             1           32 |
  |   7654321          1          3       g       1             3             1           33 |
  |------------------------------------------------------------------------------------------|
  |   7654321          2          1       s       1             2             2           56 |
  |   7654321          2          2       s       1             2             1           32 |
  +------------------------------------------------------------------------------------------+
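Going back to the identifiers asked about in the first post, here is a minimal sketch of how they could be built on top of the reshaped data. It assumes, as in the example data, that familypub is shared by the graduate and the sibling, so that it already serves as the extended family identifier. The names nucfam and childid are made up for illustration and are not variables in the original data.

Code:

clear
input long familypub byte personid str1 rtype byte(gssex totchever11 ch1marstat11 child1age11 ch2marstat11 child2age11 ch3marstat11 child3age11)
7654321 1 "g" 1 3 2 49 1 32 1 33
7654321 2 "s" 1 2 2 56 1 32 .  .
end
reshape long ch@marstat11 child@age11, i(familypub personid) j(childnum)
drop if childnum>totchever11
* Nuclear family: one value per responding parent (graduate or sibling)
egen long nucfam = group(familypub personid)
* Child identifier: one value per child across the whole dataset
egen long childid = group(familypub personid childnum)
* Confirm that each observation is now exactly one child
isid childid
list familypub nucfam childid rtype chmarstat11 childage11, noobs sepby(nucfam)

In this layout rtype records whether the child's parent is the graduate or the sibling, and gssex is the parent's sex, so both could be renamed to make that explicit before any child-level regression.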

Owen Wallbanks

Join Date: Jan 2022

Posts: 22
#5

19 Jan 2022, 04:10

Many thanks for this, William. Worked like a charm.

Best
Owen