Regression on individual and sibship level - what observations to include?

Ariane Arbol

Join Date: May 2021
Posts: 36

14 Jan 2022, 11:16

Dear all,

I want to investigate which child in a sibling group takes over the care of a parent. The focus is on gender (of both the caregiving child and the siblings), but other characteristics of the children and siblings (e.g., education, employment status, own children, spatial proximity to the parent) will also be examined as influencing factors.

The dataset contains parents with all of their children and the children's characteristics. Following Grigoryeva (2017), I would first like to conduct an individual-level analysis (which factors influence whether a child takes on parental care, estimated separately for sons and daughters) and then a sibship-level analysis (do the characteristics of brothers and of sisters influence the total care time provided by the brothers and by the sisters of a sibling group, respectively?).

For the sibship-level analysis, summary statistics of the individual-level independent variables were computed separately for sons and daughters: for dichotomous variables, the number of sons (daughters) with the characteristic of interest was counted; for continuous variables, the mean over the sons (daughters) was taken.
The dependent variables are the total days of care (absolute measure) provided by all sons and by all daughters in a sibling group. From each, a standardized proportion (relative measure) was also created: the sons' share of the sibship's total care time divided by the sons' share of the children in the sibship (likewise for daughters). (As an example: if a sibling group of one son and three daughters jointly provides 120 days of care, and the daughters provide 100 days of that care, then the daughters' share is 100/120. The share they would have provided under an equal division of caring labour is 3/4. The standardized proportion of the daughters' care time is thus (100/120)/(3/4), or about 1.11, meaning that the daughters jointly provide about 11 percent more care than they would have if the care work in the sibship had been divided equally.)
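The arithmetic of the standardized proportion can be checked with a few lines of Stata. This is a minimal sketch of the worked example only; the scalar names are illustrative, and 3/4 is taken to be the daughters' share of the four children (three daughters, one son).

```stata
* Worked example of the standardized proportion (scalar names are illustrative)
scalar care_all    = 120                    // days of care provided by the whole sibship
scalar care_daught = 100                    // days provided jointly by the daughters
scalar share_obs   = care_daught/care_all   // observed share of the daughters
scalar share_equal = 3/4                    // daughters' share of the children
scalar daught_rel  = share_obs/share_equal  // standardized proportion
display %4.2f daught_rel                    // 1.11
```

A value above 1 means the group provides more than its equal-division share, a value below 1 less.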

The dataset with individual- and sibship-level variables looks like the following example (shortened in variables and observations). Individual-level variables start with child_; sibship-level variables are nr_sons, nr_daught and those starting with sons_ or daught_. The variables are:
child_helpfreq_abs - absolute care time of child (days per year)
child_helpfreq_rel - standardized relative care time of child, calculated as (child_helpfreq_abs/child_helpfreq_all)/(1/number of children in the sibship), i.e., the same principle as at the sibship level (see explanation above)
child_helpfreq_all - total care time of all children in a sibship (auxiliary variable for calculating the standardized shares)
child_age - age of child
child_faraway - child lives far away from parent (yes/no)
nr_sons/nr_daught - number of sons/daughters in the sibship
sons_helpfreq_abs / daught_helpfreq_abs - total care time (days per year) of all sons/daughters in a sibship
sons_helpfreq_rel / daught_helpfreq_rel - standardized share of care time of all sons/daughters in a sibship (see explanation above)
sons_meanage / daught_meanage - mean age of sons/daughters in a sibship
sons_faraway / daught_faraway - number of sons/daughters in a sibship living far away from the parent

parentid	childid	childsex	child_helpfreq_abs	child_helpfreq_rel	child_helpfreq_all	child_age	child_faraway	nr_sons	nr_daught	sons_helpfreq_abs	sons_helpfreq_rel	daught_helpfreq_abs	daught_helpfreq_rel	sons_meanage	sons_faraway	daught_meanage	daught_faraway
1	1-1	female	52	1,8	58	37	yes	1	1	6	0,2	52	1,8	39	0	37	1
1	1-2	male	6	0,2	58	39	no	1	1	6	0,2	52	1,8	39	0	37	1
2	2-1	female	0	0	0	28	no	0	1	0	.	0	0	.	0	28	0
3	3-1	male	12	0,08	429	25	yes	2	1	64	0,2	365	2,6	26,5	1	30	0
3	3-2	female	365	2,55	429	30	no	2	1	64	0,2	365	2,6	26,5	1	30	0
3	3-3	male	52	0,36	429	28	no	2	1	64	0,2	365	2,6	26,5	1	30	0
4	4-1	male	0	0	116	40	yes	2	2	52	0,9	64	1,1	28,5	1	45,5	1
4	4-2	male	52	1,8	116	17	no	2	2	52	0,9	64	1,1	28,5	1	45,5	1
4	4-3	female	52	1,8	116	48	yes	2	2	52	0,9	64	1,1	28,5	1	45,5	1
4	4-4	female	12	0,4	116	43	no	2	2	52	0,9	64	1,1	28,5	1	45,5	1
5	5-1	male	52	1	52	22	no	1	0	52	1	0	.	52	0	.	0
6	6-1	female	12	1,33	18	32	no	0	2	0	.	18	1	.	0	33,5	0
6	6-2	female	6	0,66	18	35	no	0	2	0	.	18	1	.	0	33,5	0
7	7-1	male	12	0,51	70	47	no	1	2	12	0,5	58	1,2	47	0	46,5	2
7	7-2	female	6	0,26	70	48	yes	1	2	12	0,5	58	1,2	47	0	46,5	2
7	7-3	female	52	2,23	70	45	yes	1	2	12	0,5	58	1,2	47	0	46,5	2
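A minimal sketch of how the variables in the table could be built from one row per child. The raw values are copied from the example (child_faraway recoded to 1 = yes, 0 = no); the helper variables male, female and n_children are hypothetical names. Note that Stata returns missing for 0/0, so sibship 2, where nobody provides care, gets missing standardized shares where the table shows 0.

```stata
* Hypothetical sketch: rebuild the child- and sibship-level variables
clear
input parentid str6 childsex child_helpfreq_abs child_age child_faraway
1 female  52 37 1
1 male     6 39 0
2 female   0 28 0
3 male    12 25 1
3 female 365 30 0
3 male    52 28 0
4 male     0 40 1
4 male    52 17 0
4 female  52 48 1
4 female  12 43 0
5 male    52 22 0
6 female  12 32 0
6 female   6 35 0
7 male    12 47 0
7 female   6 48 1
7 female  52 45 1
end

* Indicators used to restrict the sibship summaries to sons or daughters
gen byte male   = childsex == "male"
gen byte female = childsex == "female"

* Sibship size, sibship total care time and each child's standardized share
bysort parentid: gen n_children = _N
bysort parentid: egen child_helpfreq_all = total(child_helpfreq_abs)
gen child_helpfreq_rel = (child_helpfreq_abs/child_helpfreq_all)/(1/n_children)

* Counts and sums by sex: total() of an indicator counts, total() of care sums
bysort parentid: egen nr_sons   = total(male)
bysort parentid: egen nr_daught = total(female)
bysort parentid: egen sons_helpfreq_abs   = total(child_helpfreq_abs*male)
bysort parentid: egen daught_helpfreq_abs = total(child_helpfreq_abs*female)
bysort parentid: egen sons_faraway   = total(child_faraway*male)
bysort parentid: egen daught_faraway = total(child_faraway*female)

* Means by sex: mean() skips missing values, so it stays missing if nobody qualifies
bysort parentid: egen sons_meanage   = mean(cond(male, child_age, .))
bysort parentid: egen daught_meanage = mean(cond(female, child_age, .))

* Standardized shares: missing when the sibship has no sons (daughters)
gen sons_helpfreq_rel   = (sons_helpfreq_abs/child_helpfreq_all)/(nr_sons/n_children)
gen daught_helpfreq_rel = (daught_helpfreq_abs/child_helpfreq_all)/(nr_daught/n_children)
```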

Since sibship-level values are, by construction, identical within a sibling group, I am unsure which observations to include in my analyses. The sibship-level regression commands would look like this:

Code:

reg sons_helpfreq_abs sons_meanage sons_faraway daught_meanage daught_faraway nr_sons nr_daught

reg daught_helpfreq_abs sons_meanage sons_faraway daught_meanage daught_faraway nr_sons nr_daught

Basically, I see three options:

1) Take one child per sibling group and use this sample to estimate both the sons' and the daughters' care time. (Rationale: because all sibship-level values are identical within a sibling group, one observation per sibship is enough.)

2) Use the observations of sons to estimate the sons' care time, and the observations of daughters to estimate the daughters' care time, i.e.:

Code:

* childsex is assumed to be a string variable holding "male"/"female"
reg sons_helpfreq_abs sons_meanage sons_faraway daught_meanage daught_faraway nr_sons nr_daught if childsex=="male"

reg daught_helpfreq_abs sons_meanage sons_faraway daught_meanage daught_faraway nr_sons nr_daught if childsex=="female"

Rationale: some kind of weighting by the number of sons and daughters, respectively, in each estimation (?).

3) Use all observations for both the sons' and the daughters' estimation. Would this be a kind of weighting (larger sibling groups enter the analysis with more, identical observations)? Do I have to cluster the standard errors because observations within a sibship are more similar than observations from different sibships?
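The three options can be written in Stata roughly as follows. This is a sketch on a cut-down copy of the example data, with only nr_sons and nr_daught as regressors because the example has just seven sibships; the names first, b_sons_rows and b_fw_sons are hypothetical. It shows what options 2 and 3 do mechanically: since all sons of a sibship carry identical values, the regression on the sons' rows gives the same coefficients as the one-row-per-sibship regression with frequency weights equal to nr_sons, and by the same logic option 3 weights each sibship by nr_sons + nr_daught.

```stata
* Hypothetical sketch: options 1-3 on a cut-down copy of the example data
clear
input parentid str6 childsex sons_helpfreq_abs nr_sons nr_daught
1 female  6 1 1
1 male    6 1 1
2 female  0 0 1
3 male   64 2 1
3 female 64 2 1
3 male   64 2 1
4 male   52 2 2
4 male   52 2 2
4 female 52 2 2
4 female 52 2 2
5 male   52 1 0
6 female  0 0 2
6 female  0 0 2
7 male   12 1 2
7 female 12 1 2
7 female 12 1 2
end

* Option 1: one row per sibship (tag() marks one row per parentid)
egen byte first = tag(parentid)
regress sons_helpfreq_abs nr_sons nr_daught if first & nr_sons > 0

* Option 2: sons' rows only, so each sibship enters nr_sons times
regress sons_helpfreq_abs nr_sons nr_daught if childsex == "male"
matrix b_sons_rows = e(b)

* Same coefficients as option 1 with frequency weights equal to nr_sons
regress sons_helpfreq_abs nr_sons nr_daught if first & nr_sons > 0 [fweight = nr_sons]
matrix b_fw_sons = e(b)
display mreldif(b_sons_rows, b_fw_sons)

* Option 3: all rows of sibships with sons, so each sibship enters
* nr_sons + nr_daught times; vce(cluster parentid) lets the errors of rows
* from the same sibship be correlated when computing standard errors
regress sons_helpfreq_abs nr_sons nr_daught if nr_sons > 0, vce(cluster parentid)
```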

I'm just not sure how to do it right, and I'm really confused now. As far as I understand, Grigoryeva (2017) used all observations from sibships with at least one son (i.e., sons without siblings, all-male sibships and mixed-gender sibships) to estimate the sons' care time, and all observations from sibships with at least one daughter to estimate the daughters' care time. As a result, her two estimations have different sample sizes. I don't understand how she did this: the mean values of the sons' or daughters' independent variables cannot be calculated for children without siblings or for same-sex sibships (if there is no sister or no brother, the mean age of sisters or brothers in the sibship cannot be computed, which results in a missing value). Consequently, only mixed-gender sibships could be used in the sibship-level analyses. That said, I still don't know which estimation approach is correct ...
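The sample-size argument can be checked by counting sibships. A minimal sketch with one row per sibship, values copied from the example table; the local macro names are hypothetical. Because regress drops every observation with a missing value in any regressor, a model containing both sons_meanage and daught_meanage keeps only the mixed-gender sibships, whereas restricting to sibships with at least one son (or daughter) alone keeps more.

```stata
* One row per sibship, values copied from the example table
clear
input parentid nr_sons nr_daught sons_meanage daught_meanage
1 1 1 39   37
2 0 1 .    28
3 2 1 26.5 30
4 2 2 28.5 45.5
5 1 0 52   .
6 0 2 .    33.5
7 1 2 47   46.5
end

* Sibships with at least one son
count if nr_sons > 0
local n_with_sons = r(N)
* Sibships with at least one daughter
count if nr_daught > 0
local n_with_daught = r(N)
* Sibships left after listwise deletion once both mean ages are regressors
count if !missing(sons_meanage, daught_meanage)
local n_complete = r(N)
display "`n_with_sons' `n_with_daught' `n_complete'"   // 5 6 4
```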

Sorry for the long text and thank you for reading it. Any thought, hint, and/or literature reference is welcome! Thanks!

Ariane Arbol

#2

18 Jan 2022, 03:57

May I push my question up again? Maybe the solution is very simple, so a hint would be helpful. If any further information is needed, please tell me. Thank you very much!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#3

18 Jan 2022, 05:15

Ariane:
overly long queries are hard to read and are often skipped even by potentially interested listers.
Bumping (which the FAQ also discourages) is not helpful; a more concise post is probably the way to go. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Ariane Arbol

#4

18 Jan 2022, 12:05

Thanks for the hint, Carlo. I'm sorry for the lengthy description; I thought it would be easier to understand my concern if I made it more than clear. Here is a shorter version:

I want to investigate the influence of brothers' and sisters' characteristics on the total caretime of brothers and sisters, respectively, in a sibling group. The regression commands would look like these examples (shortened in variables):

Code:

reg sons_helpfreq_abs sons_meanage sons_faraway daught_meanage daught_faraway nr_sons nr_daught

reg daught_helpfreq_abs sons_meanage sons_faraway daught_meanage daught_faraway nr_sons nr_daught

The explanatory variables are summary statistics of the respective characteristic: sons_faraway is the number of sons in a sibling group living far away from the parent (likewise for all dichotomous variables), and sons_meanage is the mean age of all sons in a sibling group (likewise for all continuous variables).

Since the values of these variables are, by construction, identical within a sibling group, and I have all siblings of each sibship in the dataset, I am not sure which observations to include in my analyses. Do I take only one observation per sibship, because the values are equal for all siblings? Do I have to take all observations, so that a kind of weighting enters the analysis (larger sibling groups enter with more, identical observations)? Or do I have to separate sons and daughters, using the observations of sons to estimate the sons' care time and the observations of daughters to estimate the daughters' care time, i.e.:

Code:

reg sons_helpfreq_abs sons_meanage sons_faraway daught_meanage daught_faraway nr_sons nr_daught if childsex=="male"

reg daught_helpfreq_abs sons_meanage sons_faraway daught_meanage daught_faraway nr_sons nr_daught if childsex=="female"

I am a little confused by the mixture of sibship-level measures and individual-level observations, and not sure how to do it right. It would be great if anybody has an idea to share! Thanks!