I have a dataset organised in three levels (listed below, together with a data excerpt).
I'm trying to estimate the effect that an individual's number of siblings has on the probability of that individual being married:
\[ married_{ijk} = \alpha+\beta_1 sibsize_{jk}+\beta_2 \textbf{z}_{jk}+\beta_3 \textbf{x}_{ijk}+e_k+c_{jk}+\epsilon_{ijk} \]
Taking deviations from extended-family means (denoted by \(\Delta\)) gives:
\[
\Delta married_{ijk} = \beta_1 \Delta sibsize_{jk}+\beta_2 \Delta \textbf{z}_{jk}+\beta_3 \Delta \textbf{x}_{ijk}+c_{jk}+\Delta \epsilon_{ijk}
\]
This removes the extended-family fixed effect \(e_k\). I then add an IV component to identify the causal effect of family size net of the family fixed effects, but that is not relevant to the problem here.
Running a plain OLS regression on these demeaned variables obviously does not take into account that multiple individuals are nested within the same nuclear and extended families. Moreover, since the number of siblings does not vary between individuals within a nuclear family, there will be clusters of individuals for whom variation in the marriage variable about its mean can only be explained by variation in the covariates about their means, which pushes the coefficient on sibling size toward insignificance.
Instead, I believe I need to use a multi-level model that identifies the effect of sibling size on the probability of being married from differences in sibling size between nuclear families from the same extended family.
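For concreteness, the kind of specification I have in mind might look like the sketch below. This is an assumption on my part rather than something I know to be appropriate: it uses random intercepts at both family levels (not fixed effects), treats marriage as a linear probability, and uses the variable names from the data excerpt with age standing in for the other covariates.

```stata
* Sketch only: three-level random-intercept model, individuals within
* nuclear families within extended families. Variable names follow the
* -dataex- excerpt; age stands in for the full covariate list.
mixed married sibsize age || extendedfamilyid: || nuclearfamilyid:
```

Whether random intercepts are acceptable here, given that the aim is to net out extended-family effects, is part of what I am unsure about.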
Sorry if some of the above seems superfluous; I thought some context would be helpful.
Any help would be greatly appreciated.
Best
Owen
The three levels are:
- Individuals, nested within...
- Nuclear families (their parents and siblings), nested within...
- Extended families (their uncles/aunts and first cousins)
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long extendedfamilyid float nuclearfamilyid byte nuclearfamily float(married meanmarried11 marriedfe) byte sibsize float(meansibsize sibsizefe age meanage agefe)
1200002 1200004 2 0 .3333333 -.3333333 3   3    0 45 41.66667   3.333332
1200002 1200004 2 0 .3333333 -.3333333 3   3    0 42 41.66667   .3333321
1200002 1200004 2 1 .3333333  .6666666 3   3    0 38 41.66667  -3.666668
1200004 1200005 1 1        1         0 2   2    0 42     39.5        2.5
1200004 1200005 1 1        1         0 2   2    0 37     39.5       -2.5
1200005 1200006 1 1       .5        .5 2   2    0 49     48.5         .5
1200005 1200006 1 0       .5       -.5 2   2    0 48     48.5        -.5
1200007 1200008 1 1 .8333333  .1666667 3   3    0 38     40.8  -2.799999
1200007 1200008 1 0 .8333333 -.8333333 3   3    0 35     40.8  -5.799999
1200007 1200008 1 1 .8333333  .1666667 3   3    0  .     40.8          .
1200007 1200009 2 1 .8333333  .1666667 3   3    0 46     40.8   5.200001
1200007 1200009 2 1 .8333333  .1666667 3   3    0 43     40.8  2.2000008
1200007 1200009 2 1 .8333333  .1666667 3   3    0 42     40.8  1.2000008
1200009 1200010 1 1        1         0 9 7.5  1.5 50 46.41667   3.583332
1200009 1200010 1 1        1         0 9 7.5  1.5 48 46.41667   1.583332
1200009 1200010 1 1        1         0 9 7.5  1.5 47 46.41667  .58333206
1200009 1200010 1 1        1         0 9 7.5  1.5 46 46.41667  -.4166679
1200009 1200010 1 1        1         0 9 7.5  1.5 45 46.41667  -1.416668
1200009 1200010 1 1        1         0 9 7.5  1.5 44 46.41667  -2.416668
1200009 1200010 1 1        1         0 9 7.5  1.5 44 46.41667  -2.416668
1200009 1200010 1 1        1         0 9 7.5  1.5 41 46.41667  -5.416668
1200009 1200010 1 1        1         0 9 7.5  1.5 33 46.41667 -13.416668
1200009 1200011 2 1        1         0 3 7.5 -4.5 60 46.41667  13.583332
1200009 1200011 2 1        1         0 3 7.5 -4.5 53 46.41667   6.583332
1200009 1200011 2 1        1         0 3 7.5 -4.5 46 46.41667  -.4166679
1200010 1200012 2 0       .5       -.5 2   2    0 34       32          2
1200010 1200012 2 1       .5        .5 2   2    0 30       32         -2
1200011 1200012 1 1        1         0 2   2    0 45       44          1
1200011 1200012 1 1        1         0 2   2    0 43       44         -1
1200012 1200013 1 1      .75       .25 4   4    0 47    43.25       3.75
end
label values sibsize QQNKIDSALLSRC
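To make the demeaning concrete with the excerpt above: extended family 1200009 contains two nuclear families, one with nine individuals (sibsize = 9) and one with three individuals (sibsize = 3). Assuming the extended-family mean is taken over individuals, which is what the meansibsize column suggests, the deviations work out as:
\[
\overline{sibsize}_{k} = \frac{9 \cdot 9 + 3 \cdot 3}{12} = 7.5, \qquad \Delta sibsize_{jk} = 9 - 7.5 = 1.5 \quad \text{or} \quad 3 - 7.5 = -4.5
\]
These match the meansibsize and sibsizefe columns. In every other extended family in the excerpt sibsizefe is 0, because sibsize takes only one value within the extended family, so those observations contribute no variation in sibling size.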
In the model for \(married_{ijk}\) given earlier:
- \(married_{ijk}\) is a dummy equal to 1 if child i (i=1,...,n), belonging to nuclear family j (j=1,...,n) and to extended family k (k=1,...,n), is married
- \(sibsize_{jk}\) is the number of siblings, treated as a continuous variable
- \(\textbf{z}_{jk}\) is a vector of other nuclear family characteristics
- \(\textbf{x}_{ijk}\) is a vector of individual child characteristics
- \(e_k\) is a time-invariant fixed effect specific to each extended family
- \(c_{jk}\) is a fixed effect specific to each nuclear family
The regression I am currently running is:
Code:
reg chmarried11fe totch11fe sex6fe (...other covariates)
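For reference, a minimal sketch of how demeaned variables of this kind can be built, and of an alternative that absorbs the extended-family effect directly while clustering the standard errors. It uses the variable names from the data excerpt rather than those in the command above; the mean_* and *_dm names are hypothetical, age stands in for the other covariates, and clustering at the extended-family level is an assumption.

```stata
* Sketch only: deviations from extended-family means for each variable.
* mean_* and *_dm are hypothetical names chosen for illustration.
foreach v of varlist married sibsize age {
    bysort extendedfamilyid: egen double mean_`v' = mean(`v')
    generate double `v'_dm = `v' - mean_`v'
}

* Within-extended-family estimator with the fixed effect absorbed and
* standard errors clustered by extended family (assumed cluster level).
areg married sibsize age, absorb(extendedfamilyid) vce(cluster extendedfamilyid)
```

With the excerpt alone, the coefficient on sibsize would be identified only from extended family 1200009, since that is the only one in which sibsize differs between nuclear families.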