Hi everyone, I have some household panel data. It contains an ID variable for every respondent (id) and a family ID variable (hhid) to identify siblings. It also contains a time-invariant continuous variable (var1). I am interested in obtaining the average of the absolute differences between sibling pairs. Thus, I would like to do the following:
1) first, obtain the absolute differences between all possible sibling pairs within the household (e.g., abs(sib1 - sib2); abs(sib1 - sib3); abs(sib2 - sib3))
2) and then average these differences over the number of sibling comparisons
Lastly, I would like to assign the average to all corresponding members of the household in a new variable (avgdiff).
I have my data in long format, but, in wide format, it looks like this:
Code:
clear
input hhid id var1
100 1 0
100 2 3
100 3 3
200 4 5
200 6 4
300 7 2
300 8 6
300 9 4
300 10 1
end
And I want it to look like this:
Code:
clear
input hhid id var1 sib1vsib2 sib1vsib3 sib1vsib4 sib2vsib3 sib2vsib4 sib3vsib4 avgdiff
100 1 0 3 3 . 0 . . 2
100 2 3 3 3 . 0 . . 2
100 3 3 3 3 . 0 . . 2
200 4 5 1 . . . . . 1
200 6 4 1 . . . . . 1
300 7 2 4 2 1 2 5 3 2.83
300 8 6 4 2 1 2 5 3 2.83
300 9 4 4 2 1 2 5 3 2.83
300 10 1 4 2 1 2 5 3 2.83
end
I also don't need to keep the differences of each individual comparison.
I am quite stuck on how to do this efficiently, so any help would be appreciated!
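One possible way to approach this, offered as an untested sketch rather than a definitive solution: pair every sibling with every other sibling in the same household using joinby, keep each unordered pair once, and average the absolute differences by household. The sketch assumes the variable names used above (hhid, id, var1), that var1 is constant within id so one row per person is enough, and that it is run as a do-file so the tempfiles persist. The names id2, var1_2, and absdiff are made-up helpers.

```stata
* Sketch: mean absolute pairwise difference in var1 within each household.
* Assumes hhid, id, var1 exist and var1 is time-invariant within id.
preserve

* Reduce the panel to one row per person
keep hhid id var1
bysort hhid id: keep if _n == 1
tempfile sibs
save `sibs'

* Rename so each person can be paired with every household member
rename (id var1) (id2 var1_2)
joinby hhid using `sibs'

* Keep each unordered pair once (also drops self-pairs)
keep if id < id2
gen absdiff = abs(var1 - var1_2)

* Average over the number of sibling comparisons per household
collapse (mean) avgdiff = absdiff, by(hhid)
tempfile avg
save `avg'

restore

* Attach the household average to every member (and every wave)
merge m:1 hhid using `avg', nogenerate
```

With the example data this should reproduce the avgdiff values in the target table (2, 1, and about 2.83) without keeping the individual comparisons. Households with only one member have no pairs, so avgdiff would be missing for them.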