Calculating Average Within-Group Differences With Varying Group Size

Jason Smith

Join Date: Nov 2023

Posts: 18
#1

28 Sep 2024, 19:24

Hi everyone, I have some household panel data. It contains an ID variable for every respondent (id) and a family ID variable (hhid) that identifies siblings. It also contains a time-invariant continuous variable (var1). I am interested in the average absolute difference in var1 between sibling pairs within each household. To get it, I would like to do the following:

1) first, obtain the absolute differences between all possible sibling pairs within the household (e.g., abs(sib1-sib2); abs(sib1-sib3); abs(sib2-sib3))

2) then, average these differences over the number of sibling comparisons

Lastly, I would like to assign the average to all members of the household in a new variable (avgdiff).
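In formula form, with notation introduced here only for illustration (household $h$ has $n_h$ siblings whose var1 values are $x_{h1}, \dots, x_{h n_h}$), the target quantity would be:

```latex
\[
\text{avgdiff}_h
  = \frac{1}{\binom{n_h}{2}} \sum_{1 \le i < j \le n_h} \lvert x_{hi} - x_{hj} \rvert ,
\qquad
\binom{n_h}{2} = \frac{n_h (n_h - 1)}{2}.
\]
```

For household 300 in the example data further down (var1 values 2, 6, 4, 1), this gives (4 + 2 + 1 + 2 + 5 + 3)/6 = 17/6 ≈ 2.83. A household with a single sibling has no pairs, so the average is undefined there.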

I have my data in long format, but, in wide format, it looks like this:

Code:

clear
input hhid id var1
100 1 0
100 2 3
100 3 3
200 4 5
200 6 4
300 7 2
300 8 6
300 9 4
300 10 1
end

And I want it to look like this:

Code:

clear
input hhid id var1 sib1vsib2 sib1vsib3 sib1vsib4 sib2vsib3 sib2vsib4 sib3vsib4 avgdiff
100 1 0 3 3 . 0 . . 2
100 2 3 3 3 . 0 . . 2
100 3 3 3 3 . 0 . . 2
200 4 5 1 . . . . . 1
200 6 4 1 . . . . . 1
300 7 2 4 2 1 2 5 3 2.83
300 8 6 4 2 1 2 5 3 2.83
300 9 4 4 2 1 2 5 3 2.83
300 10 1 4 2 1 2 5 3 2.83
end

I also don't need to keep the differences of each individual comparison.

I am quite stuck on how to do this efficiently, so any help would be appreciated!
Tags: clustered data, group-mean, panel data, sibling

Clyde Schechter

Join Date: Apr 2014
Posts: 29796

28 Sep 2024, 20:08

Code:

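preserve
* Require one observation per sibling, then renumber siblings 1, 2, ... within household
isid hhid id
by hhid (id), sort: replace id = _n
tempfile copy
save `copy'
* Pair each sibling with every lower-numbered sibling in the same household
rangejoin id . -1 using `copy', by(hhid)

* The first sibling in each household has no lower-numbered match; drop that row
drop if missing(id_U)
gen sibvsib = abs(var1-var1_U)
by hhid (id id_U), sort: egen avgdiff = mean(sibvsib)

* Go to one row per household, with variables named sib1vsib2, sib1vsib3, ...
egen pair = concat(id_U id), punct("_")
drop id id_U var1 var1_U
reshape wide sibvsib, i(hhid) j(pair) string
rename sibvsib(#)_(#) sib(#)[1]vsib(#)[2]
save `"`copy'"', replace

restore
* Attach the household-level results to the original data
merge m:1 hhid using `copy', assert(match) nogenerate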

-rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also at SSC.
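
Since the individual pairwise differences do not need to be kept, a variant that uses only built-in commands may also be worth considering. The following is a minimal sketch, not a tested replacement for the code above. It assumes one observation per sibling (so that hhid and id jointly identify observations), and the tempfile names original and pairs and the variable absdiff are placeholders chosen here for illustration.

```stata
* Sketch only: assumes one observation per sibling
isid hhid id
tempfile original pairs
save `original'

* Make a renamed copy of the siblings to pair against
keep hhid id var1
rename (id var1) (id_U var1_U)
save `pairs'

* Form every within-household combination, keeping each unordered pair once
use `original', clear
joinby hhid using `pairs'
keep if id < id_U
gen double absdiff = abs(var1 - var1_U)

* Reduce to one average per household
collapse (mean) avgdiff = absdiff, by(hhid)
save `pairs', replace

* Bring the household average back to every sibling
use `original', clear
merge m:1 hhid using `pairs', keep(master match) nogenerate
```

With keep(master match), a household containing only one sibling stays in the data with avgdiff missing, whereas assert(match) would stop with an error in that case. On the example data this should give avgdiff values of 2, 1, and 17/6 (about 2.83) for households 100, 200, and 300.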
