  • Calculating Average Within-Group Differences With Varying Group Size

    Hi everyone, I have some household panel data. It contains an ID variable for every respondent (id) and a family ID variable (hhid) to identify siblings. It also contains a time-invariant continuous variable (var1). I am interested in obtaining the average absolute difference in var1 across sibling pairs. Thus, I would like to do the following:

    1) first, obtain the absolute differences between all possible sibling pairs within the household (e.g., abs(sib1-sib2); abs(sib1-sib3); abs(sib2-sib3))

    2) then, average these differences over the number of sibling comparisons

    Lastly, I would like to assign the average to all members of the household in a new variable (avgdiff).

    I have my data in long format, but, in wide format, it looks like this:

    Code:
    clear 
    input hhid id var1 
    100 1 0 
    100 2 3
    100 3 3
    
    200 4 5
    200 6 4
    
    300 7 2
    300 8 6
    300 9 4
    300 10 1
    end
    And I want it to look like this:

    Code:
    clear 
    input hhid id var1 sib1vsib2 sib1vsib3 sib1vsib4 sib2vsib3 sib2vsib4 sib3vsib4 avgdiff
    100 1 0 3 3 . 0 . . 2
    100 2 3 3 3 . 0 . . 2
    100 3 3 3 3 . 0 . . 2
    
    200 4 5 1 . . . . . 1
    200 6 4 1 . . . . . 1
    
    300 7 2 4 2 1 2 5 3 2.83
    300 8 6 4 2 1 2 5 3 2.83
    300 9 4 4 2 1 2 5 3 2.83
    300 10 1 4 2 1 2 5 3 2.83
    end
    I also don't need to keep the differences of each individual comparison.

    I am quite stuck on how to do this efficiently, so any help would be appreciated!

  • #2
    Code:
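    * Work on a copy; the original data come back at -restore-
    preserve
    isid hhid id
    * Renumber siblings 1, 2, ... within each household
    by hhid (id), sort: replace id = _n
    tempfile copy
    save `copy'
    * Pair each sibling with every lower-numbered sibling in the same household
    rangejoin id . -1 using `copy', by(hhid)
    
    * Drop observations that found no lower-numbered sibling to pair with
    drop if missing(id_U)
    gen sibvsib = abs(var1-var1_U)
    * Average over all sibling pairs in the household
    by hhid (id id_U), sort: egen avgdiff = mean(sibvsib)
    
    * One row per household, with one column per sibling pair
    egen pair = concat(id_U id), punct("_")
    drop id id_U var1 var1_U
    reshape wide sibvsib, i(hhid) j(pair) string
    rename sibvsib(#)_(#) sib(#)[1]vsib(#)[2]
    save `"`copy'"', replace
    
    * Attach the household-level results to the original data
    restore
    merge m:1 hhid using `copy', assert(match) nogenerate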
    -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also at SSC.
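
    Since the original post notes that the individual pairwise differences need not be kept, avgdiff can also be computed without -rangejoin-. The following is only a sketch, not part of the code above, and it swaps in a different technique: once var1 is sorted within a household of n siblings, the value ranked i enters the sum of all pairwise absolute differences with weight (2*i - n - 1). It assumes one observation per sibling and no missing values of var1; the variable name sumdiff is made up for illustration.

```stata
clear
input hhid id var1
100 1 0
100 2 3
100 3 3
200 4 5
200 6 4
300 7 2
300 8 6
300 9 4
300 10 1
end

* Assumptions (not checked in the original thread): one observation
* per sibling and no missing values of var1
isid hhid id
assert !missing(var1)

* Sort var1 within household; the value ranked i of n gets weight
* (2*i - n - 1). The running sum of weight*var1 ends at the sum of
* all pairwise absolute differences in the household.
bysort hhid (var1): gen double sumdiff = sum((2*_n - _N - 1) * var1)

* Divide the household total by the number of sibling pairs, n*(n-1)/2.
* A household with a single sibling has no pairs, so avgdiff is missing.
by hhid: gen double avgdiff = sumdiff[_N] / (_N * (_N - 1) / 2)

drop sumdiff
sort hhid id
list, sepby(hhid)
```

    On the example data this gives avgdiff of 2, 1, and 2.83 (17/6) for households 100, 200, and 300, which matches the desired output in the first post.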
