Hi
I am trying to construct a cluster-level variable: the cluster average of women's working status. I am working with DHS data, where clusters correspond to geographical units such as villages.
Women's working status is a binary variable (jobb), coded 0 = not working and 1 = working.
The cluster ID variable (v001) is numeric.
I need to generate the cluster average of women's working status.
I am not sure whether the command below is correct, because I do not fully understand what this variable should look like.
Code:
* Option 1
egen cluster_avg = mean(jobb), by(v001)
* Option 2 (equivalent to option 1, different variable name)
bysort v001: egen clustaverage = mean(jobb)
1) Can the cluster average take decimal values even though the base variable, women's working status, is binary?
2) How should I handle missing values when constructing the cluster average?
3) How many observations should the generated cluster average have? Should the count be the same as for the base variable, women's working status?
4) How do I construct the cluster average so that it excludes the woman under consideration, to avoid correlation with her own working status (the leave-one-out technique)?
Any advice would be helpful. Thank you.
Code:
dataex v001 clustaverage jobb in 60/100
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long v001 float(clustaverage jobb)
102         . .
102         . .
102         . .
102         . .
102         . .
102         . .
103 .06666667 0
103 .06666667 .
103 .06666667 0
103 .06666667 0
103 .06666667 .
103 .06666667 .
103 .06666667 .
103 .06666667 1
103 .06666667 0
103 .06666667 0
103 .06666667 .
103 .06666667 .
103 .06666667 0
103 .06666667 .
103 .06666667 0
103 .06666667 0
103 .06666667 0
103 .06666667 0
103 .06666667 0
103 .06666667 .
103 .06666667 .
103 .06666667 .
103 .06666667 .
103 .06666667 0
103 .06666667 .
103 .06666667 0
103 .06666667 .
103 .06666667 .
103 .06666667 .
103 .06666667 0
103 .06666667 .
104         . .
104         . .
104         . .
104         . .
end
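For question 4, one possible way to build a leave-one-out cluster mean is sketched below. It is a minimal sketch rather than a definitive solution, and it assumes that jobb is coded 0/1 with . for missing and that v001 identifies the cluster. The names total_jobb, n_jobb and loo_avg are hypothetical and chosen only for illustration.

```stata
* Sketch only: assumes jobb is 0/1 with . for missing, v001 is the cluster ID.
* total_jobb, n_jobb and loo_avg are hypothetical variable names.

* Number of working women per cluster (egen total() treats missing as 0)
bysort v001: egen total_jobb = total(jobb)

* Number of women with non-missing jobb per cluster
bysort v001: egen n_jobb = count(jobb)

* Leave-one-out mean: remove the woman's own value from numerator and denominator.
* Left missing when her own jobb is missing, or when she is the only
* non-missing woman in the cluster (division by zero gives missing in Stata).
gen loo_avg = (total_jobb - jobb) / (n_jobb - 1) if !missing(jobb)
```

In cluster 103 of the data example above, 15 women have non-missing jobb and one of them works, so the ordinary cluster mean is 1/15 (about 0.067). Under this sketch the leave-one-out mean would be 1/14 (about 0.071) for each non-working woman and 0 for the one working woman. Whether women with missing jobb should instead receive the ordinary cluster mean is a separate modelling choice that the sketch does not make.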