Dear all,
I am new to Stata and would like some help combining two correlated variables into an index/dummy variable that is also adjusted for either weights or clusters (rural/urban) to account for differences across households. My data are repeated cross-sections (stratified random sampling). The two variables are measured differently: time taken (t1 & t2) in minutes, and mode (m1 & m2), i.e. by car/bus or on foot. Both are proxies for the distance from point A to point B. I created dummies for all variables, but from there I am stuck on how to derive the main index/dummy adjusted for weights or clusters.
My procedure for arriving at these dummies (walk, bus, max10min, max30min, max1hr, above2hr):
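* m1, m2 (mode): code 1 stays 1, codes 2-10 become 0
* tabulate, generate() then creates m1a1 (value 0) and m1a2 (value 1), same for m2
foreach v of varlist m1 m2 {
recode `v' (1=1) (2/10=0)
ta `v', g(`v'a)
drop `v'
}
* t1, t2 (time category): codes 4-10 are collapsed into 4, giving t1d1-t1d4 and t2d1-t2d4
foreach x of varlist t1 t2 {
recode `x' (1=1) (2=2) (3=3) (4/10=4)
ta `x', g(`x'd)
drop `x'
}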
generate byte bus = (m1a1==1 & m2a1==1) // car/bicycle reported in both m1 and m2
generate byte walk = (m1a2==1 & m2a2==1) // walking reported in both m1 and m2
generate byte max10min = (t1d1==1 & t2d1==1) // took max 10 min
generate byte max30min = (t1d2==1 & t2d2==1) // took max 30 min
generate byte max1hr = (t1d3==1 & t2d3==1) // took max 1 hr
generate byte above2hr = (t1d4==1 & t2d4==1) // took about or above 2 hrs
drop m1a? m2a? t1d? t2d?
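For reference, a minimal toy run of the mode steps above, on three made-up observations and assuming mode code 1 means walking (as the comments suggest). It illustrates two points: tabulate, generate() numbers its dummies by ascending value, so m1a1 flags recoded value 0 and m1a2 flags value 1; and egen ... if leaves observations that fail the if condition missing rather than 0, whereas a plain generate gives a 0/1 indicator. The variables walk_egen and walk_gen are hypothetical names used only for this comparison.

```stata
* Toy data (hypothetical values); code 1 = walk is an assumption
clear
input byte(m1 m2)
1 1
1 3
2 4
end
foreach v of varlist m1 m2 {
    * code 1 stays 1, codes 2-10 become 0
    recode `v' (1=1) (2/10=0)
    * one dummy per distinct value, in ascending order:
    * m1a1/m2a1 flag value 0 (not walking), m1a2/m2a2 flag value 1 (walking)
    tabulate `v', generate(`v'a)
}
* egen with -if-: observations failing the condition are set to missing
egen walk_egen = anymatch(m1a2 m2a2) if m1a2==1 & m2a2==1, values(1)
* plain generate: 1 if walking in both m1 and m2, 0 otherwise
generate byte walk_gen = (m1a2==1 & m2a2==1)
list m1 m2 walk_egen walk_gen
```

Here walk_egen is 1 in the first observation and missing in the other two, while walk_gen is 1, 0, 0. If the 0s in the dataex listing were filled in separately, generate produces them in one step.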
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(uniqkey hhid) byte(bus walk max10min max30min max1hr above2hr)
34 11 0 1 1 0 0 0
35 87 0 1 0 0 0 0
96 18 0 0 0 0 0 0
98 5 0 0 1 0 0 0
84 11 1 0 0 0 0 1
85 6 1 0 0 0 0 1
115 29 0 0 0 0 0 1
116 20 1 0 0 0 0 1
117 24 1 0 0 0 1 0
798 45 0 1 0 1 0 0
343 43 1 0 0 0 0 1
344 47 1 0 0 0 0 1
end
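One common way to account for the sampling design in Stata is to declare it once with svyset and then prefix estimation commands with svy:. The sketch below re-enters the dataex listing so it runs on its own. The design variables are assumptions, since they are not in the data shown: wt is a hypothetical sampling weight (set to 1 as a placeholder), urban is a hypothetical rural/urban indicator (filled with an arbitrary rule only so the code runs), and hhid is assumed to identify the primary sampling unit. With only two groups, rural/urban is usually better declared as a stratum than as a cluster. The real weight, stratum and PSU variables should come from the survey documentation.

```stata
* dataex listing from the post, re-entered so this block is self-contained
clear
input double(uniqkey hhid) byte(bus walk max10min max30min max1hr above2hr)
34 11 0 1 1 0 0 0
35 87 0 1 0 0 0 0
96 18 0 0 0 0 0 0
98 5 0 0 1 0 0 0
84 11 1 0 0 0 0 1
85 6 1 0 0 0 0 1
115 29 0 0 0 0 0 1
116 20 1 0 0 0 0 1
117 24 1 0 0 0 1 0
798 45 0 1 0 1 0 0
343 43 1 0 0 0 0 1
344 47 1 0 0 0 0 1
end
* HYPOTHETICAL design variables (not in the posted data):
* wt = sampling weight placeholder, urban = placeholder rural/urban stratum
generate double wt = 1
generate byte urban = mod(hhid, 2)
* declare the design: hhid as PSU (assumed), urban as stratum, wt as pweight
svyset hhid [pweight = wt], strata(urban)
* design-based proportions for the 0/1 dummies
svy: mean walk bus max10min above2hr
```

svy: mean on a 0/1 dummy gives a weighted proportion with design-based standard errors; svy: proportion and svy: tabulate are alternatives once the design is declared.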
Regards,
Gatelik