Hi everyone!
- I'm somewhat stuck in the data-cleaning stage of my master's thesis because I don't know how to create a variable that gives, in each observation, the age of the head of that observation's household. So far I've managed to create a variable for the sex of the household head using the following code:
*Generate var sex HH head*
bysort hhid : gen sexhhhead=1 if relhhhead==1 & sex>1 & sex<=2
replace sexhhhead=0 if sexhhhead==.
egen sexhhhead1 = max(sexhhhead), by (hhid)
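The three lines above can probably be collapsed into a single `egen` call, because `egen max()` accepts an expression and a logical expression evaluates to 1 or 0. This is a minimal sketch, assuming `sex` is coded as integers so that `sex>1 & sex<=2` is the same as `sex==2`, and `relhhhead==1` marks the head. The toy data and the new variable name `headsex2` are hypothetical.

```stata
* Hypothetical toy data: person with relhhhead==1 is the head
clear
input hhid relhhhead sex
1 1 2
1 2 1
1 3 2
2 1 1
2 2 2
end

* 1 in every row of a household whose head has sex==2, 0 otherwise
* (same result as gen/replace/egen max in the original three lines)
bysort hhid: egen headsex2 = max(relhhhead == 1 & sex == 2)
list hhid relhhhead sex headsex2, sepby(hhid)
```

Households with no recorded head get 0 here, exactly as in the original code.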
- This approach works for dummy or categorical variables, and I've applied similar code to generate the education of the household head (years of education):
bysort hhid : gen educhhhead=1 if relhhhead==1 & yearseduc>0 & yearseduc<=1
replace educhhhead=0 if educhhhead==.
egen educhhhead1 = max(educhhhead), by (hhid)
drop educhhhead
bysort hhid : gen educhhhead=2 if relhhhead==1 & yearseduc>1 & yearseduc<=2
replace educhhhead=0 if educhhhead==.
egen educhhhead2 = max(educhhhead), by (hhid)
drop educhhhead
bysort hhid : gen educhhhead=3 if relhhhead==1 & yearseduc>2 & yearseduc<=3
replace educhhhead=0 if educhhhead==.
egen educhhhead3 = max(educhhhead), by (hhid)
drop educhhhead
bysort hhid : gen educhhhead=4 if relhhhead==1 & yearseduc>3 & yearseduc<=4
replace educhhhead=0 if educhhhead==.
egen educhhhead4 = max(educhhhead), by (hhid)
drop educhhhead
bysort hhid : gen educhhhead=5 if relhhhead==1 & yearseduc>4 & yearseduc<=5
replace educhhhead=0 if educhhhead==.
egen educhhhead5 = max(educhhhead), by (hhid)
drop educhhhead
bysort hhid : gen educhhhead=6 if relhhhead==1 & yearseduc>5 & yearseduc<=6
replace educhhhead=0 if educhhhead==.
egen educhhhead6 = max(educhhhead), by (hhid)
drop educhhhead
bysort hhid : gen educhhhead=7 if relhhhead==1 & yearseduc>6 & yearseduc<=7
replace educhhhead=0 if educhhhead==.
egen educhhhead7 = max(educhhhead), by (hhid)
drop educhhhead
bysort hhid : gen educhhhead=8 if relhhhead==1 & yearseduc>7 & yearseduc<=8
replace educhhhead=0 if educhhhead==.
egen educhhhead8 = max(educhhhead), by (hhid)
drop educhhhead
bysort hhid : gen educhhhead=9 if relhhhead==1 & yearseduc>8 & yearseduc<=9
replace educhhhead=0 if educhhhead==.
egen educhhhead9 = max(educhhhead), by (hhid)
drop educhhhead
bysort hhid : gen educhhhead=10 if relhhhead==1 & yearseduc>9 & yearseduc<=10
replace educhhhead=0 if educhhhead==.
egen educhhhead10 = max(educhhhead), by (hhid)
replace educhhhead1=. if educhhhead1==0
replace educhhhead2=. if educhhhead2==0
replace educhhhead3=. if educhhhead3==0
replace educhhhead4=. if educhhhead4==0
replace educhhhead5=. if educhhhead5==0
replace educhhhead6=. if educhhhead6==0
replace educhhhead7=. if educhhhead7==0
replace educhhhead8=. if educhhhead8==0
replace educhhhead9=. if educhhhead9==0
replace educhhhead10=. if educhhhead10==0
* (analogous blocks for 11 to 18 years of education, and their replace lines, are omitted here)
drop educhhhead
gen educhhhead=max(educhhhead1, educhhhead2, educhhhead3, educhhhead4, educhhhead5,educhhhead6, educhhhead7, educhhhead8, educhhhead9, educhhhead10, educhhhead11, educhhhead12, educhhhead13, educhhhead14, educhhhead15, educhhhead16, educhhhead17, educhhhead18)
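The one-block-per-year pattern above can likely be replaced by one line: keep the head's `yearseduc` and set everyone else to missing with `cond()`, then take the household maximum, which is just the head's value. A minimal sketch on hypothetical toy data, assuming exactly one head per household. Two differences from the original code: heads with 0 years of education get 0 here (the original turns them into missing), and non-integer values are kept as-is (the original rounds them up, which `ceil(yearseduc)` would reproduce).

```stata
* Hypothetical toy data: person with relhhhead==1 is the head
clear
input hhid relhhhead yearseduc
1 1 6
1 2 9
1 3 4
2 1 0
2 2 3
end

* Head's years of education copied to every member of the household;
* non-heads contribute missing, which egen max() ignores
bysort hhid: egen educhhhead = max(cond(relhhhead == 1, yearseduc, .))
list, sepby(hhid)
```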
- However, since the age of the household head in my sample ranges from 10 to 97, following the same approach would mean an extremely long piece of code, written year by year, and would waste a lot of time.
Any ideas or recommendations? Thanks a lot!
Daniel.
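The same one-line idea should handle a continuous variable like age, since nothing depends on listing the values one by one. A minimal sketch, assuming the age variable is called `age` (hypothetical; use your own name) and that `relhhhead==1` marks the head. It first counts heads per household, because the result is only meaningful when each household has exactly one. `nheads` and `agehhhead` are hypothetical names, and the toy data include a household with no recorded head.

```stata
* Hypothetical toy data: household 3 has no member with relhhhead==1
clear
input hhid relhhhead age
1 1 45
1 2 47
1 3 12
2 1 70
2 2 66
3 2 30
end

* Number of heads per household; ideally 1 everywhere
bysort hhid: egen nheads = total(relhhhead == 1)
tabulate nheads

* Head's age copied to every member of the household
* (missing when the household has no recorded head)
bysort hhid: egen agehhhead = max(cond(relhhhead == 1, age, .))
list, sepby(hhid)
```

If `tabulate nheads` shows households with 2 or more heads, `max()` silently picks the oldest one, so those cases are worth inspecting before using the variable.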