Hi everyone, I have built a synthetic panel from repeated cross-section data. The data consist of seven rounds, conducted every two years from 2004 to 2016. After constructing the relevant variables (hour_wage, education groups, cohort bin (cbin) and consumption), I collapse the data and run my program. The problem arises when I plot the results: the cohorts who attained University appear below the regression line, which means a negative change in income and consumption for them, while the Junior Middle cohorts are above the regression line, which is the other way round from what I expected. When I checked the data, however, mean hour_wage for the University cohorts is higher than for the Intermediate and Junior Middle education categories. I have checked the code from every angle and cannot solve this riddle, so I need your suggestions. Here is my initial code, which I apply to every round to build the required variables.
Note: This code is only meant to show how I constructed the variables in each round before appending.
The above-listed code is used for all the rounds. Now here comes my data after appending all 7 rounds.
[CODE]
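Because the cohort bins present differ from round to round (for example, cbin 8 appears in 2004 but not in 2006 in the posted data), it may be worth checking which cohort-by-education cells are observed in both years of a pair. The following is a minimal sketch on three rows taken from the posted data, not part of my original program; n_years and ln_y are hypothetical helper names, and the differencing line mirrors the one used later in the master code.

```stata
clear
input byte(cbin edu_group) float hour_wage int year
7 1 28.42537 2004
7 1 28.63389 2006
8 1 26.16772 2004
end

* Count the survey years in which each cohort-by-education cell is observed
bysort cbin edu_group: gen n_years = _N

* Cells seen in only one year of the pair cannot be differenced
list cbin edu_group year n_years if n_years < 2

* Same differencing step as in the master code: [2] is missing for such cells,
* so d_y is missing and the cell drops out of the regressions
gen ln_y = ln(hour_wage)
bysort cbin edu_group (year): gen d_y = ln_y[2] - ln_y[1]
list cbin edu_group year d_y
```

In this sketch the cbin 8 cell gets a missing d_y, so it would silently drop out of the 2004 to 2006 regression rather than cause an error.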
Here is the master code, which can be run on the data above and produces the graphs.
Looking at the attached figure, the problem is visible: a negative change in income and consumption for the University group. Why, when in the data the mean hourly wage (hour_wage) is higher for them before taking logs?
As you can see in the figure, I have a similar problem for 2010 to 2012 as well.
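To make the comparison concrete: what the program differences is the log of the group means, so the plotted quantity is a change, not a level. The sketch below is a minimal illustration using the cbin 4 rows for 2004 and 2006 from the posted data (Junior Middle and University only); ln_y is an illustrative name, and the differencing line follows the one in residualcy. It only contrasts levels with changes and does not reproduce the age-polynomial residuals shown in the graphs.

```stata
clear
input byte edu_group float hour_wage int year
1 24.2419   2004
3 47.94204  2004
1 24.987286 2006
3 48.40184  2006
end

* Log of the group mean wage, then the change between the two rounds
gen ln_y = ln(hour_wage)
bysort edu_group (year): gen d_y = ln_y[2] - ln_y[1]

* University (3) has the higher hour_wage level in both years, yet its
* log change (about 0.01) is smaller than Junior Middle's (about 0.03)
list edu_group year hour_wage d_y, sepby(edu_group)
```

So a higher mean wage in levels does not by itself rule out a smaller, or negative, change in logs for the same group.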
Code:
**********************************************
*********Concerned Variables******************
**********************************************

***Household Characteristics***
rename sbq01 sex
rename sbq04 age
drop if age ==0
rename sbq02 rstatus //living in hh or temporarily moved out
rename sbq03 rwhead //relationship with head of hh
rename sbq05 mstatus
rename seq10 type_work

//constructing number of children, number of adults and adult equivalence scale
bysort hhcode: egen hhsize=count(hhcode)
gen child = 1 if age < 14
replace child =0 if age >= 14
bys hhcode: egen num_childern = sum(child)
bysort hhcode: gen num_adult = hhsize - num_childern
gen aes= 1 + (num_adult - 1) * 0.5 + num_childern * 0.3

//cleaning
replace mstatus=1 if mstatus==3 //1 means not married, 2 means married
replace mstatus=1 if mstatus==4
replace mstatus=2 if mstatus==5
//replace rwhead=. if sex==2 & rwhead==1 | sex==2 & mstatus==1 // considering only male head hh, married couples are included
//drop if rwhead==.
replace rwhead=11 if rwhead==0
replace rwhead=12 if rwhead==9
replace rwhead=9 if rwhead==8 //mother/father in law
replace rwhead=8 if rwhead==7 // daughter/son in law
keep if rwhead==1 & sex==1 & mstatus==2 //86,960 observations dropped here

******Creating Age-Cohorts*********
drop if age > 50
drop if age < 25 | age == 25
gen year= 2004
gen cohort= year - age
summarize cohort, d
//I have used the same definition for cohort across the waves, i.e. the coding below is the same for all rounds in terms of cbin and cohort.
recode cohort (1986/1990=1) (1981/1985=2) (1976/1980=3) (1971/1975=4) (1966/1970=5) (1961/1965=6) (1956/1960=7) (1951/1955=8), gen(cbin)
gen c_age= 28 if cbin==1
replace c_age=33 if cbin==2
replace c_age=38 if cbin==3
replace c_age=43 if cbin==4
replace c_age=48 if cbin==5
replace c_age=53 if cbin==6
replace c_age=58 if cbin==7
replace c_age=63 if cbin==8

********Creating Education Groups*********
***Education***
rename scqo4 maxedu
rename scq05 ifstudent //if currently studying
drop if maxedu==19 //(dropped other education: only 22 observations deleted)
// "Junior Middle = 1" "Intermediate=2" "University=3"
recode maxedu (min/8=1) (9/11=2) (12/max=3) if ifstudent==2, gen(edu_group)
drop if edu_group==.

********Labor Supply and Income********
keep if type_work > 1 //849 observations dropped here
rename seq02 selfempl //if the respondents didn't work in last week, they are asked if they have any business, trade etc
rename seq11 ifworked_money_m //if worked in the last month
rename seq12 days_worked
rename seq13 salary_monthly
rename seq14 working_months //last year
rename seq15 ifworked_money_y //if worked last year
rename seq16 salary_yearly
//drop if selfempl==1 //(only 12 observations deleted)

//Some of the people reported monthly income, whereas others reported yearly income. So we have to construct the measure for wage from both types.
keep if working_months >= 10 & days_worked > 20 // considering those who worked at least ten months in a year and 20 days in a month ***226 observations deleted here
gen wage_m = salary_monthly / days_worked if ifworked_money_m ==1
gen days_worked_y = days_worked * working_months
gen wage_y = salary_yearly / days_worked_y //if ifworked_money_y ==1
egen wage = rsum(wage_m wage_y)
replace wage=. if wage == 0
//gen annual_wage= days_worked*working_months*wage
gen hour_wage = wage /8 //assuming working day means working 8 hours a day
replace hour_wage=. if hour_wage== 0
drop if wage==.
drop if hour_wage==.
summarize hour_wage, d
replace hour_wage=. if hour_wage>r(p99) | hour_wage<r(p1) //59 observations dropped here

******Expenditures******
//constructing adult equivalent consumption
gen ae_consumption = consumption / aes
summarize ae_consumption, d
replace ae_consumption=. if ae_consumption > r(p99) | ae_consumption <r(p1) // 68 observations dropped here
//egen id = group(cbin edu_group)
//tabstat ae_consumption wage_m if year == 2004, by(id) st(mean min p5 p25 p50 p75 p95 max)

************************************************************
//Here we need to collapse the data in order to make the synthetic panel
collapse (mean) ae_consumption hour_wage c_age, by(cbin edu_group)
gen year= 2004
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float cbin byte edu_group float(ae_consumption hour_wage c_age year)
3 1 1005.2929 20.011213 38 2004
3 2 1082.5514 29.47328 38 2004
3 3 1159.966 29.82299 38 2004
4 1 958.5524 24.2419 43 2004
4 2 1175.9354 29.41757 43 2004
4 3 1787.5577 47.94204 43 2004
5 1 1000.7888 25.253265 48 2004
5 2 1224.527 32.823265 48 2004
5 3 1624.9316 43.67646 48 2004
6 1 998.0583 28.098703 53 2004
6 2 1294.746 37.223385 53 2004
6 3 1704.804 52.70139 53 2004
7 1 984.4781 28.42537 58 2004
7 2 1279.9153 39.7994 58 2004
7 3 1801.728 57.02699 58 2004
8 1 1021.486 26.16772 63 2004
8 2 1284.8254 39.21839 63 2004
8 3 2012.6095 71.293304 63 2004
3 1 969.6302 19.816727 38 2006
3 2 1155.6779 25.147636 38 2006
3 3 1731.774 41.67512 38 2006
4 1 989.2782 24.987286 43 2006
4 2 1205.359 31.23556 43 2006
4 3 1902.7225 48.40184 43 2006
5 1 1012.4293 25.99362 48 2006
5 2 1260.4014 33.815296 48 2006
5 3 1813.4307 55.81765 48 2006
6 1 1052.937 27.97255 53 2006
6 2 1390.88 40.35061 53 2006
6 3 2079.8054 66.982 53 2006
7 1 1084.501 28.63389 58 2006
7 2 1427.3075 40.12785 58 2006
7 3 2112.4878 72.61096 58 2006
2 1 1427.588 24.52204 33 2008
2 2 1506.9424 32.635303 33 2008
2 3 2903.329 72.82051 33 2008
3 1 1281.8406 19.87822 38 2008
3 2 1626.253 31.83381 38 2008
3 3 2561.5964 46.18956 38 2008
4 1 1332.3922 25.203506 43 2008
4 2 1631.312 31.1539 43 2008
4 3 2836.3516 55.58306 43 2008
5 1 1393.9075 28.29908 48 2008
5 2 1824.202 42.16266 48 2008
5 3 2685.155 63.60912 48 2008
6 1 1492.4818 28.83083 53 2008
6 2 1802.4243 42.72336 53 2008
6 3 3258.335 67.1168 53 2008
7 1 1508.2616 29.27627 58 2008
7 2 1642.3346 35.59089 58 2008
7 3 3077.336 75.02588 58 2008
2 1 1963.9072 35.48345 33 2010
2 2 2196.9714 41.82792 33 2010
2 3 3176.809 58.639 33 2010
3 1 2040.6433 37.93051 38 2010
3 2 2369.518 49.28765 38 2010
3 3 3388.298 81.11602 38 2010
4 1 2143.0024 44.93072 43 2010
4 2 2681.7434 55.55556 43 2010
4 3 3898.507 92.3561 43 2010
5 1 2092.359 45.02077 48 2010
5 2 2871.833 66.70649 48 2010
5 3 3653.634 93.2951 48 2010
6 1 2165.3635 43.30079 53 2010
6 2 2725.326 63.10066 53 2010
6 3 4340.7153 120.56422 53 2010
7 1 2217.2205 39.75362 58 2010
7 2 2891.94 68.46322 58 2010
7 3 5330.741 140.80374 58 2010
1 1 2444.769 48.36358 28 2012
1 2 2476.9495 43.3796 28 2012
1 3 3232.415 96.3141 28 2012
2 1 2171.4912 42.25338 33 2012
2 2 2687.919 59.96129 33 2012
2 3 3611.756 94.42162 33 2012
3 1 2335.1243 51.99603 38 2012
3 2 2863.485 66.5342 38 2012
3 3 4216.1333 104.8941 38 2012
4 1 2424.7256 58.05062 43 2012
4 2 2990.466 73.66813 43 2012
4 3 4066.801 116.4934 43 2012
5 1 2514.4436 61.18449 48 2012
5 2 3121.2656 87.70039 48 2012
5 3 4224.4253 130.04353 48 2012
6 1 2548.613 63.65802 53 2012
6 2 3121.12 91.64032 53 2012
6 3 4410.4614 154.17195 53 2012
1 1 2502.627 50.04775 28 2014
1 2 3062.7764 60.1854 28 2014
1 3 4874.5405 99.27192 28 2014
2 1 2729.51 58.52513 33 2014
2 2 3297.106 71.947495 33 2014
2 3 4670.37 110.41965 33 2014
3 1 2804.0916 61.95866 38 2014
3 2 3333.801 80.93292 38 2014
3 3 4724.3315 127.44823 38 2014
4 1 2926.054 68.65212 43 2014
4 2 3665.392 96.27921 43 2014
4 3 4568.759 137.72145 43 2014
5 1 2979.3774 75.497215 48 2014
end
Code:
*************************************************
*** CHOOSE THE INCOME AND CONSUMPTION MEASURE ***
*************************************************
local income_measure hour_wage
*local income_measure hour_wage
*local consumption_measure cosnumption
local consumption_measure ae_consumption

keep if cbin ~=. //cbin means Cohort bin (Age cohorts)

capture program drop residualcy
program residualcy, eclass //eclass stores the results of regression
    egen subgroup = group(cbin `1') //(1 means argument 1 which is pertaining to edu_group 1 to 7)
    keep if year == `2' | year == `3' //(2 and 3 are also arguments, e.g. year 2004, 2006 etc)
    bys year subgroup: egen m_c_group = mean(`4') // (in order to make synthetic panel from repeated cross sections we need to generate subgroups in terms of means. Here it is pertaining to consumption)
    bys year subgroup: egen m_y_group = mean(`5') //same as above but pertaining to income
    keep year c_age m_c_group m_y_group subgroup cbin `1'
    duplicates drop
    gen lnm_c_group = ln(m_c_group) //taking logs
    gen lnm_y_group = ln(m_y_group)
    bys subgroup (year): gen d_c = lnm_c_group[2]-lnm_c_group[1] //subtracting log group means of two different years within the same subgroup, e.g. year 2006 minus year 2004 for consumption
    bys subgroup (year): gen d_y = lnm_y_group[2]-lnm_y_group[1] //same as above but for income
    gen c_age2 = c_age*c_age //in order to reduce age effect for those cohorts who were interviewed later in the survey (here we make age square)
    gen c_age3 = c_age2*c_age //age cube
    //keep if year == `3'
    drop year m_c_group m_y_group lnm_c_group lnm_y_group
    duplicates drop
    reg d_c c_age c_age2 c_age3 //change in consumption on age (residual here is the risk affecting the consumption)
    predict eps_c, resid
    reg d_y c_age c_age2 c_age3 //change in income on age (residual here is the risk, i.e. the income shock)
    predict eps_y, resid
    reg eps_c eps_y // income shock is independent here and consumption shock is dependent here. So we can check the consumption insurance hypothesis.
end

capture program drop adgraph2
program adgraph2
    twoway (scatter eps_c eps_y if `1' == 1 , mcolor(dknavy) msymbol(O)) ///
        (scatter eps_c eps_y if `1' == 2 , mcolor(green) msymbol(o)) ///
        (scatter eps_c eps_y if `1' == 3 , mcolor(blue) msymbol(O)) ///
        (lfit eps_c eps_y, lpattern(solid) lcolor(black)) ///
        , ylabel(`10'(0.1)`11', labsize(small)) xlabel(`10'(0.1)`11',labsize(small) angle(vertical)) scheme(s1mono) xtitle("change in log disposable income",size(small)) ///
        ytitle("change in log consumption",size(small) angle(vertical)) ///
        legend(nobox symxsize(3) size(small) pos(12) row(3) region(fcolor(none)) ///
        order(1 "`2'" 2 "`3'" 3 "`4'" 4 "Slope `:di %4.3f _b[eps_y]' with s.e. `:di %4.3f _se[eps_y]'"))
    graph save "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'-view1.gph", replace

    twoway (scatter eps_c eps_y if cbin == 1, mcolor(blue) msymbol(O)) ///
        (scatter eps_c eps_y if cbin == 2, mcolor(green) msymbol(D)) ///
        (scatter eps_c eps_y if cbin == 3, mcolor(purple) msymbol(T)) ///
        (scatter eps_c eps_y if cbin == 4, mcolor(magenta) msymbol(S)) ///
        (scatter eps_c eps_y if cbin == 5, mcolor(red) msymbol(+)) ///
        (scatter eps_c eps_y if cbin == 6, mcolor(brown) msymbol(dh)) ///
        (scatter eps_c eps_y if cbin == 7, mcolor(gold) msymbol(th)) ///
        (scatter eps_c eps_y if cbin == 8, mcolor(lavender) msymbol(sh)) ///
        (lfit eps_c eps_y, lpattern(solid) lcolor(black)) ///
        , ylabel(`10'(0.1)`11', labsize(small)) xlabel(`10'(0.1)`11',labsize(small) angle(vertical)) scheme(s1mono) xtitle("change in log disposable income",size(small)) ///
        ytitle("change in log consumption",size(small) angle(vertical)) ///
        legend(nobox symxsize(3) size(small) pos(12) row(3) region(fcolor(none)) order(1 "26-30" 2 "31-35" 3 "36-40" 4 "41-45" 5 "46-50" 6 "51-55" 7 "56-60" 8 "61-65" 9 "Slope `:di %4.3f _b[eps_y]' with s.e. `:di %4.3f _se[eps_y]'"))
    graph save "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'-view2.gph", replace

    graph combine "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'-view1.gph" ///
        "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'-view2.gph", col(2) scheme(s1mono) title("HIES `5' to `6' by `7'", size(small))
    graph save "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'.gph",replace
    graph export "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'.png",replace
    erase "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'.gph"
    erase "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'-view1.gph"
    erase "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'-view2.gph"
end

preserve
keep if edu_group ~=.
residualcy edu_group 2004 2006 `consumption_measure' `income_measure'
adgraph2 edu_group "Junior Middle" "Intermediate" "University" 2004 2006 "Full Education Category" `consumption_measure' `income_measure' -0.2 0.2
restore

preserve
keep if edu_group ~=.
residualcy edu_group 2006 2008 `consumption_measure' `income_measure'
adgraph2 edu_group "Junior Middle" "Intermediate" "University" 2006 2008 "Full Education Category" `consumption_measure' `income_measure' -0.2 0.2
restore

preserve
keep if edu_group ~=.
residualcy edu_group 2008 2010 `consumption_measure' `income_measure'
adgraph2 edu_group "Junior Middle" "Intermediate" "University" 2008 2010 "Full Education Category" `consumption_measure' `income_measure' -0.2 0.2
restore

preserve
keep if edu_group ~=.
residualcy edu_group 2010 2012 `consumption_measure' `income_measure'
adgraph2 edu_group "Junior Middle" "Intermediate" "University" 2010 2012 "Full Education Category" `consumption_measure' `income_measure' -0.2 0.2
restore

preserve
keep if edu_group ~=.
residualcy edu_group 2012 2014 `consumption_measure' `income_measure'
adgraph2 edu_group "Junior Middle" "Intermediate" "University" 2012 2014 "Full Education Category" `consumption_measure' `income_measure' -0.2 0.2
restore

preserve
keep if edu_group ~=.
residualcy edu_group 2014 2016 `consumption_measure' `income_measure'
adgraph2 edu_group "Junior Middle" "Intermediate" "University" 2014 2016 "Full Education Category" `consumption_measure' `income_measure' -0.2 0.2
restore