Dear all,
I have a survey data for different regions and I want to do my data generation in an easy way.
I extracted the data from CSPRO and I have to run for each item as:
clear
run "D:\Agric Data_new\2011\F3Q54.DO"
This (for eg. ) generates the data for crop production.
The data contains households production and amount sold (value for houshold’s who sold their crop produce)
Note: one household could have more than one land piece/field and could produce and sold more than one crop.
So to derive the total crop output value,
I did the following
generate crop_priceperKG= traded_amt/ sold_qty_kg *(this generates crop price, by dividing traded amount to the sold quantity)
keep if q54a_crop_code==1 *(this keeps only crop type 1)
mean crop_priceperKG if crop_priceperKG >0, over(r1_region)*(this derives the mean of crop price for each region based on crop prices greater than one.
I found Mean = 1.714286 and 2.428571 for region 2 and 3
Then manually I replaced the zero/missing value at a regional level as:
replace crop_priceperKG=1.714286 if r1_region==2 & crop_priceperKG==0
replace crop_priceperKG=2.428571 if r1_region==3 & crop_priceperKG==0
If there is no any household who sold crop number 1 (which means no price) in all the observations for a region, then I used sample mean to replace the zero/missing price values as:
mean cr_priceperKG if cr_priceperKG>0
replace cr_priceperKG=2.071428 if crop_priceperKG==0
Then finally I generated Total crop output value for crop I as:
generate Tot_CropOUTP_value= crop production* cr_priceperKG
But, as I said before there are households who appear in the data more than one as they might have produced the same crop from more than one field/land piece.
To solve this, I did the following:
sort yr_process r1_region r2_district r3_admin r4_ea r5_block r6_dwelling r7_hhold
by yr_process r1_region r2_district r3_admin r4_ea r5_block r6_dwelling r7_hhold: generate n1 = _n
reshape wide crop_field_number crop production traded_amt sold_qty_kg crop_priceperKG Tot_CropOUTP_value , i( yr_process r1_region r2_district r3_admin r4_ea r5_block r6_dwelling r7_hhold) j( n1)
This produced: x1…..xn variables
Then I have replaced the missed values as
mvencode crop_field_number1 crop production1 traded_amt1 sold_qty_kg1 crop_priceperKG1 Tot_CropOUTP_value1 crop_field_numberXi crop productionXi traded_amtXi sold_qty_kgXi crop_priceperKGXi Tot_CropOUTP_valueXi, mv(0) override
where Xi is 1…..n depending on the number of duplication of a household
Finally I generate Tot_CropOUTP_valueC1 (total crop output value of crop 1) as:
generate Tot_CropOUTP_valueC1= Tot_CropOUTP_value1+Tot_CropOUTP_value2+Tot_CropOU TP_value3+Tot_CropOUTP_valueXn
My question is,
The procedure I followed is manual and it is tiresome as I have to do this for all crop type which ranges up to 38 type of crops.
Is there any way I can do using local macros and looping?
I very much appreciate any kind of help or comments.
Regards,
Sirak