Good afternoon, everyone.
Please let me know if any revisions or clarification are needed, and thank you for your time.
I have applied -reshape long- to the dataset at the bottom of this post and would like to perform the following actions for each worker.
(My apologies: the project tags in this test data set are deliberately inconsistent, to mirror the filing inconsistencies in the real data set it stands in for.)
Actions desired:
1) If "project" is not tagged for any of a worker's observations, leave the observations for that worker as is. (This will be most of them.)
2) If "project" contains values for one of the years, but not both, then drop the year *with* project values. (e.g., drop all Year 1 observations for Worker 3 below)
3) If "project" contains values for both years, then apply the following for all situations, regardless of whether the projects are "A" and "B", "A" and no value, or "B" and no value:
- If one project's output is higher than the other project's output for all years observed, keep the worker but drop all year-observations for the lower-output project for that worker.
  - Ex: Drop Worker 5's project B, since its output is always lower than the output of the project with no letter specification. Keep the observations with no letter specification.
  - Ex: Drop Worker 13's project A, since its output is always lower than project B for worker 13. Keep the observations for project B.
  - Ex: Drop Worker 17's project with no letter specification, since its value is lower than Project A for all years observed. Keep the observations for Project A.
- If there is conflict as to which project is higher output (i.e. the values cross) or there is a tie at any time, drop all years with project values.
  - Ex: In the case of worker 10, there is a tie between project A and the project of no letter specification in year 2. Drop both year 1 and year 2 for worker 10. (If there were an additional year 3 observation with no project specifications, we would keep it per step #2 above.)
  - Ex: In the case of worker 12, project A has higher output in the first year but project B has higher output in the second year. Drop year 1 and year 2 for worker 12.
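To make the three rules concrete, here is a rough sketch of how they might be written in Stata. It is not a tested solution and rests on assumptions that go beyond the description above: the example data at the bottom of this post are already in memory, every tagged worker-year holds exactly two observations, and the same two projects appear in each of a worker's tagged years. The variables tagged, yeartag, n_years, n_tagged, rule, winner, tie, conflict, and anytie are helper names made up for illustration.

```stata
* Sketch only: assumes worker, year, project, and output are in memory.
* All variables created here are hypothetical helpers.

* Flag worker-years in which at least one observation carries a project tag
bysort worker year: egen byte tagged = max(project != "")

* Count each worker's distinct years and how many of them are tagged
egen byte yeartag = tag(worker year)
bysort worker: egen byte n_years  = total(yeartag)
bysort worker: egen byte n_tagged = total(yeartag * tagged)

* Rule 1: no tagged years; rule 2: some but not all; rule 3: all years tagged
generate byte rule = cond(n_tagged == 0, 1, cond(n_tagged < n_years, 2, 3))

* Within each worker-year, record the highest-output project
* and whether the top two outputs are tied
bysort worker year (output): generate str1 winner = project[_N]
bysort worker year (output): generate byte tie = _N > 1 & output[_N-1] == output[_N]

* Conflict: the winning project changes across years, or any year is tied
bysort worker (winner): generate byte conflict = winner[1] != winner[_N]
bysort worker: egen byte anytie = max(tie)
replace conflict = 1 if anytie

* Rule 2: drop the tagged years, keep the untagged ones
drop if rule == 2 & tagged

* Rule 3 with a crossing or a tie: drop every tagged year
drop if rule == 3 & conflict & tagged

* Rule 3 with a clear ranking: keep only the higher-output project
drop if rule == 3 & !conflict & project != winner
```

Rule 1 workers never satisfy any of the drop conditions, so their observations are left as is. Running -list worker year project output if inlist(worker, 3, 5, 10, 12, 13, 17), sepby(worker)- afterwards shows what remains for the tagged workers, which can be checked by hand against the examples above.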
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(worker year) str1 project byte output
 1 1 ""   25
 1 2 ""   63
 2 1 ""   16
 2 2 ""   32
 3 1 "A"  50
 3 1 "B"   5
 3 2 ""   90
 4 1 ""   28
 4 2 ""    5
 5 1 ""   12
 5 1 "B"  10
 5 2 ""   12
 5 2 "B"  11
 6 1 ""   92
 6 2 ""   71
 7 1 ""   67
 7 2 ""   94
 8 1 ""  100
 8 2 ""   36
 9 1 ""   20
 9 2 ""   16
10 1 "A"  50
10 1 ""   10
10 2 "A"  55
10 2 ""   55
11 1 ""   63
11 2 ""   26
12 1 "A"  40
12 1 "B"  30
12 2 "A"  45
12 2 "B"  50
13 1 "A"  60
13 1 "B"  65
13 2 "A"  65
13 2 "B"  70
14 1 ""   49
14 2 ""   16
15 1 ""   10
15 2 ""   55
16 1 ""   55
16 2 ""   47
17 1 ""   70
17 1 "A"  90
17 2 ""   75
17 2 "A"  90
18 1 ""   90
18 2 ""   64
19 1 ""   86
19 2 ""   16
20 1 ""   76
20 2 ""   74
end