Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Conditional replace code

    Hi,
    I am trying to clean the data below in order to run the collapse command.
    Code:
    collapse tot_emp h_median h_mean a_mean mean_prse emp_prse year, by(prim_state area area_name)
    There should be unique correspondence between variables prim_state and area . However, I find that there are many observations where this is not so. It seems on eyeballing that the the last four observations for every unique entity in variable area is incorrect (or missing) in the raw data itself.

    Since I have more than 6000 observations in my raw data, I am not sure how to use the replace command or write a loop to solve this problem.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str2 prim_state str4 area str46 area_name str5 occ_code str118 occ_title str3 group long tot_emp double(h_median h_mean) long a_mean double(mean_prse emp_prse) str4(annual mean_aster) int year
    "AK" "0380" "Anchorage, AK MSA"   "98000" "Helpers, Laborers, and Material Movers, Hand"                 "maj"    .     .     .     .    .    . ""     ""    .
    "AK" "0380" "Anchorage, AK MSA"   "98102" "Helpers, Mechanics and Repairers"                             ""     330 12.57 14.93 31040 12.3 13.6 ""     "" 1997
    "AK" "0380" "Anchorage, AK MSA"   "98311" "Helpers, Brick and Stonemasons and Hard Tile Setters"         ""       . 17.25 15.89 33050    6    . ""     "" 1997
    "AK" "0380" "Anchorage, AK MSA"   "98312" "Helpers, Carpenters and Related Workers"                      ""     390 14.56 16.63 34600   15 17.9 ""     "" 1997
    "AK" "0380" "Anchorage, AK MSA"   "98313" "Helpers, Electricians and Power-Line Transmission Installers" ""       .  9.13 11.06 23010  9.6    . ""     "" 1997
    "AK" "0380" "Anchorage, AK MSA"   "98315" "Helpers, Plumbers, Pipefitters, and Steamfitters"             ""      40  16.7 15.81 32880  2.8 28.8 ""     "" 1997
    "AK" "0380" "Anchorage, AK MSA"   "98316" "Helpers, Roofers"                                             ""      40  11.4 10.95 22770  2.4   47 ""     "" 1997
    "AK" "0380" "Anchorage, AK MSA"   "98319" "Helpers, All Other Construction Trades Workers"               ""       . 13.32 12.63 26260  6.3    . ""     "" 1997
    "AL" "0380" "Anchorage, AK MSA"   "98799" "All Other Freight, Stock, and Material Movers, Hand"          ""     490 10.43 11.54 23990  7.1 12.2 ""     "" 1997
    "AL" "0380" "Anchorage, AK MSA"   "98902" "Hand Packers and Packagers"                                   ""     960 10.19   9.6 19970  5.3 48.5 ""     "" 1997
    "AL" "0380" "Anchorage, AK MSA"   "98905" "Vehicle Washers and Equipment Cleaners"                       ""     390   6.9  7.11 14800  4.4 31.2 ""     "" 1997
    "AL" "0380" "Anchorage, AK MSA"   "98999" "All Other Helpers, Laborers, and Material Movers, Hand"       ""    1650 14.45 15.04 31290  3.7 12.2 ""     "" 1997
    "MT" "3040" "Great Falls, MT MSA" "31000" "Teachers, Educators, Librarians, and Related Occupations"     "maj"    .     .     .     .    .    . ""     ""    .
    "MT" "3040" "Great Falls, MT MSA" "31303" "Teachers, Preschool"                                          ""     150  7.17  7.33 15250  5.6 19.8 ""     "" 1997
    "MT" "3040" "Great Falls, MT MSA" "31304" "Teachers, Kindergarten"                                       ""      40     .     . 25430 13.5 40.1 "true" "" 1997
    "MT" "3040" "Great Falls, MT MSA" "31305" "Teachers, Elementary School"                                  ""     560     .     . 26640 13.6 23.2 "true" "" 1997
    "MT" "3040" "Great Falls, MT MSA" "97108" "Bus Drivers"                                                  ""      60  9.69  9.42 19590  3.1 22.4 ""     "" 1997
    "MT" "3040" "Great Falls, MT MSA" "97111" "Bus Drivers, School"                                          ""      50  7.76  7.94 16510 13.6 47.1 ""     "" 1997
    "MT" "3040" "Great Falls, MT MSA" "97117" "Driver/Sales Workers"                                         ""      90 11.73 11.22 23350  3.5 12.2 ""     "" 1997
    "MT" "3040" "Great Falls, MT MSA" "97805" "Service Station Attendants"                                   ""       .   6.1  6.75 14050  7.5    . ""     "" 1997
    "MT" "3040" "Great Falls, MT MSA" "97923" "Excavating and Loading Machine Operators"                     ""      30  11.5 11.89 24720   12   42 ""     "" 1997
    "MT" "3040" "Great Falls, MT MSA" "97947" "Industrial Truck and Tractor Operators"                       ""     110  9.15  9.65 20070  7.2 40.5 ""     "" 1997
    "MT" "3040" "Great Falls, MT MSA" "98000" "Helpers, Laborers, and Material Movers, Hand"                 "maj"    .     .     .     .    .    . ""     ""    .
    "MT" "3040" "Great Falls, MT MSA" "98102" "Helpers, Mechanics and Repairers"                             ""      30  6.81  6.81 14170  6.6 34.1 ""     "" 1997
    "MT" "3040" "Great Falls, MT MSA" "98312" "Helpers, Carpenters and Related Workers"                      ""       . 14.95 14.51 30180  6.3    . ""     "" 1997
    "MT" "3040" "Great Falls, MT MSA" "98315" "Helpers, Plumbers, Pipefitters, and Steamfitters"             ""       .  8.97  9.61 19980  9.9    . ""     "" 1997
    ""   "3040" "Great Falls, MT MSA" "98799" "All Other Freight, Stock, and Material Movers, Hand"          ""     170  7.47  8.03 16710  5.3 15.3 ""     "" 1997
    ""   "3040" "Great Falls, MT MSA" "98902" "Hand Packers and Packagers"                                   ""     130  5.73   6.8 14150  7.8 45.7 ""     "" 1997
    ""   "3040" "Great Falls, MT MSA" "98905" "Vehicle Washers and Equipment Cleaners"                       ""     100  5.91  6.46 13430  4.3 17.5 ""     "" 1997
    ""   "3040" "Great Falls, MT MSA" "98999" "All Other Helpers, Laborers, and Material Movers, Hand"       ""     300 10.79 10.32 21470  5.3 20.8 ""     "" 1997
    end

  • #2
    given your example, the following will do what you want (though this is not completely consistent with what you say):
    Code:
    replace prim_state=prim_state[_n-1] if prim_state=="" & area==area[_n-1]
    if you are unsure that the full data is correctly sorted then you might want to check that, or enforce it with a -sort- command, first

    Comment


    • #3
      Thank you for the syntax Rich.
      The following code does the task without using the sort command -
      Code:
       
       replace prim_state=prim_state[_n-1] if area==area[_n-1]

      Comment

      Working...
      X