  • Standardize occupational codes

    Dear Stata Forum,
    hope you are all fine.
    I have the following simple question.
    Let's say I have the occupational codes as such:
    Code
    1234
    5678
    6666.01
    6666.02
    I would like to get rid of the digits after the dot to get the following standardized code:
    1234
    5678
    6666
    6666
    I want to later merge this occupational code with another dataset that also has this occupational code.
    Any ideas?
    I would appreciate your help.
    Best wishes,
    Nico

  • #2
    Your tableau of the data fails to provide the most critical piece of information for solving your problem: is the variable Code a string or a numeric variable? This is a classic example of why the Forum FAQ advises people to use the -dataex- command to show example data. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work with your data.

    Now, in this case, there are only two possibilities, and the solutions are relatively simple in either case.

    If the variable Code is numeric: -replace Code = floor(Code)- will do the job.

    If the variable Code is string, and if the standardized code is always four characters long, -replace Code = substr(Code, 1, 4)- will do it.
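
    A minimal sketch of the numeric case, using the four example values from the question. It assumes the variable is named Code (Stata variable names are case sensitive) and is stored as a double; adjust both to match the real data.

    ```stata
    * rebuild the example data from the question (assumed: numeric, stored as double)
    clear
    input double Code
    1234
    5678
    6666.01
    6666.02
    end

    * floor() rounds down to the nearest integer, dropping the part after the dot
    replace Code = floor(Code)
    list

    * the truncated codes may no longer be unique, so check before merging
    duplicates report Code
    ```

    Because the two 6666.xx rows end up with the same code, a later merge on Code would probably need to be m:1 or 1:m rather than 1:1, depending on how the other dataset is organized.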

    If the variable Code is string, but the standardized codes vary in length:
    Code:
    * position of the first "." in Code (0 if there is none)
    gen dp_location = strpos(Code, ".")
    * keep only the characters before the "."
    replace Code = substr(Code, 1, dp_location - 1) if dp_location > 0
    Note: I'm sure there is an elegant one-line solution for this case using regular expressions, and perhaps somebody else will post that. But regular expressions just baffle me, and they aren't needed to do this task.
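
    For reference, one possible regular-expression version might look like the sketch below. It is not taken from the thread: it assumes Code is a string variable and uses -regexr()-, which replaces the first match of the pattern. The example data are the values from the question, entered as strings.

    ```stata
    * rebuild the example data from the question (assumed: string variable)
    clear
    input str10 Code
    "1234"
    "5678"
    "6666.01"
    "6666.02"
    end

    * remove the first "." and everything after it;
    * values that contain no "." are left unchanged
    replace Code = regexr(Code, "\..*", "")
    list
    ```

    The pattern "\..*" reads as a literal dot followed by any remaining characters, so it plays the same role as the -strpos()- and -substr()- pair above without creating a helper variable.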

    Finally: please always use -dataex- to show example data. It will save you time as well as saving the time of those who want to help you.


    • #3
      Dear Clyde,
      your suggestions and advice and criticism are well taken. I really appreciate your kind and useful reply anyhow.
      Code is numeric, so floor(code) worked just fine.
      Thanks,
      Nico
