  • Splitting An Instance into two existing attributes

    I have a dataset that includes two variables called "NAME" and "TITLE".

    NAME should simply be an individual's birth name (e.g. "John William Figueroa") and TITLE should be anything appended to the end (e.g. OBE, MD, PhD, JD). Trouble is, a lot of entries instead have this information in the NAME variable, so that it reads "John William Figueroa, PhD".

    Is there an easy way to use the comma (very frequently present) to shift the title into the next column? I'd use the "split" function but I don't want this broken into two new variables, just want to shift some of the information one line over. Thanks so much for your time!

    Best,
    Chuck

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str25(NAME TITLE)
    "John William Figuero"      "PhD"
    "John William Figuero, PhD" ""  
    end
    
    * strtrim() drops the blank that follows the comma
    replace TITLE = strtrim(regexs(1)) if regexm(NAME,"^.*,(.*)$")
    replace NAME = regexs(1) if regexm(NAME,"^(.*),.*$")
    or,
    Code:
    * strtrim() drops the blank that follows the comma
    replace TITLE = strtrim(substr(NAME,strpos(NAME,",")+1,.)) if strpos(NAME,",")
    replace NAME = substr(NAME,1,strpos(NAME,",")-1) if strpos(NAME,",")
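
    A possible variation on the strpos() approach, offered only as a sketch: it leaves an existing TITLE untouched and splits at the first comma, so several titles stay together. The third observation is a made-up case, the helper variable comma is hypothetical, and strtrim() assumes Stata 14 or newer (older versions have trim()).

    ```stata
    * Sketch only: toy data; the third observation is an invented two-comma case
    clear
    input str40 NAME str10 TITLE
    "John William Figuero"          "PhD"
    "John William Figuero, PhD"     ""
    "John William Figuero, MD, PhD" ""
    end

    * Position of the first comma in NAME (0 if there is none)
    generate byte comma = strpos(NAME, ",")

    * Fill TITLE only where it is still empty, trimming the blank after the comma
    replace TITLE = strtrim(substr(NAME, comma + 1, .)) if comma > 0 & TITLE == ""

    * Keep only the part of NAME before the first comma
    * (assumption: whatever follows the comma is not needed once TITLE is set)
    replace NAME = strtrim(substr(NAME, 1, comma - 1)) if comma > 0

    drop comma
    list, clean
    ```

    If some observations have both a comma in NAME and a non-empty TITLE, it would be worth listing them before running the second replace, since that step discards the text after the comma.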



    • #3
      (Crossed with and is presumably superseded by Øyvind's suggestion.)

      What do you have in mind by "next column" that is distinct from another variable? I also don't get what you mean by "shift ... one line over," since I'd understand a "line" as a horizontal object, but I'd think of "shift ... over" as an instruction to move something horizontally.
      Last edited by Mike Lacy; 29 Jan 2022, 16:16.



      • #4
        Mike, good point. I should have said "one column over". I want the items after the comma to shift one column over horizontally.



        • #5
          Chuck Hotelier -

          I want to point out that this can easily be done with the split command (not "function", which has a very definite meaning to Stata), which you apparently are familiar with.
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str25(NAME TITLE)
          "John William Figuero"      "PhD"
          "John William Figuero, PhD" ""  
          end
          split NAME, parse(,)
          replace NAME  = NAME1 if NAME2!=""
          * NAME2 starts with the blank that followed the comma, so trim it
          replace TITLE = strtrim(NAME2) if NAME2!=""
          drop NAME1 NAME2
          list, clean
          Code:
          . list, clean
          
                                 NAME   TITLE  
            1.   John William Figuero     PhD  
            2.   John William Figuero     PhD
          As an aside, when talking about Stata, don't refer to rows and columns. Stata is not a spreadsheet: it has observations and variables, not rows and columns. I make this somewhat pedantic point because to become a successful Stata user you have to stop thinking in spreadsheet terms when you use it. Your habits and instincts acquired from using spreadsheets will seldom be helpful and they will frequently lead you in the wrong direction with Stata. To help your mind keep the distinction between a spreadsheet and a Stata data set vivid, it is best to drop the row/column terminology when speaking of Stata.

