Splitting An Instance into two existing attributes

Chuck Hotelier

Join Date: Jan 2022

Posts: 18
#1

29 Jan 2022, 14:26

I have a dataset that includes two variables called "NAME" and "TITLE".

NAME should simply be an individual's birth name (e.g. "John William Figueroa") and TITLE should hold anything appended to the end (e.g. OBE, MD, PhD, JD). The trouble is that a lot of entries have the title in the NAME column instead, so that it reads "John William Figueroa, PhD".

Is there an easy way to use the comma (very frequently present) to shift the title into the next column? I'd use the "split" function but I don't want this broken into two new variables, just want to shift some of the information one line over. Thanks so much for your time!

Best,
Chuck

Øyvind Snilsberg

Join Date: Oct 2021
Posts: 591

29 Jan 2022, 16:09

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str25(NAME TITLE)
"John William Figuero"      "PhD"
"John William Figuero, PhD" ""  
end

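* Copy the text after the last comma into TITLE (strtrim() drops the blank after the comma)
replace TITLE = strtrim(regexs(1)) if regexm(NAME,"^.*,(.*)$")
* Then keep only the part of NAME before that comma
replace NAME = regexs(1) if regexm(NAME,"^(.*),.*$")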

or,

Code:

* Same idea without regular expressions; strpos() locates the first comma
replace TITLE = strtrim(substr(NAME,strpos(NAME,",")+1,.)) if strpos(NAME,",")
replace NAME = substr(NAME,1,strpos(NAME,",")-1) if strpos(NAME,",")
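
One difference between the two snippets is worth knowing. The regular expression is greedy, so it splits at the last comma, while strpos() finds the first one. For a value such as "Jane Doe, MD, PhD" (a made-up example, not from the original data), the first snippet would move only the PhD part and leave "Jane Doe, MD" in NAME, whereas the second would move everything after the first comma. Below is a minimal sketch, assuming you want everything after the first comma, that any TITLE already filled in should be left alone, and that strtrim() is available (Stata 14 or later; trim() in older versions). The variable fix is a hypothetical helper, not something from the original data.

```stata
* Sketch only: split at the first comma, trim blanks, leave existing titles alone
clear
input str30 NAME str10 TITLE
"John William Figuero"       "PhD"
"John William Figuero, PhD"  ""
"Jane Doe, MD, PhD"          ""
end

* Flag observations that have a comma in NAME and nothing yet in TITLE
generate byte fix = strpos(NAME, ",") > 0 & TITLE == ""

* Fill TITLE first, while NAME still holds the full text
replace TITLE = strtrim(substr(NAME, strpos(NAME, ",") + 1, .)) if fix
replace NAME  = strtrim(substr(NAME, 1, strpos(NAME, ",") - 1)) if fix

drop fix
list, clean
```

With these three observations, the first is untouched, and the second and third end up with "PhD" and "MD, PhD" in TITLE. Observations that have both a comma in NAME and a non-empty TITLE are deliberately skipped, so they can be listed and checked by hand.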

Mike Lacy

Join Date: Apr 2014

Posts: 2396
#3

29 Jan 2022, 16:12

(Crossed with and is presumably superseded by Øyvind's suggestion.)

What do you have in mind by "next column" that is distinct from another variable? I also don't get what you mean by "shift ... one line over," since I'd understand a "line" as a horizontal object, but I'd think of "shift ... over" as an instruction to move something horizontally.

Last edited by Mike Lacy; 29 Jan 2022, 16:16.
Chuck Hotelier

Join Date: Jan 2022

Posts: 18
#4

29 Jan 2022, 18:41

Mike, good point. I should have said "one column over". I want the items after the comma to shift one column over horizontally.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

29 Jan 2022, 19:34

Chuck Hotelier -

I want to point out that this can easily be done with the split command (not "function", which has a very definite meaning to Stata), which you apparently are familiar with.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str25(NAME TITLE)
"John William Figuero"      "PhD"
"John William Figuero, PhD" ""
end

split NAME, parse(,)
replace NAME = NAME1 if NAME2!=""
replace TITLE = NAME2 if NAME2!=""
drop NAME1 NAME2
list, clean

Code:

. list, clean

                       NAME   TITLE  
  1.   John William Figuero     PhD  
  2.   John William Figuero     PhD  
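
A caveat on the split route: split creates only as many new variables as the data need. If no NAME contains a comma, the second piece never exists and the replace lines stop with an error. Below is a hedged sketch that guards against that and also trims the blank left in front of the title when the parse string is just a comma. The stub piece is an arbitrary name chosen here, and strtrim() assumes Stata 14 or later.

```stata
* Sketch only: -split- variant that checks the second piece exists before using it
clear
input str30 NAME str10 TITLE
"John William Figuero"       "PhD"
"John William Figuero, PhD"  ""
end

* generate(piece) names the new variables piece1, piece2, ...
split NAME, parse(",") generate(piece)

* piece2 exists only if at least one NAME contains a comma
capture confirm variable piece2
if _rc == 0 {
    replace TITLE = strtrim(piece2) if piece2 != ""
    replace NAME  = strtrim(piece1) if piece2 != ""
}

* Remove every helper variable, however many were created
drop piece*
list, clean
```

Note that text after a second comma would land in piece3 and is discarded by this sketch, so it suits data with at most one comma per name.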

As an aside, when talking about Stata, don't refer to rows and columns. Stata is not a spreadsheet: it has observations and variables, not rows and columns. I make this somewhat pedantic point because to become a successful Stata user you have to stop thinking in spreadsheet terms when you use it. Your habits and instincts acquired from using spreadsheets will seldom be helpful and they will frequently lead you in the wrong direction with Stata. To help your mind keep the distinction between a spreadsheet and a Stata data set vivid, it is best to drop the row/column terminology when speaking of Stata.