Hi all,
I'm working with some administrative data in which the names are formatted slightly differently in each year. The goal is to format all names the same way so that I can ask Stata to identify the years in which a person is present in the dataset, using a command such as:
Code:
bysort name year: gen present = _n == 1
Below is an example of my data, in which the formatting issues in the "name" variable are clear. I'm using Stata 15.1 on a Mac. Is there a way to standardize the names?
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str20 name float year
"SMITH JONES ,PAUL O"  2012
"SMITH JONES ,PAUL O"  2012
"SMITH JONES, PAUL, O" 2013
"SMITHJONES,PAUL,O"    2014
"SMITH, JONES, PAUL O" 2015
"SMITH, JONES, PAUL O" 2015
"SMITH JONES, PAUL, O" 2016
end
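For illustration, here is a minimal sketch of one possible way to standardize the names. It assumes that commas and spaces carry no identifying information, which the variant "SMITHJONES,PAUL,O" seems to suggest, since it has no space between the surnames. The variable name_key is a hypothetical helper, not something in the original data, and the sketch only covers the four spellings shown in the example:

```stata
clear
input str20 name float year
"SMITH JONES ,PAUL O"  2012
"SMITH JONES, PAUL, O" 2013
"SMITHJONES,PAUL,O"    2014
"SMITH, JONES, PAUL O" 2015
end

* name_key (hypothetical helper): drop every comma and every space,
* then upper-case, so that all four spellings collapse to one string
gen name_key = strupper(subinstr(subinstr(name, ",", "", .), " ", "", .))

* Flag the first observation of each person in each year
bysort name_key year: gen present = _n == 1
```

Removing all spaces could in principle merge two different people whose names differ only in where the spaces fall, so it would be worth listing the distinct name/name_key pairs before relying on the key.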