Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • String manipulation to ensure consistent formatting

    I have two datasets that I am trying to merge. In Dataset 1 I have participant ID's (variable name is Participant_ID) that are a concatenation of first initial, last initial and date of birth in the format of two digit month, two digit day, and 4 digit year without any slashes or dashes in between; see the first dataex below. For example, someone born on January 1st, 2001 with initials AA would be AA01012001 in this dataset. However, for dataset 1 I do NOT have separate variables for first intial, last initial, and date of birth.

    In Dataset 2, I have separate variables for First_Initial, Last_Initial and DOB (date of birth). The format of the data in DOB is not consistent. Some entries have month as 1 digit while other entries have month as 2 digits. For example, some might have January as just a "1" while others may have it as "01" (see the first and second observations in the dataex for Dataset 2 below). The same thing is true for the day of the month. To create a participant ID, I will need to concatenate the three variables in a way that the format matches the format of dataset 1.

    Due to the inconsistencies in how the dates of birth are entered in dataset 2, I am stumped regarding how to generate a Participant_ID variable from dataset 2 that will allow me to merge with dataset 1 and I appreciate any advice on this matter.

    Dataset 1 example:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str10 Participant_ID
    "AW01032009"
    "BB01142013"
    "WZ05082012"
    "ZZ01072011"
    "CW12052011"
    end
    Dataset 2:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str10 DOB str13 First_Initial str12 Last_Initial
    "01/03/2009" "A" "W"
    "1/14/2013"  "B" "B"
    "05/8/2012"  "W" "Z"
    "1/7/2011"   "Z" "Z"
    "12/05/2011" "C" "W"
    end
    Best,
    Alyssa Beavers

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str10 DOB str13 First_Initial str12 Last_Initial
    "01/03/2009" "A" "W"
    "1/14/2013"  "B" "B"
    "05/8/2012"  "W" "Z"
    "1/7/2011"   "Z" "Z"
    "12/05/2011" "C" "W"
    end
    
    split DOB, parse("/") destring gen(dvars)
    gen Participant_ID = First_Initial + Last_Initial ///
        + string(dvars1, "%02.0f") + string(dvars2, "%02.0f") + string(dvars3)
        
    list, noobs clean

    Comment

    Working...
    X