  • Removing non-numeric characters from a string variable

    Hello,

    I am trying to create a variable giving age in days (for neonates) from the following data. I have already generated two new variables, ageday and agemonth.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str8 age str7 ageday str8 agemonth
    "9"       ""        ""       
    "13"      ""        ""       
    "13"      ""        ""       
    "8"       ""        ""       
    "11"      ""        ""       
    "19"      ""        ""       
    "28"      ""        ""       
    "20"      ""        ""       
    "22"      ""        ""       
    "19"      ""        ""       
    "45"      ""        ""       
    "45"      ""        ""       
    "23"      ""        ""       
    "15"      ""        ""       
    "16"      ""        ""       
    "28"      ""        ""       
    "8"       ""        ""       
    "52"      ""        ""       
    "10"      ""        ""       
    "34"      ""        ""       
    "12"      ""        ""       
    "16"      ""        ""       
    "21"      ""        ""       
    "15"      ""        ""       
    "19"      ""        ""       
    "29"      ""        ""       
    "9"       ""        ""       
    "28"      ""        ""       
    "30"      ""        ""       
    "8"       ""        ""       
    "13"      ""        ""       
    "15"      ""        ""       
    "23"      ""        ""       
    "21"      ""        ""       
    "21"      ""        ""       
    "15"      ""        ""       
    "25"      ""        ""       
    "9"       ""        ""       
    "67"      ""        ""       
    "5"       ""        ""       
    "16"      ""        ""       
    "21"      ""        ""       
    "20"      ""        ""       
    "3"       ""        ""       
    "43"      ""        ""       
    "43"      ""        ""       
    "43"      ""        ""       
    "5 MONTH" ""        "5 MONTH"
    "5"       ""        ""       
    "46"      ""        ""       
    "1"       ""        ""       
    "62"      ""        ""       
    "13"      ""        ""       
    "65"      ""        ""       
    "45"      ""        ""       
    "0"       ""        ""       
    "49"      ""        ""       
    "23"      ""        ""       
    "66"      ""        ""       
    "66"      ""        ""       
    "66"      ""        ""       
    "66"      ""        ""       
    "66"      ""        ""       
    "66"      ""        ""       
    "0"       ""        ""       
    "57"      ""        ""       
    "57"      ""        ""       
    "41"      ""        ""       
    "56"      ""        ""       
    "56"      ""        ""       
    "0"       ""        ""       
    "35"      ""        ""       
    "54"      ""        ""       
    "17 DAY"  "17 DAY"  ""       
    "24"      ""        ""       
    "65"      ""        ""       
    "22 DAYS" "22 DAYS" ""       
    "12 DAYS" "12 DAYS" ""       
    "38"      ""        ""       
    "38"      ""        ""       
    "49"      ""        ""       
    "22"      ""        ""       
    "50"      ""        ""       
    "50"      ""        ""       
    "30"      ""        ""       
    "30"      ""        ""       
    "47"      ""        ""       
    "40"      ""        ""       
    "40"      ""        ""       
    "40"      ""        ""       
    "3 DAYS"  "3 DAYS"  ""       
    "51"      ""        ""       
    "72"      ""        ""       
    "88"      ""        ""       
    "22"      ""        ""       
    "22"      ""        ""       
    "70"      ""        ""       
    "33"      ""        ""       
    "83"      ""        ""       
    "57"      ""        ""       
    end
    I would like to remove non-numeric characters from ageday and agemonth in order to destring the two variables. I know I can do the following:
    Code:
    // remove "DAYS" before "DAY"; otherwise "DAYS" would leave a stray "S"
    replace ageday = subinstr(ageday, "DAYS", "", .)
    replace ageday = subinstr(ageday, "DAY", "", .)
    However, is there a faster way to do this? For example, is there a way to simply drop all non-numeric characters? I tried
    Code:
     destring var, replace force
    but ended up with all missing values. I ask because the rest of the data also uses "d", "day", "days", "Day", and "Days" to denote days of age in string format.

    Thank you,

    Katie

  • #2
    You can use egenmore (SSC) and ereplace (SSC) to do this.

    Code:
    // if you don't already have the packages, type:
    ssc install egenmore
    ssc install ereplace
    
    //then:
    ereplace age = sieve(age), keep(numeric) //of course you can repeat this for -ageday- and -agemonth-
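If installing packages from SSC is not an option, a similar result can be had with Stata's built-in ustrregexra() function (available in Stata 14 and later). The sketch below uses a few made-up rows in the same layout as the dataex example above, including a hypothetical "3 d" entry standing in for the other spellings mentioned in the question. The regular expression [^0-9] matches any character that is not a digit, so replacing every match with "" leaves only the digits.

```stata
* Minimal sketch using only built-in functions (assumes Stata 14+,
* where ustrregexra() exists); no SSC packages required.
* The rows are illustrative; "3 d" is a hypothetical spelling variant.
clear
input str8 age str7 ageday str8 agemonth
"9"       ""        ""
"5 MONTH" ""        "5 MONTH"
"17 DAY"  "17 DAY"  ""
"22 DAYS" "22 DAYS" ""
"3 d"     "3 d"     ""
end

* Replace every non-digit character with "" in all three variables
foreach v of varlist age ageday agemonth {
    replace `v' = ustrregexra(`v', "[^0-9]", "")
}

* The strings now hold only digits or are empty, so destring succeeds;
* empty strings become missing (.)
destring age ageday agemonth, replace
list
```

One caveat: this also strips decimal points, so a value such as "1.5 MONTH" would become 15; if decimals can occur, "[^0-9.]" is the safer pattern. Applied to -age- itself, "5 MONTH" simply becomes 5 and the unit is lost, which is why keeping the units separated into ageday and agemonth first matters.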



    • #3
      Note also that the -destring- command has an ignore() option that lets you make the variable numeric without losing information; whether it is a good approach depends on whether other non-numeric characters appear in these variables. See
      Code:
      help destring
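A minimal sketch of the ignore() route, assuming the only text in ageday is some spelling of "day"/"days" (the rows below are made up for illustration). ignore() takes a list of individual characters to strip, not whole words, so "DAYSdays " removes any D, A, Y, S, d, a, y, s or space wherever it appears; that one list covers "d", "day", "days", "Day", "Days", "DAY" and "DAYS".

```stata
* Minimal sketch of -destring, ignore()-; rows are illustrative only.
* ignore() strips each listed character individually, not whole words.
clear
input str7 ageday
"17 DAY"
"22 DAYS"
"12 DAYS"
"3 days"
"5 d"
""
end

* Without -force-, destring refuses to convert if any other
* non-numeric character remains after the ignored ones are removed
destring ageday, replace ignore("DAYSdays ")
list
```

If some other character turns up (for instance a "MONTH" value), destring without force should report that the variable contains nonnumeric characters and leave it unchanged rather than silently producing missing values, which is exactly the situation the reply above warns about.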



      • #4
        Thank you, Chris and Rich, for your help! The ereplace command and the sieve() function were especially useful; I look forward to using them more in the future.

        Kind regards,

        Katie
