
  • regular expression?

    Dear All, How can I use a regular expression to separate the following data
    Code:
    clear
    input float subject_ID str19 SexAge_Race
    1 "MALE41.2_WHITE"  
    2 "FEMALE42_BLACK"
    end
    into
    Code:
    clear
    input float subject_ID str10 Sex float Age str10 Race
    1 "MALE"   41.2 "WHITE"
    2 "FEMALE"   42 "BLACK"
    end
    Thanks.
    Ho-Chuan (River) Huang
    Stata 17.0, MP(4)

  • #2
    Here's one way. Other solutions can be engineered if there is more data that does not conform to the format you show here (if that is the case please provide a larger data example):

    Code:
    gen wanted = ustrregexra(SexAge_Race,"([A-Za-z]*)((?:\d)*.*$)","$1_$2")
    split wanted,p("_")
    Last edited by Ali Atia; 07 Mar 2022, 18:56.
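
    A possible variant of the same idea, offered only as a sketch and not taken from the original posts: instead of inserting a delimiter and then calling split, the capture groups can be read off directly with ustrregexm() and ustrregexs(). The pattern below is an assumption based solely on the two example rows (letters, then digits with an optional decimal point, then an underscore, then letters); the names Sex, Age and Race are borrowed from the target layout in #1.

    Code:
    clear
    input float subject_ID str19 SexAge_Race
    1 "MALE41.2_WHITE"
    2 "FEMALE42_BLACK"
    end

    * Assumed layout: LETTERS + NUMBER + "_" + LETTERS
    * The -if- condition runs the match first, so ustrregexs() sees its groups
    gen Sex  = ustrregexs(1)       if ustrregexm(SexAge_Race, "^([A-Za-z]+)([0-9.]+)_([A-Za-z]+)")
    gen Age  = real(ustrregexs(2)) if ustrregexm(SexAge_Race, "^([A-Za-z]+)([0-9.]+)_([A-Za-z]+)")
    gen Race = ustrregexs(3)       if ustrregexm(SexAge_Race, "^([A-Za-z]+)([0-9.]+)_([A-Za-z]+)")

    list subject_ID Sex Age Race

    Rows that do not match the assumed pattern are left missing in all three new variables, which makes nonconforming values easy to spot.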


    • #3
      Dear Ali, Many thanks for this helpful suggestion.
      Ho-Chuan (River) Huang
      Stata 17.0, MP(4)


      • #4
        This works with the example:

        Code:
        clear
        input float subject_ID str19 SexAge_Race
        1 "MALE41.2_WHITE"
        2 "FEMALE42_BLACK"
        end
        
        gen sex = substr(SexAge_Race, 1, 4)
        replace sex = sex + "LE" if sex == "FEMA"
        
        replace SexAge_Race = subinstr(SexAge_Race, sex, "", .)
        
        split SexAge_Race, p(_) destring
        
             +------------------------------------------------------+
             | subjec~D   SexAge_R~e      sex   SexAge~1   SexAge~2 |
             |------------------------------------------------------|
          1. |        1   41.2_WHITE     MALE       41.2      WHITE |
          2. |        2     42_BLACK   FEMALE         42      BLACK |
             +------------------------------------------------------+
        Last edited by Nick Cox; 08 Mar 2022, 04:56.


        • #5
          Dear Nick, Thanks for this suggestion.
          Ho-Chuan (River) Huang
          Stata 17.0, MP(4)


          • #6
            Code:
            replace SexAge_Race = subinstr(SexAge_Race, "MALE", "MALE_", 1)
            split  SexAge_Race , p(_) destring gen(v)
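
            For readers who want to run the two lines above end to end, here is a minimal sketch using the example data from #1. It relies on the fact that "FEMALE" contains "MALE", so a single substitution inserts the delimiter for both values. The final rename to Sex, Age and Race is an addition for illustration, using the names from the target layout in #1.

            Code:
            clear
            input float subject_ID str19 SexAge_Race
            1 "MALE41.2_WHITE"
            2 "FEMALE42_BLACK"
            end

            * Insert "_" after the first "MALE" (also matches inside "FEMALE")
            replace SexAge_Race = subinstr(SexAge_Race, "MALE", "MALE_", 1)

            * Split on "_" into v1 v2 v3; destring converts the all-numeric part
            split SexAge_Race, p(_) destring gen(v)

            * Illustrative step, not in the original post
            rename (v1 v2 v3) (Sex Age Race)
            list subject_ID Sex Age Race

            Note that this approach assumes every value starts with MALE or FEMALE; other spellings would need a different delimiter rule.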


            • #7
              Bjarte Aagnes's solution in #6 is better than #4!


              • #8
                Dear Bjarte, Thanks a lot for this helpful suggestion.

                Ho-Chuan (River) Huang
                Stata 17.0, MP(4)
