
  • Split when comma is not embedded inside a bracket

    I have a Stata variable that contains thousands of observations. One of those observations is:

    070810(faba bean / green bean / lima bean / Mung bean / ...), 070820(string bean / yard-long bean (Vigna unguiculata s...), 070890(Only Pigeon pea (pod)), 071022(Only garden bean (Phaseolus vulgaris), shelled or...)

    Other observations are similar but may be longer. I want to create new variables that split the original variable into parts, where a split occurs only at a comma (,) that is not inside parentheses. In the case I provided, variable 1 would be 070810(faba bean / green bean / lima bean / Mung bean / ...), variable 2 would be 070820(string bean / yard-long bean (Vigna unguiculata s...), variable 3 would be 070890(Only Pigeon pea (pod)), and variable 4 would be 071022(Only garden bean (Phaseolus vulgaris), shelled or...). Note that some observations will require many more than 4 variables. Thank you so much!

  • #2
    Code:
    clear
    input strL text
    "070810(faba bean / green bean / lima bean / Mung bean / ...), 070820(string bean / yard-long bean (Vigna unguiculata s...), 070890(Only Pigeon pea (pod)), 071022(Only garden bean (Phaseolus vulgaris), shelled or...)"
    end
    
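    * Replace each "), " that is followed by digits with ")|" plus those digits
    replace text = ustrregexra(text, "\)\,\s+(\d+)", ")|$1")
    * Split on the "|" marker into text1, text2, ...
    split text, p(|)
    * text? matches text1 through text9 only
    list text?
    Result: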

    Code:
    . list text?
    
         +----------------------------------------------------------------------------------------------+
      1. |                                                                        text1                 |
         |                 070810(faba bean / green bean / lima bean / Mung bean / ...)                 |
         |----------------------------------------------------------------------------------------------|
         |                                                        text2 |                         text3 |
         | 070820(string bean / yard-long bean (Vigna unguiculata s...) | 070890(Only Pigeon pea (pod)) |
         |----------------------------------------------------------------------------------------------|
         |                                                                        text4                 |
         |                 071022(Only garden bean (Phaseolus vulgaris), shelled or...)                 |
         +----------------------------------------------------------------------------------------------+
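
    A note on why this works, with a slightly stricter variant. The truncated descriptions leave some parentheses unbalanced (for example, "(Vigna unguiculata s...)" closes only one of the two parentheses opened in that item), so counting parenthesis depth would not be reliable here. The pattern above instead treats a comma as a separator only when it follows ")" and precedes digits. The sketch below tightens that a little, assuming every code is a run of digits immediately followed by "(". The shortened example string and the names marked and part are illustrative, not from the original post, and the lookahead (?=...) relies on the ICU regex engine behind Stata's ustrregex functions.

    ```stata
    clear
    input strL text
    "070820(string bean (Vigna unguiculata s...), 071022(Only garden bean (Phaseolus vulgaris), shelled or...)"
    end

    * Assumption: each item starts with digits immediately followed by "(".
    * Turn a comma into the marker "|" only when it follows ")" and the next
    * item start (digits + "(") comes right after it. The lookahead checks for
    * that start without consuming it, so no backreference is needed.
    generate strL marked = ustrregexra(text, "\)\s*,\s*(?=\d+\()", ")|")

    * Split on the marker; split creates as many variables as the
    * longest observation needs (part1, part2, ...).
    split marked, parse("|") generate(part)

    * part* lists every piece, including part10 and beyond.
    list part*
    ```

    With this string, the comma after "(Phaseolus vulgaris)" should be left alone because it is followed by "shelled" rather than a code, so only two parts are expected. If a description could itself contain "|", a different marker character would be needed.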



    • #3
      Dear Andrew,

      Thanks a lot. That is a smart way of approaching it.
      Hope you have a great day.
