  • Help using regexm

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id str35 item
      1 "11,12"          
      2 "4,13"           
      4 "1,2,5"              
     83 "1,4,8,10"       
    107 "1,2,8,15"       
    143 "1,2"            
    161 "1,4,5,6,8,11,12"
    end

    I want to create 15 variables, item_1, item_2, ..., item_15, each equal to 1 when the string variable "item" contains the number in the variable's name, and 0 otherwise. For instance:
    id 1: item_11=1, item_12=1
    id 2: item_4=1, item_13=1
    id 4: item_1=1, item_2=1, item_5=1
    etc.

    I tried the regexm function as shown below, without success:

    foreach x in 1/15 {
    generate item_`x' = regexm(item,"`x'")
    }

    Could anyone help me, please?

  • #2
    regexm requires pipes to separate elements, not commas. I would recommend using its Unicode counterpart ustrregexm as it allows word boundaries, necessary to differentiate, e.g., "1" and "11", since the former is a substring of the latter.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id str35 item
      1 "11,12"          
      2 "4,13"          
      4 "1,2,5"              
     83 "1,4,8,10"      
    107 "1,2,8,15"      
    143 "1,2"            
    161 "1,4,5,6,8,11,12"
    end
    
    gen itemlist= "("+subinstr(trim(itrim(item)), ",", "|", .)+ ")"
    forval x = 1/15{
        generate item_`x' = ustrregexm(itemlist,"\b`x'\b")
    }
    Results:

    Code:
    . l item itemlist item_1-item_5 item_10-item_12
    
         +----------------------------------------------------------------------------------------------------------------+
         |            item            itemlist   item_1   item_2   item_3   item_4   item_5   item_10   item_11   item_12 |
         |----------------------------------------------------------------------------------------------------------------|
      1. |           11,12             (11|12)        0        0        0        0        0         0         1         1 |
      2. |            4,13              (4|13)        0        0        0        1        0         0         0         0 |
      3. |           1,2,5             (1|2|5)        1        1        0        0        1         0         0         0 |
      4. |        1,4,8,10          (1|4|8|10)        1        0        0        1        0         1         0         0 |
      5. |        1,2,8,15          (1|2|8|15)        1        1        0        0        0         0         0         0 |
         |----------------------------------------------------------------------------------------------------------------|
      6. |             1,2               (1|2)        1        1        0        0        0         0         0         0 |
      7. | 1,4,5,6,8,11,12   (1|4|5|6|8|11|12)        1        0        0        1        1         0         1         1 |
         +----------------------------------------------------------------------------------------------------------------+
    Last edited by Andrew Musau; 08 Apr 2024, 06:36.
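
    A note on why the loop in the question fails, as far as I can tell. First, foreach x in 1/15 takes 1/15 as a single list element rather than a range, so the body runs once with x set to the literal text 1/15; forvalues x = 1/15 (or foreach x of numlist 1/15) expands the range. Second, a plain regexm(item, "1") also matches "11" and "10", which is why the word boundaries matter. The sketch below is only an illustration on toy data: the names s, has_1_naive and has_1_exact are made up, and it applies the \b pattern directly to the comma-separated string on the assumption that a comma is not a word character.

    ```stata
    * Toy data; all variable names here are hypothetical
    clear
    input str10 s
    "11,12"
    "1,2"
    end

    * -foreach ... in- takes the list literally: one pass, with x equal to 1/15
    foreach x in 1/15 {
        display "x is `x'"
    }

    * Plain substring match: "1" is found inside "11", so both rows get 1
    generate has_1_naive = regexm(s, "1")

    * Word boundaries restrict the match to the whole number 1
    generate has_1_exact = ustrregexm(s, "\b1\b")

    list
    ```

    Only the second row should be flagged by has_1_exact, whereas has_1_naive flags both.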

    • #3
      Here's another way to do it.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float id str35 item
        1 "11,12"          
        2 "4,13"           
        4 "1,2,5"              
       83 "1,4,8,10"       
      107 "1,2,8,15"       
      143 "1,2"            
      161 "1,4,5,6,8,11,12"
      end
      
      gen work = subinstr(item, ",", " ", .)
      
      forval j = 1/15 { 
          gen wanted`j' = strlen(item) > strlen(subinword(work, "`j'", "", .)) 
      }
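
      The idea, as I read it: once commas become spaces, each number is a separate word, and subinword() removes `j' only where it appears as a whole word. If anything was removed, the result is shorter than the original, so the comparison is true. Because a comma and a space are both one character, strlen(item) equals strlen(work). A small check with display, using literal strings rather than the dataset:

      ```stata
      * "1" is a whole word here, so it is removed and the length drops from 8 to 7
      display strlen("1 4 8 10")
      display strlen(subinword("1 4 8 10", "1", "", .))

      * "1" occurs only inside "11", so nothing is removed and the length stays at 5
      display strlen("11 12")
      display strlen(subinword("11 12", "1", "", .))
      ```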
