Help using regexm

Silvia Girardi

Join Date: Jun 2014

Posts: 7


08 Apr 2024, 04:40

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float id str35 item
  1 "11,12"
  2 "4,13"
  4 "1,2,5"
 83 "1,4,8,10"
107 "1,2,8,15"
143 "1,2"
161 "1,4,5,6,8,11,12"
end

I want to create 15 variables, item_1, item_2, ..., item_15, each equal to 1 when the string variable "item" contains the number in that variable's name. For instance:
id 1: item_11=1, item_12=1
id 2: item_4=1, item_13=1
id 4: item_1=1, item_2=1, item_5=1
etc.

I have tried the regexm function as below, with no success:

foreach x in 1/15 {
generate item_`x' = regexm(item,"`x'")
}

Could anyone help me, please?

Andrew Musau

Join Date: Oct 2014
Posts: 10195

08 Apr 2024, 05:31

regexm requires pipes to separate elements, not commas. I would recommend using its Unicode counterpart ustrregexm as it allows word boundaries, necessary to differentiate, e.g., "1" and "11", since the former is a substring of the latter.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float id str35 item
  1 "11,12"          
  2 "4,13"          
  4 "1,2,5"              
 83 "1,4,8,10"      
107 "1,2,8,15"      
143 "1,2"            
161 "1,4,5,6,8,11,12"
end

gen itemlist= "("+subinstr(trim(itrim(item)), ",", "|", .)+ ")"
forval x = 1/15{
    generate item_`x' = ustrregexm(itemlist,"\b`x'\b")
}

Result:

Code:

. l item itemlist item_1-item_5 item_10-item_12

     +----------------------------------------------------------------------------------------------------------------+
     |            item            itemlist   item_1   item_2   item_3   item_4   item_5   item_10   item_11   item_12 |
     |----------------------------------------------------------------------------------------------------------------|
  1. |           11,12             (11|12)        0        0        0        0        0         0         1         1 |
  2. |            4,13              (4|13)        0        0        0        1        0         0         0         0 |
  3. |           1,2,5             (1|2|5)        1        1        0        0        1         0         0         0 |
  4. |        1,4,8,10          (1|4|8|10)        1        0        0        1        0         1         0         0 |
  5. |        1,2,8,15          (1|2|8|15)        1        1        0        0        0         0         0         0 |
     |----------------------------------------------------------------------------------------------------------------|
  6. |             1,2               (1|2)        1        1        0        0        0         0         0         0 |
  7. | 1,4,5,6,8,11,12   (1|4|5|6|8|11|12)        1        0        0        1        1         0         1         1 |
     +----------------------------------------------------------------------------------------------------------------+
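
A minimal sketch of the two points at play in this thread, using only literal strings so it can be run on its own. The first part shows why a plain substring match confuses "1" with "11" and how the \b word boundary avoids that. The second part shows that foreach ... in treats 1/15 as a single list element, whereas forvalues expands it as a range. The third display also suggests that a comma already acts as a word boundary, so the search might work on the original comma-separated string as well; treat that as an assumption to check on your own data.

```stata
* Plain substring match: "1" is found inside "11", a false positive
display regexm("11,12", "1")

* Word-boundary match: there is no standalone 1 in "11,12"
display ustrregexm("11,12", "\b1\b")

* A standalone 1 followed by a comma is matched
display ustrregexm("1,4,8,10", "\b1\b")

* foreach ... in takes 1/15 as one literal element, so this loop runs once
foreach x in 1/15 {
    display "x is `x'"
}

* forvalues expands the numeric range (shortened to 1/3 here)
forvalues x = 1/3 {
    display "x is `x'"
}
```

With foreach ... in, the loop body would also try to create a variable named item_1/15, which is not a valid variable name.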

Last edited by Andrew Musau; 08 Apr 2024, 05:36.


Nick Cox

Join Date: Mar 2014
Posts: 35698

08 Apr 2024, 10:02

Here's another way to do it.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float id str35 item
  1 "11,12"          
  2 "4,13"           
  4 "1,2,5"              
 83 "1,4,8,10"       
107 "1,2,8,15"       
143 "1,2"            
161 "1,4,5,6,8,11,12"
end

gen work = subinstr(item, ",", " ", .)

forval j = 1/15 { 
    gen wanted`j' = strlen(item) > strlen(subinword(work, "`j'", "", .)) 
}
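
A small sketch of why the length comparison above works, again on literal strings rather than the thread's data. subinword() replaces only whole words delimited by spaces, so removing the word "1" shortens the string only when a standalone 1 is present. Since each comma was replaced by exactly one space, item and work have the same length, which is why strlen(item) can stand in for strlen(work).

```stata
* No standalone 1 in "11 12": the string comes back unchanged
display subinword("11 12", "1", "", .)

* A standalone 1 is present: it is removed and the string gets shorter
display subinword("1 4 8 10", "1", "", .)

* The indicator is the comparison of lengths before and after removal
display strlen("11 12") > strlen(subinword("11 12", "1", "", .))
display strlen("1 4 8 10") > strlen(subinword("1 4 8 10", "1", "", .))
```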
