Hello Statalist,
I have a string variable which is interspersed with HTML tags (e.g. "<br>" or "</span>"). I want to get rid of all these tags that are identified by angled brackets.
To make things complicated:
1) There is a large variety of these tags, so I cannot simply run a "subinstr()" for a select list of them - I need something that catches them in an automated way via the angled brackets.
2) There can be more than one of these tags per observation.
I tried the following code (looping it 9 times to remove up to 9 tags):
Code:
foreach num of numlist 1/9 {
    gen htmltag`num' = substr(textwithtags, strpos(textwithtags,"<"), strpos(textwithtags,">"))
    replace textwithtags = subinstr(textwithtags, htmltag`num', "", .)
}
But this doesn't work well for cases with multiple tags. Take the following example: "<br> Does NAME have an <span style="color:red"> AGREEMENT or CONTRACT</span> to return?" - this approach doesn't know which pair of brackets belongs together as one tag, and as a consequence some of the "real" text between the tags is also removed... and I end up with only "Does NAME have an ACT"...
Any help would be much appreciated!
Best,
Felix
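One likely reason the loop above eats real text: the third argument of substr() is a length, not an end position, so passing strpos(textwithtags,">") cuts far more than the tag whenever the tag does not start at position 1. A minimal sketch of an alternative, assuming Stata 14 or later (where ustrregexra() is available), is to delete everything from a "<" up to the next ">" in a single pass. The variable name cleantext is only illustrative.

```stata
* Sketch only: assumes Stata 14+ and that tags never contain a literal ">"
* inside an attribute value. cleantext is a hypothetical new variable.
gen cleantext = ustrregexra(textwithtags, "<[^>]*>", "")

* Optional tidy-up: collapse repeated blanks left behind by removed tags
* and trim leading/trailing blanks.
replace cleantext = strtrim(stritrim(cleantext))
```

Because the pattern "<[^>]*>" stops at the first ">" after each "<", every tag is matched on its own and no loop over a fixed number of tags is needed.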