wxRegEx & HTML

Posted: Sat Apr 04, 2009 6:43 pm
by illnatured
Here's a nice tiny regular expression that was meant to remove all HTML tags from a string:

Code:

wxRegEx ExByeByeTags (wxT("<(.|\\n)+?>"), wxRE_ICASE|wxRE_ADVANCED);
It works, but I have no idea how to modify the expression in order to make it match only tags contained in a table (assuming that all tables look simply like this: <TABLE>...</TABLE>). Are there any regex-geeks here? I'm already having nightmares involving strange ASCII characters ;).

Posted: Mon Apr 06, 2009 2:24 am
by protocol
Please provide a test subject string.

Also check out my app, QuRegExmm, on SourceForge; it may be able to help you make the match.

Posted: Mon Apr 06, 2009 5:39 pm
by illnatured
Cool app, surely much better than compiling my wx program every time :). Thank you. This is a test string:
<P>
These tags should remain untouched...
</P>

<table>
<tr><td><span class=A></span>...while these ones should disappear.</span></td></tr>
<tr><td><span class=A></span>&nbsp;It would be nice if this "nbsp" disappeared too</span></td></tr>
<tr><td><span class=A></span>Some text</span></td></tr>
</table>
(Not 100% valid HTML, but this is intentional.)

Posted: Wed Apr 15, 2009 5:00 pm
by illnatured
It seems that regular expressions alone aren't powerful enough to easily accomplish this task, so I mixed them with good old std::string functions and it works for me. Nevertheless, your app is so helpful that 5 wxAwards go to you ;).
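
The poster's actual code is not shown in the thread, so what follows is only a minimal sketch of the "regex plus string functions" idea. It uses std::regex instead of wxRegEx so that it compiles without wxWidgets. The names StripTagsInTables and ToLower are hypothetical, and the sketch assumes tables are not nested and that the <table> and </table> tags themselves should be removed along with everything tagged inside them.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: lower-cased copy, used only for case-insensitive find().
static std::string ToLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical function: removes tags and "&nbsp;" only inside
// <table ...> ... </table> blocks; text outside tables is copied unchanged.
// Assumption: tables are not nested (the first </table> ends the block).
// Simplification: any tag whose name starts with "table" counts as an opener.
std::string StripTagsInTables(const std::string& html) {
    static const std::regex tagOrNbsp("<[^>]+>|&nbsp;", std::regex::icase);
    const std::string lower = ToLower(html);
    const std::string openTag = "<table";
    const std::string closeTag = "</table>";

    std::string out;
    std::size_t pos = 0;
    for (;;) {
        // Locate the next table block with plain string searches.
        const std::size_t open = lower.find(openTag, pos);
        if (open == std::string::npos) break;
        const std::size_t close = lower.find(closeTag, open);
        if (close == std::string::npos) break;  // unterminated table: leave as-is
        const std::size_t end = close + closeTag.size();

        out.append(html, pos, open - pos);  // outside the table: untouched
        // Inside the table: let the regex strip every tag and every &nbsp;.
        out += std::regex_replace(html.substr(open, end - open), tagOrNbsp, "");
        pos = end;
    }
    out.append(html, pos, std::string::npos);  // remainder after the last table
    return out;
}
```

Calling StripTagsInTables on the test string from the earlier post should leave the <P>...</P> block intact while reducing the table to its bare text. The same split could presumably be done with wxString::Find and wxRegEx::ReplaceAll, but that variant is not shown here.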

Posted: Fri Apr 17, 2009 4:00 am
by protocol
Excellent. I'm glad you enjoy the app.

Regards.