
Some regex help needed

Posted: Sat Jul 02, 2005 8:52 am
by vdell
I'm trying to get some movie information from IMDb.com (I actually got connected to the damn page :)) but it seems that my regex skills are a bit rusty ATM.

So, the problem is that there's data like:

Code:

<b class="blackcatheader">Directed by</b><br>
<a href="/name/nm0905152/">Andy Wachowski</a><br><a href="/name/nm0905154/">Larry Wachowski</a>
and I'm trying to get all directors from that data. So far I have the following regex:

Code:

(?:<b\s+[^>]*>Directed by</b><br>\s?\n){1}(?:<a\s+[^>]*>([^<]*)</a>(?:<br>)?)*
This almost works. The problem is that it captures all directors into the same group (screenshot1.png), whereas I need each director in its own group. Any suggestions?

I'm using boost::regex_search.

Posted: Sat Jul 02, 2005 9:05 am
by tiwag
help yourself with the regex-coach
http://weitz.de/regex-coach/

Posted: Sat Jul 02, 2005 11:53 am
by vdell
tiwag wrote:help yourself with the regex-coach
http://weitz.de/regex-coach/
Well, I have tried that, along with the one in the screenshot, but those tools are not really helpful when I have no idea how to get the regex working. Thanks anyway.

Posted: Fri Jul 22, 2005 9:28 am
by Ryan Norton
I'd respond, but I'm not much of a regex master myself. Try asking on the IRC channel #wxwidgets, though: BrianHV, who hangs out there, is a regex master and has answered many of my questions on this.