Parsing HTML pages

Posted: Mon Nov 28, 2011 2:51 pm
by Muetdhiver
Hi all,

Is there a sample that illustrates how to use the wxHtmlParser or wxHtmlWinParser class to simply parse an HTML page given as an argument?

I'm not able to find such documentation anywhere.

Thanks for your help.

Re: Parsing HTML pages

Posted: Mon Nov 28, 2011 3:47 pm
by Muetdhiver
As an example, I've got a web page in which I can find a tag like this:

<div class="thumbinner" ...>

What I want to achieve is just to get some information that follows the div tag.
I guess I must use the parser:

wxHtmlParser* parser = new wxHtmlWinParser("mypage.html");
wxHtmlCell* top_level_object = parser->Parse();

top_level_object->Find(wxHTML_COND_ISANCHOR, ????????????????????);

What should I pass to the Find method to obtain what I want?
Thanks a lot.

Bye.

Re: Parsing HTML pages

Posted: Mon Nov 28, 2011 5:03 pm
by doublemax
I don't think this class can be used that easily (though I'm not sure).

Depending on what you need, maybe it can be done with wxRegEx or wxXmlDocument.
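
A minimal sketch of the regular-expression route, assuming the page source is already in memory as a string. It uses std::regex from the standard library as a stand-in for wxRegEx, and TextAfterThumbinner is a hypothetical helper name, not part of wxWidgets. It also assumes that class is the first attribute of the div and that the wanted information is plain text directly after the opening tag. Regular expressions are fragile on real-world HTML, so treat this as an illustration rather than a robust parser.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: finds the first <div class="thumbinner" ...> opening
// tag and copies the plain text that follows it (up to the next '<') into
// `out`. Returns false when no such tag is found.
bool TextAfterThumbinner(const std::string& html, std::string& out)
{
    // [^>]* skips any further attributes inside the opening tag;
    // ([^<]*) captures the text up to the start of the next tag.
    static const std::regex pattern(
        R"re(<div\s+class="thumbinner"[^>]*>([^<]*))re",
        std::regex::icase);

    std::smatch match;
    if (!std::regex_search(html, match, pattern))
        return false;

    out = match[1].str();
    return true;
}
```

If the wanted information is nested inside further tags, the capture group would have to be adapted, which is usually the point where a real parser becomes the better choice.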

Re: Parsing HTML pages

Posted: Wed Nov 30, 2011 10:05 am
by evstevemd
With wxWebView (in development on trunk) you can get the page's source code and, as DM said, analyze it to get what you want!

Re: Parsing HTML pages

Posted: Wed Nov 30, 2011 2:44 pm
by Auria
evstevemd wrote: With wxWebView (in development on trunk) you can get the page's source code and, as DM said, analyze it to get what you want!
wxWebView is only used for displaying the page; AFAIK it doesn't have an API to expose the DOM.

Re: Parsing HTML pages

Posted: Thu Dec 01, 2011 6:34 pm
by evstevemd
Auria wrote:
evstevemd wrote: With wxWebView (in development on trunk) you can get the page's source code and, as DM said, analyze it to get what you want!
wxWebView is only used for displaying the page; AFAIK it doesn't have an API to expose the DOM.
Maybe, but you can get the page source and analyze it:
http://docs.wxwidgets.org/2.9.2/classwx ... f26764f6d9

Re: Parsing HTML pages

Posted: Fri Dec 02, 2011 8:18 am
by Muetdhiver
Hello,

thanks.
Wow, I don't have the time to get the sources and analyse them. I would have thought a solution had already been developed in the wxWidgets API.

In fact, after posting these messages, I read more about the wxXmlDocument class, since HTML is just an implementation of the XML standards. So I think it could do the trick, but I haven't tested it yet.

Again thanks, bye.

Re: Parsing HTML pages

Posted: Fri Dec 02, 2011 6:53 pm
by Auria
Actually it depends: if you need to parse XHTML then wxXmlDocument will indeed do, but if you parse non-XHTML it won't work.
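
To illustrate the difference: an XML parser expects every element to be closed, whereas ordinary HTML routinely contains tags such as <br> with no closing counterpart. The sketch below is a crude tag-balance check in standard C++, offered only as a rough proxy for what an XML parser would reject. TagsAreBalanced is a hypothetical helper, not a wxWidgets function, and it deliberately ignores comments, CDATA sections, doctype declarations and '>' characters inside attribute values.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true when every opening tag has a matching
// closing tag in the right order, or is self-closing ("<br/>").
// Tag names are compared case-sensitively, as in XML.
bool TagsAreBalanced(const std::string& markup)
{
    std::vector<std::string> open_tags;  // stack of currently open elements
    std::size_t pos = 0;

    while ((pos = markup.find('<', pos)) != std::string::npos) {
        const std::size_t end = markup.find('>', pos);
        if (end == std::string::npos)
            return false;  // '<' without a terminating '>'

        // Everything between '<' and '>', e.g. "div class=\"x\"" or "/div".
        const std::string tag = markup.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty())
            return false;

        const bool closing = (tag[0] == '/');
        const bool self_closing = (tag[tag.size() - 1] == '/');

        // The element name runs up to the first whitespace or '/'.
        std::size_t i = closing ? 1 : 0;
        std::string name;
        while (i < tag.size() &&
               !std::isspace(static_cast<unsigned char>(tag[i])) &&
               tag[i] != '/') {
            name += tag[i++];
        }
        if (name.empty())
            return false;

        if (closing) {
            // A closing tag must match the most recently opened element.
            if (open_tags.empty() || open_tags.back() != name)
                return false;
            open_tags.pop_back();
        } else if (!self_closing) {
            open_tags.push_back(name);
        }
    }
    return open_tags.empty();
}
```

With this check, a fragment written the XHTML way (<br/>) passes, while the same fragment with a bare <br> fails, which is roughly the kind of input on which an XML parser gives up.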