Get data from a web page (solved)

If you are using the main C++ distribution of wxWidgets, feel free to ask any question related to wxWidgets development here. This means questions about C++ and wxWidgets, not compile problems.
raananb
Super wx Problem Solver
Posts: 488
Joined: Fri Oct 27, 2006 4:35 pm
Location: Paris, France

Get data from a web page

Post by raananb

A web page displays data in a browser, and I would like to retrieve that data in an application. The URL is

https://www.boursedirect.fr/fr/marche/e ... PAR/seance

and the data element is the share value (63.150 EUR at this moment).

With the WebView sample (VS 2017, Windows 10), the 'View source' menu item shows the page structure, in which the specific data can be searched for (and found).

However, I would like to do the processing in the background without any display.

I tried the following code, but the result is an empty string.

wxWebView* m_browser = wxWebView::New(pWindow, wxID_ANY, "https://www.boursedirect.fr/fr/marche/euronext-paris/bnp-paribas-FR0000131104-BNP-EUR-XPAR/seance");

m_browser->LoadURL("https://www.boursedirect.fr/fr/marche/euronext-paris/bnp-paribas-FR0000131104-BNP-EUR-XPAR/seance");

wxString page = m_browser->GetSource();
doublemax
Moderator
Posts: 19116
Joined: Fri Apr 21, 2006 8:03 pm
Location: $FCE2

Re: Get data from a web page

Post by doublemax

When using wxWebView (which might not be the best tool for the job), you have to wait for the wxEVT_WEBVIEW_LOADED event. Then the source will be available.
Use the source, Luke!
raananb

Re: Get data from a web page

Post by raananb

Indeed, that solution is not very handy.
What tool would be better?
doublemax

Re: Get data from a web page

Post by doublemax

Usually wxHTTP, but unfortunately it doesn't support SSL, so it can't fetch https URLs.

Options are:
1) libcurl
https://curl.haxx.se/libcurl/

2) One user added SSL to wxHTTP. Check if his solution works for you:
viewtopic.php?f=1&t=43802
raananb

Re: Get data from a web page

Post by raananb

wxWidgets objects are easier to use when they provide the functionality...
PB
Part Of The Furniture
Posts: 4193
Joined: Sun Jan 03, 2010 5:45 pm

Re: Get data from a web page

Post by PB

Actually, there may be more options than those listed by doublemax. The POCO library is one of them; see e.g. https://gist.github.com/t-mat/5c1b68d56179d7af7e98. It may be annoying that almost all networking libraries supporting SSL depend on a third-party SSL library (such as OpenSSL), but it's no big deal...
raananb

Re: Get data from a web page

Post by raananb

Thanks for all the references.

wxWebView does actually manage to access the data from an https URL, but it raises a synchronization issue for successive requests: a new request cannot be issued until the DocumentLoaded event for the previous request has been processed.
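One way to handle this synchronization issue is to queue the requests and start the next one only when the previous one has finished loading. Below is a minimal, hypothetical sketch in plain C++ (SerialLoader is not a wxWidgets class): the `startLoad` callback stands for whatever actually issues the request (e.g. wxWebView::LoadURL()), and OnLoaded() would be called from the wxEVT_WEBVIEW_LOADED handler.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical helper: queues URL requests and issues them one at a time.
class SerialLoader
{
public:
    explicit SerialLoader(std::function<void(const std::string&)> startLoad)
        : m_startLoad(std::move(startLoad)) {}

    // Queue a request; it starts immediately only if nothing is in flight.
    void Request(const std::string& url)
    {
        m_pending.push(url);
        if (!m_busy) StartNext();
    }

    // Signal that the current request finished; the next queued one starts.
    void OnLoaded()
    {
        m_busy = false;
        StartNext();
    }

    bool Busy() const { return m_busy; }
    std::size_t Pending() const { return m_pending.size(); }

private:
    void StartNext()
    {
        if (m_pending.empty()) return;
        m_busy = true;
        const std::string url = m_pending.front();
        m_pending.pop();
        m_startLoad(url);
    }

    std::function<void(const std::string&)> m_startLoad;
    std::queue<std::string> m_pending;
    bool m_busy = false;
};
```

Because a page with several frames can raise more than one loaded event, a real handler would probably check event.GetURL() before calling OnLoaded().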
ONEEYEMAN
Part Of The Furniture
Posts: 7459
Joined: Sat Apr 16, 2005 7:22 am
Location: USA, Ukraine

Re: Get data from a web page

Post by ONEEYEMAN

Hi,
I would be surprised if you could start the computer, open the browser, type "www.google.com" and have the page completely loaded in 0.00001 sec. ;-)
And I'm talking about a regular desktop/laptop computer, not the latest and greatest smartphone with 802.11n Wi-Fi support...

So, of course, you have to wait for the page (source) to load. Alternatively, you can try cURL or Perl WebCrawler.

Thank you.
raananb

Re: Get data from a web page

Post by raananb

I have implemented a simple two-part mechanism:

(1) A 'technical' dialog class with a load function that creates a wxWebView object and uses it to send the query (m_browser = wxWebView::New(pWindow, wxID_ANY, queryCode);), plus a handler for the 'wxEVT_WEBVIEW_LOADED' event. The query data is passed in the call to the function, and the results are returned through pointer arguments.

(2) In the application, the dialog object is created and its load function is called with the required parameters. A timer is started just after the load function is called, and the current processing ends there. The timer checks whether the expected data (or an error message) is available, and processing resumes once it is. A wxGauge shows progress, so the user knows the results will take some time. In short, whenever a query is needed, the application creates the dialog, calls the function and waits until the timer signals that it can continue.
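The timer-driven check in (2) can be reduced to a small polling step, sketched here in plain C++. QueryResult, PollOnce and ProgressPercent are hypothetical names, and the timeout is an addition not mentioned in the post. In the application, PollOnce would be called from the wxTimer (wxEVT_TIMER) handler, with ProgressPercent feeding wxGauge::SetValue() on a 0..100 gauge.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical result slot, filled in by the loader (e.g. the dialog's
// wxEVT_WEBVIEW_LOADED handler) and read by the application's timer.
struct QueryResult
{
    std::string value;  // e.g. the share price, as text
    std::string error;  // non-empty if the query failed
    bool done = false;  // set once value or error is final
};

enum class PollState { Waiting, Ready, Failed, TimedOut };

// One timer tick: decides whether processing can resume.
// `elapsedMs` accumulates the waiting time; the timeout is an assumption.
PollState PollOnce(const QueryResult& r, int& elapsedMs, int intervalMs, int timeoutMs)
{
    if (r.done)
        return r.error.empty() ? PollState::Ready : PollState::Failed;
    elapsedMs += intervalMs;
    return elapsedMs >= timeoutMs ? PollState::TimedOut : PollState::Waiting;
}

// Progress value for a 0..100 gauge while waiting.
int ProgressPercent(int elapsedMs, int timeoutMs)
{
    if (timeoutMs <= 0) return 100;
    return std::min(100, elapsedMs * 100 / timeoutMs);
}
```

Once PollOnce returns anything other than Waiting, the timer handler would Stop() the timer and resume the interrupted processing.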

The actual speed of execution depends mostly on the website and on the connection, not so much on the computer. My laptop (i7, Windows 10) is connected at 15 to 30 Mb/s, and the response time for a simple query can vary from 10 to 20 seconds. In the browser, the same URL loads in 5 to 10 seconds.

This was much faster for me than learning cURL or Perl WebCrawler.

Thanks for the remarks anyway. I may look into the alternatives sometime in the future.
eranon
Can't get richer than this
Posts: 867
Joined: Sun May 13, 2012 11:42 pm
Location: France

Re: Get data from a web page

Post by eranon

Generally, you don't need to display anything for scraping, and wxWebView is meant for displaying a page... So you lose time waiting for interpretation and rendering, and you also lose time downloading external files you don't need (e.g. images, .css, .js). The most direct way really is to go through a communication library (e.g. libcurl + OpenSSL), then an XML parser that handles HTML's peculiarities (e.g. libxml2)... With that done (assuming your connection isn't poor and the site is up and running, of course), you should be able to get the info almost instantly.
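Once the HTML is available (from wxWebView::GetSource() or a download), the value still has to be located in it. eranon recommends a real HTML parser such as libxml2. The sketch below swaps in a much cruder std::regex search after a marker string, and it only keeps working as long as the page layout doesn't change. ExtractNumberAfter is a hypothetical helper, and the actual markup of the boursedirect page is not shown in this thread, so the marker has to be found with 'View source'.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Returns the first number (digits, optionally followed by '.' or ','
// and more digits) that appears after `marker` in `html`, or an empty
// string if the marker, or a number after it, is not found.
std::string ExtractNumberAfter(const std::string& html, const std::string& marker)
{
    const std::size_t pos = html.find(marker);
    if (pos == std::string::npos)
        return std::string();

    static const std::regex number(R"(\d+(?:[.,]\d+)?)");
    const std::string rest = html.substr(pos + marker.size());
    std::smatch match;
    if (!std::regex_search(rest, match, number))
        return std::string();
    return match.str(0);
}
```

For instance, with the hypothetical markup `<span class="quote-value">63.150 EUR</span>`, calling ExtractNumberAfter(html, "class=\"quote-value\">") returns "63.150".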

Of course, if you don't really need to develop something (because it's for your own use only, or because your goal is just to get the info), you can always use a ready-made online service (e.g. https://www.import.io/).

But what's your goal? Do you want to track the stock price over time? There are a lot of strong apps for this (ProRealTime, MetaTrader, TradingView, etc.).
[Ind. dev. - wxWidgets 3.0/3.1 under "Win 7 64-bit, TDM64-GCC" + "OS X 10.9, LLVM Clang"]
raananb

Re: Get data from a web page

Post by raananb

@eranon
My application doesn't use web access intensively. It is a bank-account management application that occasionally accesses the internet to get stock-exchange quotes for the shares in the stock accounts it manages.

The application (free to download) used to get the information from an http website, but the site switched to https, and the only wxWidgets component available for immediate use was WebView. The implementation requires some synchronization gymnastics, but it works, with a response time worse than before but still acceptable. The main problem with WebView is that with GTK and OS X it requires additional external software.

I agree that WebView is overkill, since I don't display anything, and I am trying to replace it with a direct communication library. I started with POCO, but the integration is not straightforward (for me).

If anyone has a working implementation of an https exchange in a wxWidgets application using curl or POCO, I would be grateful to have a look.
eranon

Re: Get data from a web page

Post by eranon

OK, I understand better... So, if you just need a value from time to time, maybe you could look for the best source for this data rather than trying to keep up with the new requirements (https) of one fixed source. There are a lot of websites displaying stock data (some certainly reachable through http), and there are even services offering it through an API (see https://www.google.fr/search?q=stock+quote+flow+api).

I don't use POCO, but libcurl + OpenSSL and libxml2 aren't straightforward either. That's why I asked whether it (your project) was worth the effort.

Frankly, I think the first step would be to look for an HTTP website, then (if none is found) a free or cheap data-delivery service with an API, and as a last resort to invest (time, energy, enthusiasm :) in implementing it the hard way. For libcurl there are a lot of C samples on their website, in the docs and elsewhere (very well done), but as with anything there's obviously a learning curve.

You said your app is free to download. Do you have a link, if it's not indiscreet?

--
EDIT: OK, forget my last question; I found the answer on the website you listed in your profile :)
ONEEYEMAN

Re: Get data from a web page

Post by ONEEYEMAN

Hi,
You can always look at wxCurl... ;-)

Thank you.
raananb

Re: Get data from a web page

Post by raananb

I am indeed looking at curl. Will report back when appropriate.
eranon

Re: Get data from a web page

Post by eranon

ONEEYEMAN wrote: "You can always look at wxCurl... ;-)"

Does it handle the HTTPS case? In my opinion, it doesn't spare you from learning libcurl... At a minimum, you have to build the underlying libcurl with OpenSSL support.

@raananb: I downloaded your app for a quick preview. I got an exception on first launch (see the attached screenshot), and when I entered the Apple stock as an example in "Titres" and hit the button to add it, the dialog stopped responding (I had to kill the process). My test was under Windows 7 64-bit.
Attachments
snap_0005169.png (20.07 KiB)