Get data from a web page Topic is solved
-
- Super wx Problem Solver
- Posts: 488
- Joined: Fri Oct 27, 2006 4:35 pm
- Location: Paris, France
- Contact:
Re: Get data from a web page
curl is designed to handle HTTPS (among other protocols), but requires some time to learn and integrate.
@eranon
This is precisely the problem I am trying to resolve. The WebView version works under Windows 10; I'll post it on the website. Since none of the other users of the application have complained so far, I suppose that they did not enter data for the portfolio...
- eranon
- Can't get richer than this
- Posts: 867
- Joined: Sun May 13, 2012 11:42 pm
- Location: France
- Contact:
Re: Get data from a web page
raananb wrote: curl is designed to handle HTTPS (among other protocols), but requires some time to learn and integrate.
I don't know whom you are replying to here. In my previous post I asked about wxCurl (a known wxWidgets-based wrapper). As for libcurl (the library flavor of curl), I know it: all my apps use it...
But why don't you look for an HTTP source instead? For a small feature that nobody is using, it would ease your job a lot.
raananb wrote: This is precisely the problem I am trying to resolve. The WebView version works under Windows 10, I'll post it on the website. Since none of the other users of the application complained until now, I suppose that they did not enter data for the portfolio...
OK, understood, but the exception at first launch is another (unrelated) issue.
[Ind. dev. - wxWidgets 3.0/3.1 under "Win 7 64-bit, TDM64-GCC" + "OS X 10.9, LLVM Clang"]
Re: Get data from a web page
eranon,
I know. I would prefer to use libcURL myself.
But it might be easier for someone familiar with wx and not cURL.
That was just a suggestion.
Thank you.
- eranon
- Can't get richer than this
- Posts: 867
- Joined: Sun May 13, 2012 11:42 pm
- Location: France
- Contact:
Re: Get data from a web page
Understood, ONEEYEMAN. But I'm not sure (I haven't checked) that wxCurl is ready to support HTTPS. So, given the difficulty of quickly switching to a real dedicated communication library, I suggested choosing a simple HTTP source instead. This way, he can use wxCurl and maybe even wxHTTP.
[Ind. dev. - wxWidgets 3.0/3.1 under "Win 7 64-bit, TDM64-GCC" + "OS X 10.9, LLVM Clang"]
Re: Get data from a web page
Agreed.
-
- Super wx Problem Solver
- Posts: 488
- Joined: Fri Oct 27, 2006 4:35 pm
- Location: Paris, France
- Contact:
Re: Get data from a web page
I would love to have wxCurl, but unfortunately it is not available in wxWidgets...
The sources providing the information I am interested in are moving from HTTP to HTTPS (which is why the application crashed as reported earlier), so sooner or later the application would have to cope with HTTPS.
In wxWidgets, wxWebView is the only class I found that can access HTTPS websites, and the current Windows version of my application integrates this technology (which works asynchronously, though). Using wxWebView on GTK and OSX requires additional effort, and I prefer to concentrate on curl, which provides synchronous exchange on Windows, GTK and OSX.
I will post progress when there is one.
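As a side note on the HTTP-to-HTTPS move described in this post: an application could check the URL scheme up front and route HTTPS sources to a TLS-capable client (libcurl, POCO or wxWebView, as discussed in this thread) instead of letting a plain HTTP fetch fail. The sketch below uses only the C++ standard library; `urlScheme` and `needsTlsClient` are hypothetical helper names, not part of wxWidgets or curl.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not from the thread): returns the lower-cased URL
// scheme, or an empty string when the URL has no "://" separator.
std::string urlScheme(const std::string& url)
{
    const std::string::size_type pos = url.find("://");
    if (pos == std::string::npos || pos == 0)
        return std::string();
    std::string scheme = url.substr(0, pos);
    std::transform(scheme.begin(), scheme.end(), scheme.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return scheme;
}

// Hypothetical helper: true when the URL has to go through a TLS-capable
// client rather than a plain HTTP fetch.
bool needsTlsClient(const std::string& url)
{
    return urlScheme(url) == "https";
}
```

With this, `needsTlsClient("http://www.zonebourse.com/BNP-PARIBAS-4618/")` is false, so a plain HTTP path would be enough for that source, while any `https://` source is flagged before the fetch is attempted.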
- eranon
- Can't get richer than this
- Posts: 867
- Joined: Sun May 13, 2012 11:42 pm
- Location: France
- Contact:
Re: Get data from a web page
raananb wrote: I would love to have wxCurl, but unfortunately it is not available in wxWidgets...
wxCurl is downloadable here: https://sourceforge.net/projects/wxcode ... ts/wxCurl/ -- more info at http://wxcode.sourceforge.net/components/wxcurl/.
raananb wrote: The sources providing the information I am interested in are moving from HTTP to HTTPS (which is why the application crashed as reported earlier), so sooner or later the application would have to cope with HTTPS.
There are a lot of websites displaying stock quotes that are reachable through HTTP.
[Ind. dev. - wxWidgets 3.0/3.1 under "Win 7 64-bit, TDM64-GCC" + "OS X 10.9, LLVM Clang"]
-
- Super wx Problem Solver
- Posts: 488
- Joined: Fri Oct 27, 2006 4:35 pm
- Location: Paris, France
- Contact:
Re: Get data from a web page
The version of wxCurl (1.0) available from sourceforge does not seem to support https.
Continuing with curl.
- eranon
- Can't get richer than this
- Posts: 867
- Joined: Sun May 13, 2012 11:42 pm
- Location: France
- Contact:
Re: Get data from a web page
Yep, we all (ONEEYEMAN, you, me) agree, raananb, but I still don't understand why you stick to an HTTPS source when there are plenty of HTTP ones around... It could solve your issue in the short term, and it doesn't stop you from learning and experimenting with libcurl/openssl in the longer term.
--
EDIT: For example, in your first post (of this thread), you showed a link to boursedirect.fr for the "BNP Paribas" stock. You can get this stock quote from zonebourse.com through simple HTTP: http://www.zonebourse.com/BNP-PARIBAS-4618/
[Ind. dev. - wxWidgets 3.0/3.1 under "Win 7 64-bit, TDM64-GCC" + "OS X 10.9, LLVM Clang"]
-
- Super wx Problem Solver
- Posts: 488
- Joined: Fri Oct 27, 2006 4:35 pm
- Location: Paris, France
- Contact:
Re: Get data from a web page
I agree. I'll do that.
- eranon
- Can't get richer than this
- Posts: 867
- Joined: Sun May 13, 2012 11:42 pm
- Location: France
- Contact:
Re: Get data from a web page
Cool
I had a break today (it's certainly not a reliable solution on the parsing part for the long run, but just to help you get out of the emergency):
Code: Select all
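// Needs <wx/url.h> (wxURL) and <wx/sstream.h> (wxStringOutputStream)
wxURL url("http://quotes.wsj.com/FR/BNP");
if (url.GetError() != wxURL_NOERR) {
    wxMessageBox("Unable to connect!");
    return;
}
wxString html;
// GetInputStream() returns NULL on failure, so check the pointer first
wxInputStream* in = url.GetInputStream();
if (!in || !in->IsOk()) {
    delete in;
    wxLogMessage("Unable to fetch!");
    return; // do not fall through and read from the deleted stream
}
wxStringOutputStream out(&html);
in->Read(out);
delete in;
/* ::SaveTextFile("c:/fetched.html", html, true, false); */
const wxString BEFORE_TAG = "<span id=\"quote_val\">";
const wxString AFTER_TAG = "</span>";
// Locate the opening tag, then search for the closing tag only after it
int nBefore = html.Find(BEFORE_TAG);
if (nBefore == wxNOT_FOUND) {
    wxLogMessage("Unable to parse!");
    return;
}
nBefore += BEFORE_TAG.Len();
size_t nAfter = html.find(AFTER_TAG, nBefore);
if (nAfter == wxString::npos) {
    wxLogMessage("Unable to parse!");
    return;
}
// The price is the text between the two tags
wxString price = html.Mid(nBefore, nAfter - nBefore);
wxMessageBox("Price is " + price + " EUR");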
[Ind. dev. - wxWidgets 3.0/3.1 under "Win 7 64-bit, TDM64-GCC" + "OS X 10.9, LLVM Clang"]
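For anyone who wants to try the parsing part of the snippet above without a network connection, the tag-based extraction can be isolated in a small function that works on plain `std::string`. This is only a sketch: `extractBetween` is a hypothetical name, and the HTML fragment and price in the usage note are made-up sample data, not an actual response from wsj.com.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: copies the text found between the first occurrence
// of `before` and the next occurrence of `after` into `value`.
// Returns false (and leaves `value` untouched) if either marker is missing.
bool extractBetween(const std::string& html,
                    const std::string& before,
                    const std::string& after,
                    std::string& value)
{
    const std::string::size_type start = html.find(before);
    if (start == std::string::npos)
        return false;

    // The value begins right after the opening marker
    const std::string::size_type valueBegin = start + before.size();
    const std::string::size_type valueEnd = html.find(after, valueBegin);
    if (valueEnd == std::string::npos)
        return false;

    value = html.substr(valueBegin, valueEnd - valueBegin);
    return true;
}
```

Called with the made-up fragment `<p><span id="quote_val">57.31</span></p>` and the same two markers as in the post, it fills `value` with `57.31`. As said above, this kind of string search breaks as soon as the site changes its markup, so it is a stopgap rather than a long-term parser.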
-
- Super wx Problem Solver
- Posts: 488
- Joined: Fri Oct 27, 2006 4:35 pm
- Location: Paris, France
- Contact:
Re: Get data from a web page
Thanks for the code.
The issue is more complicated than just posting a query to a website: for companies which do not have a ticker symbol (Saint-Gobain, for example), wsj.com does not provide quotes. BNP happens to be the ticker symbol for BNP-Paribas on the NYSE.
On the other hand, using the ISIN (International Securities Identification Number), boursier.com can provide quotes on shares traded in New York (Intel, US4581401001).
The investigation continues.
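Since the ISIN would become the lookup key, it may be worth validating it before building a query URL. The sketch below assumes the standard ISIN layout (two letters, nine alphanumeric characters, one check digit) and the Luhn check applied after letters are expanded to 10..35; `isValidIsin` is a hypothetical helper name, not something from the application.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: checks the length, character set and check digit
// of an ISIN such as "US4581401001" (upper-case letters only).
bool isValidIsin(const std::string& isin)
{
    if (isin.size() != 12)
        return false;

    // Expand to a digit string: digits stay as-is, 'A'..'Z' become 10..35
    std::string digits;
    for (std::string::size_type i = 0; i < isin.size(); ++i) {
        const char ch = isin[i];
        const bool isDigit = (ch >= '0' && ch <= '9');
        const bool isUpper = (ch >= 'A' && ch <= 'Z');
        if (i < 2 && !isUpper)
            return false; // country code must be two letters
        if (i == 11 && !isDigit)
            return false; // check digit must be numeric
        if (isDigit)
            digits += ch;
        else if (isUpper)
            digits += std::to_string(ch - 'A' + 10);
        else
            return false;
    }

    // Luhn: starting from the rightmost digit, double every second digit
    int sum = 0;
    bool doubleIt = false;
    for (std::string::const_reverse_iterator it = digits.rbegin(); it != digits.rend(); ++it) {
        int d = *it - '0';
        if (doubleIt) {
            d *= 2;
            if (d > 9)
                d -= 9;
        }
        sum += d;
        doubleIt = !doubleIt;
    }
    return sum % 10 == 0;
}
```

The Intel ISIN quoted above, US4581401001, passes this check, while the same string with the last digit changed does not.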
- eranon
- Can't get richer than this
- Posts: 867
- Joined: Sun May 13, 2012 11:42 pm
- Location: France
- Contact:
Re: Get data from a web page
It was just an example to show that you're not forced to overthink the subject in the first place. There are tons of websites displaying stocks, and you just have to find the right one: listing the stocks you target, reachable over HTTP, and with a page structure that is easy to parse. I simply chose this site among many others because it was the very first one I found with HTTP access and a simple, identifiable tag to isolate the price.
[Ind. dev. - wxWidgets 3.0/3.1 under "Win 7 64-bit, TDM64-GCC" + "OS X 10.9, LLVM Clang"]
Re: Get data from a web page
I am not using POCO, but I had it ready as I wanted to look at retrieving content from pages using https://
I have no knowledge of POCO; I just blindly adapted the download_ssl sample from POCO to wxWidgets (Windows, MSVC 2015, wxWidgets 3.1). FWIW, I was able to get it running, although I guess production code would need (much) more work. For example, I used AcceptCertificateHandler as the certificate handler and replaced "rootcert.pem" with an empty string when creating ptrContext.
Code: Select all
#include <Poco/URIStreamOpener.h>
#include <Poco/StreamCopier.h>
#include <Poco/Path.h>
#include <Poco/URI.h>
#include <Poco/SharedPtr.h>
#include <Poco/Exception.h>
#include <Poco/Net/HTTPStreamFactory.h>
#include <Poco/Net/HTTPSStreamFactory.h>
#include <Poco/Net/FTPStreamFactory.h>
#include <Poco/Net/SSLManager.h>
#include <Poco/Net/KeyConsoleHandler.h>
#include <Poco/Net/AcceptCertificateHandler.h>
#include <memory>
#include <iostream>
#include <sstream>
#include <wx/wx.h>
using Poco::URIStreamOpener;
using Poco::StreamCopier;
using Poco::Path;
using Poco::URI;
using Poco::SharedPtr;
using Poco::Exception;
using Poco::Net::HTTPStreamFactory;
using Poco::Net::HTTPSStreamFactory;
using Poco::Net::FTPStreamFactory;
using Poco::Net::SSLManager;
using Poco::Net::Context;
using Poco::Net::KeyConsoleHandler;
using Poco::Net::PrivateKeyPassphraseHandler;
using Poco::Net::InvalidCertificateHandler;
using Poco::Net::AcceptCertificateHandler;
class SSLInitializer
{
public:
    SSLInitializer()
    {
        Poco::Net::initializeSSL();
    }
    ~SSLInitializer()
    {
        Poco::Net::uninitializeSSL();
    }
};

class MyFrame : public wxFrame
{
public:
    MyFrame() : wxFrame(NULL, wxID_ANY, _("Test"))
    {
        wxMenu *demoMenu = new wxMenu;
        demoMenu->Append(wxID_OPEN, _("&Download...\tCtrl+D"));
        demoMenu->Append(wxID_EXIT, _("E&xit"));
        wxMenuBar *menuBar = new wxMenuBar();
        menuBar->Append(demoMenu, _("&Demo"));
        SetMenuBar(menuBar);
        m_text = new wxTextCtrl(this, wxID_ANY, wxEmptyString,
                                wxDefaultPosition, wxDefaultSize,
                                wxTE_MULTILINE | wxTE_READONLY | wxTE_RICH2);
        Bind(wxEVT_COMMAND_MENU_SELECTED, &MyFrame::OnDownload, this, wxID_OPEN);
        Bind(wxEVT_COMMAND_MENU_SELECTED, [=](wxCommandEvent&) { Close(true); }, wxID_EXIT);
        HTTPStreamFactory::registerFactory();
        HTTPSStreamFactory::registerFactory();
        FTPStreamFactory::registerFactory();
    }
private:
    SSLInitializer m_sslInitializer;
    wxTextCtrl* m_text;

    void OnDownload(wxCommandEvent&)
    {
        static wxString strURI = "https://example.com";
        try
        {
            // Note: we must create the passphrase handler prior to the Context
            // AcceptCertificateHandler is for testing only
            SharedPtr<AcceptCertificateHandler> ptrCert = new AcceptCertificateHandler(false);
            Context::Ptr ptrContext = new Context(Context::CLIENT_USE, "", "", "", Context::VERIFY_RELAXED, 9, false, "ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH");
            SSLManager::instance().initializeClient(0, ptrCert, ptrContext);
            strURI = wxGetTextFromUser(_("Enter URL (including protocol)"), _("URL"), strURI);
            if ( strURI.empty() )
                return;
            URI uri(strURI);
            std::unique_ptr<std::istream> pStr(URIStreamOpener::defaultOpener().open(uri));
            std::stringstream ss;
            StreamCopier::copyStream(*pStr.get(), ss);
            m_text->SetValue(wxString::FromUTF8(ss.str()));
        }
        catch (Exception& exc)
        {
            wxLogError(wxString(exc.displayText()));
        }
    }
};

class MyApp : public wxApp
{
public:
    virtual bool OnInit()
    {
        (new MyFrame())->Show();
        return true;
    }
};
wxIMPLEMENT_APP(MyApp);
-
- Super wx Problem Solver
- Posts: 488
- Joined: Fri Oct 27, 2006 4:35 pm
- Location: Paris, France
- Contact:
Re: Get data from a web page
After some trials with curl, I finally settled on a WebView-based solution, since that was the simplest way to extract data from a single page built by WebView (and identical to the page actually displayed in a standard browser).
I created a class GetShareQuote, which includes the function CheckShareValue(wxWebView* browser, wxString sharel, wxString* quote).
The application creates the GetShareQuote object, then the wxWebView browser, and calls the function CheckShareValue. It then launches a timer that checks whether quote is still empty, and a wxGauge to display progress.
Code: Select all
wxString quote;
GetShareQuote* GSQ = new GetShareQuote(this);
wxWebView* browser = wxWebView::New(this, wxID_ANY, wxEmptyString);
GSQ->CheckShareQuote(browser, share, &quote);
CheckShareValue creates the URL with the share and loads it in the browser. The browser is connected to a wxWebViewEventHandler with the function OnDocumentLoaded.
Code: Select all
// store function arguments in local variables
m_browser = browser;
m_quote = quote;
m_browser->LoadURL(url);
m_browser->Hide();
m_browser->Connect(m_browser->GetId(), wxEVT_WEBVIEW_LOADED, wxWebViewEventHandler(GetShareQuote::OnDocumentLoaded), NULL, this);
When a document is loaded, the event handler puts the value of the share into quote.
The timer in the application is stopped when quote has a value; otherwise it makes the wxGauge progress a notch. A limit on the number of timer cycles covers the case of an Internet interruption or a website that is not responding.
As the process is not time-critical, the response times are perfectly acceptable (the maximum wait is 50 cycles of 300 ms, i.e. 15 seconds, but usually the values are obtained in a few seconds).
The solution works perfectly on Windows, OSX and GTK (Ubuntu).
Thanks to all who chipped in.
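For readers who want to reuse the timer logic described in this last post, here is a minimal sketch of the decision taken on each timer tick, written without any wxWidgets dependency so it can be tested on its own. The names `QuotePoller` and `PollResult` are hypothetical; in the real application this check would presumably sit in the wxTimer event handler, which also advances the wxGauge.

```cpp
#include <cassert>
#include <string>

// Outcome of one timer tick
enum class PollResult { Waiting, Done, TimedOut };

// Hypothetical helper: counts timer cycles and decides when to stop polling.
class QuotePoller
{
public:
    explicit QuotePoller(int maxCycles) : m_maxCycles(maxCycles), m_cycles(0) {}

    // Call once per timer tick with the current content of the quote string.
    PollResult tick(const std::string& quote)
    {
        if (!quote.empty())
            return PollResult::Done;     // the load handler filled the quote
        if (++m_cycles >= m_maxCycles)
            return PollResult::TimedOut; // no connection or site not responding
        return PollResult::Waiting;      // advance the gauge and keep waiting
    }

    int cycles() const { return m_cycles; }

private:
    int m_maxCycles;
    int m_cycles;
};
```

With `QuotePoller poller(50)` and a 300 ms timer, the caller would stop the timer on `Done` or `TimedOut`, which caps the wait at 50 x 300 ms = 15 seconds, matching the limit mentioned in the post.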