Get data from a web page

raananb
Super wx Problem Solver
Posts: 488
Joined: Fri Oct 27, 2006 4:35 pm
Location: Paris, France

Re: Get data from a web page

Post by raananb »

curl is designed to handle HTTPS (among other protocols), but requires some time to learn and integrate.

@eranon
This is precisely the problem I am trying to resolve. The WebView version works under Windows 10, I'll post it on the website. Since none of the other users of the application has complained until now, I suppose that they did not enter data for the portfolio...
eranon
Can't get richer than this
Posts: 867
Joined: Sun May 13, 2012 11:42 pm
Location: France

Re: Get data from a web page

Post by eranon »

raananb wrote:curl is designed to handle HTTPS (among other protocols), but requires some time to learn and integrate.
I don't know whom you are replying to here. In my previous post I asked about wxCurl (a known wxWidgets-based wrapper). As for libcurl (the library flavor of curl), I know it: all my apps use it...

But why don't you look for an HTTP source instead? For a small feature that nobody is using, it would ease your job a lot.
raananb wrote:This is precisely the problem I am trying to resolve. The WebView version works under Windows 10, I'll post it on the website. Since none of the other users of the application has complained until now, I suppose that they did not enter data for the portfolio...
OK, understood, but the exception at first launch is another (unrelated) issue.
[Ind. dev. - wxWidgets 3.0/3.1 under "Win 7 64-bit, TDM64-GCC" + "OS X 10.9, LLVM Clang"]
ONEEYEMAN
Part Of The Furniture
Posts: 7459
Joined: Sat Apr 16, 2005 7:22 am
Location: USA, Ukraine

Re: Get data from a web page

Post by ONEEYEMAN »

eranon,
I know. I would prefer to use libcURL myself.
But it might be easier for someone familiar with wx and not cURL.

That was just a suggestion.

Thank you.
eranon
Can't get richer than this
Posts: 867
Joined: Sun May 13, 2012 11:42 pm
Location: France

Re: Get data from a web page

Post by eranon »

I understood, ONEEYEMAN ;) But I'm not sure (I haven't checked) that wxCurl is ready to support HTTPS. So, because of the difficulty of quickly switching to a real dedicated communication lib, I suggested choosing a simple HTTP source instead. This way, he can use wxCurl and maybe even wxHTTP.
ONEEYEMAN
Part Of The Furniture
Posts: 7459
Joined: Sat Apr 16, 2005 7:22 am
Location: USA, Ukraine

Re: Get data from a web page

Post by ONEEYEMAN »

Agreed. ;-)
raananb
Super wx Problem Solver
Posts: 488
Joined: Fri Oct 27, 2006 4:35 pm
Location: Paris, France

Re: Get data from a web page

Post by raananb »

I would love to have wxCurl, but unfortunately it is not available in wxWidgets...

The sources providing the information I am interested in are moving from HTTP to HTTPS (which is why the application crashed as reported earlier), so sooner or later the application would have to cope with HTTPS.
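One way to notice such a move before it turns into a crash is to look at the answer to the plain-HTTP request before parsing it: a redirect status whose Location header points to an https:// URL means the page has moved. The sketch below only shows that check with the standard library; the names `SchemeOf` and `MovedToHttps` are made up for illustration, and how the status code and header are obtained depends on the HTTP client in use, which is left out here.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns the lower-cased scheme of a URL ("http", "https", ...),
// or an empty string if the URL contains no "://" separator.
std::string SchemeOf(const std::string& url) {
    const std::string::size_type pos = url.find("://");
    if (pos == std::string::npos) return "";
    std::string scheme = url.substr(0, pos);
    std::transform(scheme.begin(), scheme.end(), scheme.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return scheme;
}

// True when a plain-HTTP request was answered with a redirect
// whose target (the Location header value) is an https:// URL.
bool MovedToHttps(int status, const std::string& location) {
    const bool isRedirect = status == 301 || status == 302 || status == 303 ||
                            status == 307 || status == 308;
    return isRedirect && SchemeOf(location) == "https";
}
```

When `MovedToHttps` returns true, the application can report that the source now requires HTTPS instead of trying to parse the redirect page.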

In wxWidgets, wxWebView is the only class I found that can access HTTPS websites, and the current Windows version of my application integrates this technology (which works asynchronously, though). Using wxWebView on GTK & OSX requires additional effort, and I prefer to concentrate on curl, which provides synchronous exchange on Windows, GTK and OSX.

I will post progress when there is some.
eranon
Can't get richer than this
Posts: 867
Joined: Sun May 13, 2012 11:42 pm
Location: France

Re: Get data from a web page

Post by eranon »

raananb wrote:I would love to have wxCurl, but unfortunately it is not available in wxWidgets...
wxCurl is downloadable here: https://sourceforge.net/projects/wxcode ... ts/wxCurl/ -- more info on http://wxcode.sourceforge.net/components/wxcurl/.
raananb wrote:The sources providing the information I am interested in are moving from HTTP to HTTPS (which is why the application crashed as reported earlier), so sooner or later the application would have to cope with HTTPS.
There are a lot of websites displaying stock quotes that are reachable through HTTP.
raananb
Super wx Problem Solver
Posts: 488
Joined: Fri Oct 27, 2006 4:35 pm
Location: Paris, France

Re: Get data from a web page

Post by raananb »

The version of wxCurl (1.0) available from SourceForge does not seem to support HTTPS.

Continuing with curl.
eranon
Can't get richer than this
Posts: 867
Joined: Sun May 13, 2012 11:42 pm
Location: France

Re: Get data from a web page

Post by eranon »

Yep, we all (ONEEYEMAN, you, me) agree, raananb, but I still don't understand why you stick to an HTTPS source while there are plenty of HTTP ones around... It could solve your issue in the short term and doesn't stop you from learning/experimenting with libcurl/openssl in the longer term.

--
EDIT: For example, in your first post (of this thread), you showed a link to boursedirect.fr about the "BNP Paribas" stock. You can get this stock quote on zonebourse.com through simple HTTP: http://www.zonebourse.com/BNP-PARIBAS-4618/
raananb
Super wx Problem Solver
Posts: 488
Joined: Fri Oct 27, 2006 4:35 pm
Location: Paris, France

Re: Get data from a web page

Post by raananb »

I agree. I'll do that.
eranon
Can't get richer than this
Posts: 867
Joined: Sun May 13, 2012 11:42 pm
Location: France

Re: Get data from a web page

Post by eranon »

Cool =D>

I had a break today (it's certainly not a reliable solution on the parsing part for the long run, but just to help you get out of the emergency):

Code:

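    // Needs #include <wx/url.h> and <wx/sstream.h>; runs inside a void function.
    wxURL url("http://quotes.wsj.com/FR/BNP");
    if (url.GetError() != wxURL_NOERR){
        wxMessageBox("Unable to connect!");
        return;}

    // GetInputStream() returns NULL on failure, so check the pointer first
    // and leave the function instead of reading from a deleted stream.
    wxString html;
    wxInputStream* in = url.GetInputStream();
    if (!in || !in->IsOk()){
        delete in;
        wxLogMessage("Unable to fetch!");
        return;}
    wxStringOutputStream out(&html);
    in->Read(out);
    delete in;

    /* ::SaveTextFile("c:/fetched.html", html, true, false); */

    // The price sits between the opening tag and the next closing tag.
    const wxString BEFORE_TAG = "<span id=\"quote_val\">";
    const wxString AFTER_TAG = "</span>";
    int nBefore = html.Find(BEFORE_TAG);
    if (nBefore == wxNOT_FOUND){
        wxLogMessage("Unable to parse!");
        return;}
    nBefore += BEFORE_TAG.Len();

    // Search for the closing tag only after the end of the opening tag.
    size_t nAfter = html.find(AFTER_TAG, nBefore);
    if (nAfter == wxString::npos){
        wxLogMessage("Unable to parse!");
        return;}

    wxString price = html.Mid(nBefore, nAfter - nBefore);
    wxMessageBox("Price is " + price + " EUR");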
raananb
Super wx Problem Solver
Posts: 488
Joined: Fri Oct 27, 2006 4:35 pm
Location: Paris, France

Re: Get data from a web page

Post by raananb »

Thanks for the code.

The issue is more complicated than just posting a query to a website: for companies which do not have a ticker symbol (Saint-Gobain, for example), wsj.com does not provide quotes. BNP happens to be the ticker symbol for BNP-Paribas on the NYSE.

On the other hand, using the ISIN (International Securities Identification Number), boursier.com can provide quotes on shares traded in New York (Intel, US4581401001).

The investigation continues.
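
Since the ISIN ends up in the query sent to the website, it can be worth rejecting mistyped values before fetching anything. An ISIN has 12 characters: a two-letter country code, nine alphanumeric characters and a final check digit, which is verified with the Luhn algorithm after replacing each letter by its two-digit value (A=10 ... Z=35). The following is only a sketch of that check; the function name `IsValidIsin` is made up for illustration, and the code assumes an ASCII character set and upper-case input.

```cpp
#include <cctype>
#include <string>

// Returns true if `isin` has 12 characters (digits or upper-case letters)
// and its last digit is a valid Luhn check digit.
bool IsValidIsin(const std::string& isin) {
    if (isin.size() != 12) return false;

    // Expand letters to two digits (A=10 ... Z=35); digits stay as they are.
    std::string digits;
    for (char ch : isin) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (std::isdigit(c)) {
            digits += ch;
        } else if (std::isupper(c)) {
            digits += std::to_string(ch - 'A' + 10);
        } else {
            return false;
        }
    }

    // Luhn: walking from the right, double every second digit,
    // reduce results above 9 by 9, and add everything up.
    int sum = 0;
    bool doubleIt = false;
    for (std::string::size_type i = digits.size(); i-- > 0;) {
        int d = digits[i] - '0';
        if (doubleIt) {
            d *= 2;
            if (d > 9) d -= 9;
        }
        sum += d;
        doubleIt = !doubleIt;
    }
    return sum % 10 == 0;
}
```

With the Intel ISIN quoted above, `IsValidIsin("US4581401001")` returns true, while changing the last digit makes the check fail.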
eranon
Can't get richer than this
Posts: 867
Joined: Sun May 13, 2012 11:42 pm
Location: France

Re: Get data from a web page

Post by eranon »

It was just an example to show that you're not forced to overthink the subject in the first place. There are tons of websites displaying stocks; you just have to find one that lists the stocks you target, over HTTP, and with a page structure that is easy to parse. I simply chose this site among many others because it was the very first one I found with HTTP access and a simple, identifiable tag to isolate the price.
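
The "identifiable tag" idea can be kept in one small helper so that switching to another site only means changing the two marker strings. This is a minimal sketch using only the standard library (so it can be tried without wxWidgets); the name `ExtractBetween` is hypothetical, and the HTML and price in the usage note are made-up sample data, not output from any real site.

```cpp
#include <string>

// Stores in `value` the text between the first occurrence of `before`
// and the next occurrence of `after`. Returns false (and leaves `value`
// untouched) if either marker is missing.
bool ExtractBetween(const std::string& html,
                    const std::string& before,
                    const std::string& after,
                    std::string& value) {
    const std::string::size_type start = html.find(before);
    if (start == std::string::npos) return false;

    // The value begins right after the opening marker.
    const std::string::size_type valueBegin = start + before.size();
    const std::string::size_type valueEnd = html.find(after, valueBegin);
    if (valueEnd == std::string::npos) return false;

    value = html.substr(valueBegin, valueEnd - valueBegin);
    return true;
}
```

For example, calling it on `<p><span id="quote_val">52.31</span></p>` with the markers `<span id="quote_val">` and `</span>` stores `52.31` in `value`. As with any string-based scraping, it breaks as soon as the site changes its markup.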
PB
Part Of The Furniture
Posts: 4193
Joined: Sun Jan 03, 2010 5:45 pm

Re: Get data from a web page

Post by PB »

I am not using POCO but I had it ready as I wanted to look at retrieving content from pages using https://

I have no knowledge of POCO; I just blindly adapted the download_ssl sample from POCO to wxWidgets (Windows, MSVC 2015, wxWidgets 3.1). FWIW, I was able to get it running, although I guess that for production code it would need (much) more work. For example, I used AcceptCertificateHandler as the certificate handler and replaced "rootcert.pem" with an empty string when creating ptrContext.

Code:

#include <Poco/URIStreamOpener.h>
#include <Poco/StreamCopier.h>
#include <Poco/Path.h>
#include <Poco/URI.h>
#include <Poco/SharedPtr.h>
#include <Poco/Exception.h>
#include <Poco/Net/HTTPStreamFactory.h>
#include <Poco/Net/HTTPSStreamFactory.h>
#include <Poco/Net/FTPStreamFactory.h>
#include <Poco/Net/SSLManager.h>
#include <Poco/Net/KeyConsoleHandler.h>
#include <Poco/Net/AcceptCertificateHandler.h>

#include <memory>
#include <iostream>
#include <sstream> 

#include <wx/wx.h>

using Poco::URIStreamOpener;
using Poco::StreamCopier;
using Poco::Path;
using Poco::URI;
using Poco::SharedPtr;
using Poco::Exception;
using Poco::Net::HTTPStreamFactory;
using Poco::Net::HTTPSStreamFactory;
using Poco::Net::FTPStreamFactory;
using Poco::Net::SSLManager;
using Poco::Net::Context;
using Poco::Net::KeyConsoleHandler;
using Poco::Net::PrivateKeyPassphraseHandler;
using Poco::Net::InvalidCertificateHandler;
using Poco::Net::AcceptCertificateHandler;


class SSLInitializer
{
public:
    SSLInitializer()
    {
        Poco::Net::initializeSSL();
    }

    ~SSLInitializer()
    {
        Poco::Net::uninitializeSSL();
    }
};


class MyFrame : public wxFrame
{
public:
    MyFrame() : wxFrame(NULL, wxID_ANY, _("Test"))
    {
        wxMenu *demoMenu = new wxMenu;
        demoMenu->Append(wxID_OPEN, _("&Download...\tCtrl+D"));
        demoMenu->Append(wxID_EXIT, _("E&xit"));

        wxMenuBar *menuBar = new wxMenuBar();
        menuBar->Append(demoMenu, _("&Demo"));
        SetMenuBar(menuBar);

        m_text= new wxTextCtrl(this, wxID_ANY, wxEmptyString, 
                                    wxDefaultPosition, wxDefaultSize, 
                                    wxTE_MULTILINE | wxTE_READONLY | wxTE_RICH2);        

        Bind(wxEVT_COMMAND_MENU_SELECTED, &MyFrame::OnDownload, this, wxID_OPEN);
        Bind(wxEVT_COMMAND_MENU_SELECTED, [=](wxCommandEvent&) { Close(true);}, wxID_EXIT);

        HTTPStreamFactory::registerFactory();
        HTTPSStreamFactory::registerFactory();
        FTPStreamFactory::registerFactory();        
    }	    
private:    
    SSLInitializer m_sslInitializer;
    wxTextCtrl* m_text;
    
    void OnDownload(wxCommandEvent&)
    {
        static wxString strURI = "https://example.com";

        try
        {
            // Note: we must create the passphrase handler prior Context 
            // AcceptCertificateHandler  is for testing only
            SharedPtr<AcceptCertificateHandler> ptrCert = new AcceptCertificateHandler (false); 
            Context::Ptr ptrContext = new Context(Context::CLIENT_USE, "", "", "", Context::VERIFY_RELAXED, 9, false, "ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH");
            SSLManager::instance().initializeClient(0, ptrCert, ptrContext);

            strURI = wxGetTextFromUser(_("Enter URL (including protocol)"), _("URL"), strURI);
            if ( strURI.empty() )
                return;

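            // Convert explicitly instead of relying on wxString's implicit conversions
            URI uri(strURI.ToStdString());
            std::unique_ptr<std::istream> pStr(URIStreamOpener::defaultOpener().open(uri));
            std::stringstream ss;

            // Copy the whole response body into memory, then show it as UTF-8 text
            StreamCopier::copyStream(*pStr, ss);
            m_text->SetValue(wxString::FromUTF8(ss.str()));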
        }
        catch (Exception& exc)
        {
            wxLogError(wxString(exc.displayText()));        
        }
    }
};

class MyApp : public wxApp
{
public:
    virtual bool OnInit()
    {     
        (new MyFrame())->Show();               
        return true;
    }
}; wxIMPLEMENT_APP(MyApp);
raananb
Super wx Problem Solver
Posts: 488
Joined: Fri Oct 27, 2006 4:35 pm
Location: Paris, France

Re: Get data from a web page

Post by raananb »

After some trials with curl, I finally settled on a wxWebView-based solution, since that was the simplest way to extract data from a single page built by WebView (and identical to the page actually displayed with a standard browser).

I created a class GetShareQuote which includes the function CheckShareValue(wxWebView* browser, wxString share, wxString* quote).

The application creates the GetShareQuote object, then the wxWebView browser, and calls the function CheckShareValue. It then launches a timer, which checks whether quote is still empty, and a wxGauge to display progress.

Code:

wxString quote;
GetShareQuote* GSQ = new GetShareQuote(this);
wxWebView* browser = wxWebView::New(this, wxID_ANY, wxEmptyString);
GSQ->CheckShareValue(browser, share, &quote);

CheckShareValue builds the URL from the share and loads it in the browser. The browser is connected (with Connect) to a wxWebViewEventHandler wrapping the function OnDocumentLoaded.

Code:

// store function arguments in member variables
m_browser = browser;
m_quote = quote;
// connect the handler before loading, so the loaded event cannot be missed
m_browser->Connect(m_browser->GetId(), wxEVT_WEBVIEW_LOADED, wxWebViewEventHandler(GetShareQuote::OnDocumentLoaded), NULL, this);
m_browser->Hide();
m_browser->LoadURL(url);
When a document is loaded, the value of the share is put into quote in the event handler.

The timer in the application is stopped when quote has a value, otherwise it makes the wxGauge progress a notch. A limit on the number of timer cycles is used for the case of Internet interruption or website not responding.

As the process is not time-critical, the response times are perfectly acceptable (maximum wait is 50 cycles of 300ms, but usually the values are obtained in a few seconds).
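
The timer logic described above (stop when quote has a value, otherwise advance the gauge, give up after a fixed number of cycles) can be sketched independently of wxWidgets. This is not the code of the application: the class `QuotePoller` and the enum `PollResult` are hypothetical names, and in a real application `Tick()` would be called from the timer's event handler, with the returned value deciding whether to stop the timer or move the gauge.

```cpp
#include <string>

// Outcome of one timer tick.
enum class PollResult { Waiting, Done, TimedOut };

// Framework-independent polling logic with a limit on the number of cycles.
class QuotePoller {
public:
    explicit QuotePoller(int maxCycles) : m_maxCycles(maxCycles), m_cycles(0) {}

    // `quote` is the string filled in by the page-loaded handler;
    // an empty string means the value has not arrived yet.
    PollResult Tick(const std::string& quote) {
        if (!quote.empty()) return PollResult::Done;                 // stop the timer
        if (++m_cycles >= m_maxCycles) return PollResult::TimedOut;  // give up
        return PollResult::Waiting;                                  // advance the gauge
    }

    // Number of ticks that found the quote still empty.
    int Cycles() const { return m_cycles; }

private:
    int m_maxCycles;
    int m_cycles;
};
```

With the figures given above, a `QuotePoller(50)` driven by a 300 ms timer gives up after at most 50 x 300 ms = 15 seconds.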

The solution works perfectly on Windows, OSX and GTK (Ubuntu).

Thanks to all who chipped in.