Error: trying to encode undefined Unicode character (Topic is solved)

iwbnwif
Super wx Problem Solver
Posts: 282
Joined: Tue Mar 19, 2013 8:52 pm

Error: trying to encode undefined Unicode character

Post by iwbnwif »

The subject line is the message of an assert that fires in the wxLogMessage call in the following code:

Code: Select all

    wxCurlHTTP get ("http://www.wxwidgets.org/");

    wxStringOutputStream stream;

    if (get.Get (stream) && stream.IsOk())
    {
        wxLogMessage (stream.GetString());
    }
Please leave aside the discussion about using wxCurl vs libcurl directly :wink: .

I don't think this is particularly a problem with the contents of the stream, because the following seems to work perfectly:

Code: Select all

    wxCurlHTTP get ("http://www.wxwidgets.org/");

    wxStringOutputStream stream;

    if (get.Get (stream) && stream.IsOk())
    {
        std::cout << stream.GetString() << std::endl;
    }
I guess that the << operator causes an implicit conversion from wxString to something else, which incidentally strips the invalid characters. Alternatively, maybe std::cout silently ignores invalid characters, whereas wxLogMessage asserts.

I am really struggling to debug this because of the way wxLogMessage is written.

Is there a way to 'clean' the wxString to be safe for wxLogMessage?

Edit: I should mention that this is using the latest GitHub version of wxWidgets on Ubuntu 16.04.
Last edited by iwbnwif on Thu Aug 31, 2017 6:55 am, edited 1 time in total.
wxWidgets 3.1.2, MinGW64 8.1.0, g++ 8.1.0, Ubuntu 19.04, Windows 10, CodeLite + wxCrafter
Some people, when confronted with a GUI problem, think "I know, I'll use Eclipse RCP". Now they have two problems.
doublemax
Moderator
Posts: 19116
Joined: Fri Apr 21, 2006 8:03 pm
Location: $FCE2

Re: Error: trying to encode undefined Unicode character

Post by doublemax »

It would have been interesting to see the content of a "bad" string.

The wxLogXXX functions work like printf: the first argument is a format string, so you can directly write something like:

Code: Select all

wxLogMessage( "error=%d", 10 );
This leads to a problem when the string you pass as the first argument contains any % characters, because they are interpreted as format specifiers.

One workaround would be:

Code: Select all

wxLogMessage("%s", stream.GetString() );
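The same hazard exists with any printf-style API, so the idea behind the "%s" workaround can be sketched without wxWidgets. The following is only an illustration using the standard library; `log_text` is a hypothetical helper (not part of wxWidgets) that passes the incoming text as an argument to a fixed "%s" format, so any % inside it is treated as data rather than as a format specifier:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats arbitrary text through a fixed "%s" format,
// so any '%' inside `text` is copied verbatim instead of being parsed.
std::string log_text(const std::string& text)
{
    // First call only measures the required length (terminator excluded).
    const int needed = std::snprintf(nullptr, 0, "%s", text.c_str());
    if (needed < 0)
        return std::string();

    // Second call writes into a buffer with room for the terminator.
    std::string out(static_cast<std::size_t>(needed) + 1, '\0');
    std::snprintf(&out[0], out.size(), "%s", text.c_str());
    out.resize(static_cast<std::size_t>(needed)); // drop the terminator
    return out;
}
```

With wxWidgets itself, the equivalent is simply the `wxLogMessage("%s", stream.GetString());` call shown above.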
Use the source, Luke!
iwbnwif

Re: Error: trying to encode undefined Unicode character

Post by iwbnwif »

Good call, DM!

Code: Select all

        wxString doc = stream.GetString();
        wxLogMessage ("%s", doc);
Works as expected, whilst ...

Code: Select all

        wxString doc = stream.GetString();
        wxLogMessage (doc);
... doesn't (asserts as described in OP).
doublemax wrote: It would have been interesting to see the content of a "bad" string.
With either of the working versions (std::cout or parameterised wxLogMessage), the output is exactly the text you get by right-clicking the wxWidgets home page and selecting "View Page Source...".

I really wanted to debug which character was causing the problem but there doesn't seem to be a simple way to do this. I am not sure if it is because wxLogMessage uses macros or there are no debug symbols (I am pretty sure I built the debug version, but could try again).

There appear to be 16563 characters in the current version of the wxWidgets home page, but wxString::size() reports 16547.

Anyway, I tried the following:

Code: Select all

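for (size_t i = 7000; i < 8000; i++)
{
    // Last 20 characters of the prefix that is about to be logged.
    wxString slice = doc.Mid(i - 20, 20);
    // Deliberately passes the prefix as the format string to provoke the assert.
    wxLogMessage (doc.Left(i));
}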
Which asserts (repeatedly) when i == 7397, and the sliced string looks like this:
") but remains 100% c"
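Bisecting with repeated log calls works, but since the suspicion at this point in the thread is the % character, scanning for it directly is another option. A minimal sketch on std::string, assuming the page source has already been copied into one; `find_percent_positions` and `context_around` are hypothetical helper names introduced only for this illustration:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of every '%' in `text`.
std::vector<std::size_t> find_percent_positions(const std::string& text)
{
    std::vector<std::size_t> positions;
    for (std::size_t pos = text.find('%'); pos != std::string::npos;
         pos = text.find('%', pos + 1))
    {
        positions.push_back(pos);
    }
    return positions;
}

// Hypothetical helper: returns the character at `pos` together with up to
// `radius` characters on each side, or an empty string if `pos` is invalid.
std::string context_around(const std::string& text, std::size_t pos,
                           std::size_t radius)
{
    if (pos >= text.size())
        return std::string();
    const std::size_t start = pos > radius ? pos - radius : 0;
    // substr clamps the count at the end of the string.
    return text.substr(start, (pos - start) + radius + 1);
}
```

Printing `context_around(text, pos, 10)` for each reported position would show every place where a format specifier could be misread, without triggering the assert.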
Another thing: if I initialise a string by pasting in the complete page source:

Code: Select all

wxString page_source = "<!DOCTYPE html><html><head><meta charset=\"utf-8\"><title>wxWidgets: Cross-Platform GUI ... " 
(obviously truncated in the above sample) then it also works using both versions of wxLogMessage in the snippets above. So the cut / paste / escape quotes process also 'cleans' the string somehow.
doublemax

Re: Error: trying to encode undefined Unicode character

Post by doublemax »

Escaping the % chars should also work.

Code: Select all

doc.Replace("%", "%%");
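For readers who want to see what the escaping does character by character, here is a minimal standard-library sketch. `escape_percent` is a hypothetical helper written for this illustration; it returns a new string, whereas `wxString::Replace` modifies `doc` in place:

```cpp
#include <string>

// Hypothetical helper: doubles every '%' so that the result can be used as
// a printf-style format string that takes no further arguments.
std::string escape_percent(const std::string& text)
{
    std::string escaped;
    escaped.reserve(text.size());
    for (char ch : text)
    {
        escaped += ch;
        if (ch == '%')
            escaped += '%'; // "%%" is printed as a single literal '%'
    }
    return escaped;
}
```

Because the escaped text contains doubled percent signs, it is only suitable for passing to a format-string parameter; the original text should be kept for any other use.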
iwbnwif

Re: Error: trying to encode undefined Unicode character

Post by iwbnwif »

Yes, that was it!

Of course, it is obvious now. This asserts:

Code: Select all

wxLogMessage ("but remains 100% compatible");
But this is fine:

Code: Select all

wxLogMessage ("but remains 100%% compatible");
Many thanks for your help! =D> I only wish the assert message were a little less cryptic!
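One plausible reading of the cryptic message, which is an inference and not something confirmed in the thread: in printf syntax a space is a valid flag and `c` is the character conversion, so "100% compatible" contains the specification `% c`, and the formatter would then try to output a character argument that was never supplied. That would also fit the earlier observation that the assert first appears at i == 7397, where the logged prefix ends in "100% c". The sketch below uses a hypothetical, simplified parser (`first_conversion_spec`, which ignores `*` and positional arguments) to show how such a string is read:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the first printf-style conversion
// specification found in `text` (for example "%d" or "% c"), "%%" for an
// escaped percent sign, or an empty string if there is none or it is
// incomplete. Simplified: '*' and positional arguments are not handled.
std::string first_conversion_spec(const std::string& text)
{
    const std::string flags = "-+ #0";
    const std::string digits = "0123456789";
    const std::string lengths = "hljztL";
    const std::string conversions = "diouxXeEfFgGaAcspn";

    const std::size_t start = text.find('%');
    if (start == std::string::npos)
        return std::string();

    std::size_t i = start + 1;
    if (i < text.size() && text[i] == '%')
        return "%%";

    // Optional flags, field width, precision and length modifiers.
    while (i < text.size() && flags.find(text[i]) != std::string::npos) ++i;
    while (i < text.size() && digits.find(text[i]) != std::string::npos) ++i;
    if (i < text.size() && text[i] == '.')
    {
        ++i;
        while (i < text.size() && digits.find(text[i]) != std::string::npos) ++i;
    }
    while (i < text.size() && lengths.find(text[i]) != std::string::npos) ++i;

    // A specification is complete only if a conversion character follows.
    if (i < text.size() && conversions.find(text[i]) != std::string::npos)
        return text.substr(start, i - start + 1);
    return std::string();
}
```

Applied to the two strings above, the unescaped one yields a real conversion specification while the escaped one yields only the literal "%%".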