wxWidgets 3.1.1 and german letters

Natulux
Experienced Solver
Posts: 95
Joined: Thu Aug 03, 2017 12:20 pm

wxWidgets 3.1.1 and german letters

Postby Natulux » Thu May 03, 2018 12:20 pm

Hey everyone,

I was looking into the new version of wxWidgets (v3.1.1) and was thrilled to use it.
My project was created under:
a prerelease version of wxWidgets 3.1.1 and Visual Studio 2008
(a snapshot between 3.1.0 and 3.1.1: I pulled the development branch from GitHub on 9 Nov 2017, after the GSoC 2017 additions, and built it myself with Visual Studio 2008)

and I was able to compile it under
wxWidgets 3.1.1 and Visual Studio 2017
(the wxWidgets 3.1.1 release package downloaded from GitHub, built myself with Visual Studio 2017)


But the first message box with German letters threw me off: it fails to display non-ASCII letters (the German umlaut "Ä", to be precise).

Code: Select all

wxMessageBox("Es liegen ungespeicherte Änderungen vor. Sollen die Daten in der Vorlage gespeichert werden?", "Vorlage speichern?", wxYES_NO|wxICON_QUESTION, this);

I also tried wrapping the text like this:

Code: Select all

wxMessageBox(wxT("Es liegen ungespeicherte Änderungen vor. Sollen die Daten in der Vorlage gespeichert werden?"), "Vorlage speichern?", wxYES_NO|wxICON_QUESTION, this);

or this

Code: Select all

wxString s = "Es liegen ungespeicherte Änderungen vor. Sollen die Daten in der Vorlage gespeichert werden?";
wxMessageBox(s, "Vorlage speichern?", wxYES_NO|wxICON_QUESTION, this);

But none of them displays the non-ASCII letters correctly. This is not a general problem, though: the same project can write, save and load these letters in a text field. Then again, a custom dialog shows the same error.

wxWidgets is built with Unicode support, of course. Any ideas?

Best
Natu
Attachments
Umlaut_problem.png (7.03 KiB)

doublemax
Moderator
Posts: 12206
Joined: Fri Apr 21, 2006 8:03 pm
Location: $FCE2

Re: wxWidgets 3.1.1 and german letters

Postby doublemax » Thu May 03, 2018 12:41 pm

At least the version with wxT() should have worked. Looking at the screenshot, it seems the source file is encoded with UTF-8. Try saving it in the local encoding instead (code page 1252). If that works, check whether it also displays the correct text on a non-German system.
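A minimal standard-C++ sketch (not wxWidgets code) of the mismatch suspected here: if "Ä" is saved as UTF-8, the file contains the two bytes 0xC3 0x84, and anything that reads those bytes as Windows-1252 (code page 1252) sees two characters instead of one. The function `decode_cp1252` is hypothetical and only models that 8-bit interpretation; it is not how wxWidgets or MSVC are implemented.

```cpp
#include <cassert>
#include <string>

// Windows-1252 mapping for bytes 0x80-0x9F. In this sketch U+FFFD marks
// the five slots CP1252 leaves undefined. Bytes 0x00-0x7F and 0xA0-0xFF
// map to the Unicode code point with the same numeric value.
const char32_t kCp1252High[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178};

// Interpret raw bytes as CP1252 text, one code point per byte.
std::u32string decode_cp1252(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) {
        if (b >= 0x80 && b <= 0x9F)
            out += kCp1252High[b - 0x80];
        else
            out += static_cast<char32_t>(b);
    }
    return out;
}
```

With this mapping, the UTF-8 bytes "\xC3\x84nderungen" decode to "Ã„nderungen", while the single CP1252 byte in "\xC4nderungen" decodes to "Änderungen".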

Also: do you have the "classic theme" enabled on your system? If not, the screenshot looks like the executable is missing a manifest (although I don't know whether that has any effect on text encodings; probably not).
Use the source, Luke!

Natulux
Experienced Solver
Posts: 95
Joined: Thu Aug 03, 2017 12:20 pm

Re: wxWidgets 3.1.1 and german letters

Postby Natulux » Thu May 03, 2018 1:45 pm

Hey Max,

doublemax wrote:At least the version with wxT() should have worked. Looking at the screenshot, it seems the source file is encoded with UTF-8. Try a local encoding instead (Codepage 1252). If that works, check if it also displays the correct text on a non-german system.

You might be right, but I can't say why. The project settings are set to use the Unicode character set (see the "charset" picture).

Specifying the encoding explicitly doesn't work either:

Code: Select all

//umlaut_problem_1
wxMessageBox(wxString("Es liegen ungespeicherte Änderungen vor. Sollen die Daten in der Vorlage gespeichert werden?", wxFONTENCODING_CP1252), "Vorlage speichern?", wxYES_NO|wxICON_QUESTION, this);


What actually worked though:

Code: Select all

//no prob
wxMessageBox(wxString("Es liegen ungespeicherte Änderungen vor. Sollen die Daten in der Vorlage gespeichert werden?", wxConvAuto()), "Vorlage speichern?", wxYES_NO|wxICON_QUESTION, this);


But it's tedious to apply this to every string. ;-)
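Why wxConvAuto() helps can be sketched in plain C++. As far as I understand its documentation, wxConvAuto uses a BOM if one is present, otherwise tries UTF-8, and only falls back to an 8-bit encoding when the bytes are not valid UTF-8. The hypothetical `decode_auto` below models that idea without BOM handling and with Latin-1 as an assumed fallback; it is not the wxWidgets implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// UTF-8 decoder that returns false on malformed input
// (overlong and surrogate checks omitted for brevity).
bool decode_utf8(const std::string& in, std::u32string& out) {
    out.clear();
    for (std::size_t i = 0; i < in.size();) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return false;                        // stray continuation byte
        if (i + len > in.size()) return false;    // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) return false; // not a 10xxxxxx byte
            cp = (cp << 6) | (c & 0x3F);
        }
        out += cp;
        i += len;
    }
    return true;
}

// "Auto" conversion sketch: use UTF-8 if the bytes are valid UTF-8,
// otherwise treat every byte as a Latin-1 code point.
std::u32string decode_auto(const std::string& in) {
    std::u32string out;
    if (decode_utf8(in, out)) return out;
    out.clear();
    for (unsigned char b : in) out += static_cast<char32_t>(b);
    return out;
}
```

Both the UTF-8 and the single-byte spelling of "Änderungen" come out as the same ten code points, which would explain why the wxConvAuto() variant displays correctly.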

doublemax wrote:Also: Do you have the "classic theme" enabled on your system? If not, the screenshot looks like the executable is missing a manifest. (Although I don't know if that has any effect on text encodings, probably not).

No classic theme, but indeed no manifest was being generated automatically. I enabled the manifest, and you can see the difference in the screenshots. AFAICS it has nothing to do with the character set used, though.

Thanks
Natu
Attachments
no_prob.png (3.48 KiB)
umlaut_problem_1.png (2.51 KiB)
charset.png (4.38 KiB)

doublemax
Moderator
Posts: 12206
Joined: Fri Apr 21, 2006 8:03 pm
Location: $FCE2

Re: wxWidgets 3.1.1 and german letters

Postby doublemax » Thu May 03, 2018 2:39 pm

No, don't change any project settings. I was only talking about the encoding that particular source file is saved in. VS never uses UTF-8 by default (I don't know about VS2017, though), so I assume you created it in a different editor?

Put that particular line of code into the "minimal" sample in the samples folder and build it; I'm sure the text will appear correctly.

Natulux
Experienced Solver
Posts: 95
Joined: Thu Aug 03, 2017 12:20 pm

Re: wxWidgets 3.1.1 and german letters

Postby Natulux » Thu May 03, 2018 2:55 pm

doublemax wrote:Take that particular line of code, put it into the "minimal" sample in the samples folder and build it, I'm sure the text will appear correctly.


Actually, no, it doesn't. The minimal sample compiles fine with VS2017, but the message box with non-ASCII letters in its text looks just the same as in the other project.
You are right that the other project was set up in a different Visual Studio version (2008), but that does not seem to be the main problem here.

EDIT: Tomorrow I'll try compiling wxWidgets 3.1.1 with VS2015 instead to rule out the IDE. (I haven't used the new one a lot; VS2015 has worked fine in my experience.) Have a nice evening!

PB
Part Of The Furniture
Posts: 1519
Joined: Sun Jan 03, 2010 5:45 pm

Re: wxWidgets 3.1.1 and german letters

Postby PB » Thu May 03, 2018 3:20 pm

FWIW, it works fine for me with MSVC 2017 Express 15.6.7 and wxWidgets master, regardless of whether I use wxT():

Code: Select all

wxMessageBox("Es liegen ungespeicherte Änderungen vor. Sollen die Daten in der Vorlage gespeichert werden?", "Vorlage speichern?", wxYES_NO|wxICON_QUESTION, this);

de.png (3.74 KiB)


It did not matter which encoding I saved the .cpp file with; I tried the default (whatever it was), Central European (1250) and Western European (1252).

TBH, I have always believed that using anything outside 7-bit ASCII directly in source files is quite brittle and should be avoided (preferably by using _() and message catalogs).

Manolo
Ultimate wxWidgets Guru
Posts: 558
Joined: Mon Apr 30, 2012 11:07 pm

Re: wxWidgets 3.1.1 and german letters

Postby Manolo » Thu May 03, 2018 3:31 pm

http://docs.wxwidgets.org/trunk/overvie ... rt_default
Pay attention to the "L" prefix before the string literal.

What matters is the encoding the compiler assumes for the source file: it must match the encoding your text is actually saved in.
In general, as PB pointed out, the best approach is to write all texts in 7-bit ASCII, wrap them in the _() macro and use message catalogs.
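One way to follow that advice even without message catalogs is to keep the source file pure 7-bit ASCII and spell non-ASCII characters as universal character names, which the compiler resolves independently of the file's encoding. A minimal sketch in standard C++; the constant names are hypothetical. Since wxString accepts wide strings, passing such an L"..." literal to wxMessageBox should avoid any dependence on the source file encoding, but that part is not exercised here.

```cpp
#include <cassert>
#include <string>

// \u00C4 names the code point U+00C4 (A with diaeresis) directly, so this
// literal means the same thing no matter which encoding the .cpp file is
// saved in: the file itself contains only 7-bit ASCII.
const wchar_t* const kQuestion =
    L"Es liegen ungespeicherte \u00C4nderungen vor. "
    L"Sollen die Daten in der Vorlage gespeichert werden?";

// Same idea with a UTF-32 literal; char32_t has the same width on every
// platform, whereas wchar_t is 16 bits on Windows and usually 32 on Unix.
const std::u32string kWord = U"\u00C4nderungen";
```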

PB
Part Of The Furniture
Posts: 1519
Joined: Sun Jan 03, 2010 5:45 pm

Re: wxWidgets 3.1.1 and german letters

Postby PB » Thu May 03, 2018 3:42 pm

I forgot to mention that the string does not actually contain any "true" Unicode characters: all of them are within the 8-bit range. The umlauted A is U+00C4, which is the single byte 0xC4 in Windows-1252 (and ISO 8859-1); it is not part of 7-bit ASCII, though.

The only way this could have failed is perhaps a wrong interpretation of the string literal, i.e. parsing it as if it were in a variable-length encoding. For example, Ä = 0xC4 = 1100 0100; in UTF-8, the leading bits 110 mark the first byte of a 2-byte sequence, so a lone 0xC4 followed by an ASCII letter is an invalid sequence, see https://en.wikipedia.org/wiki/UTF-8#Description. Are you sure you did not set some MSVC-wide source and execution character sets to UTF-8 or something?
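The bit-level argument above as a small sketch: the hypothetical helpers below classify a byte by its leading bits and check whether the expected continuation bytes follow. It is deliberately simplified and does not reject overlong 3-/4-byte forms or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Expected UTF-8 sequence length, judged from the first byte:
//   0xxxxxxx -> 1 (ASCII), 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4.
// Returns 0 for a continuation byte (10xxxxxx) or an invalid lead byte.
int utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return lead >= 0xC2 ? 2 : 0; // C0/C1 are overlong
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return lead <= 0xF4 ? 4 : 0;
    return 0;
}

// True if the byte at position i starts a sequence whose expected
// continuation bytes (10xxxxxx) are all present.
bool starts_valid_sequence(const std::string& s, std::size_t i) {
    int len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
    if (len == 0 || i + len > s.size()) return false;
    for (int k = 1; k < len; ++k)
        if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
    return true;
}
```

0xC4 is classified as the lead byte of a 2-byte sequence, but in "\xC4nderungen" it is followed by 'n' (0110 1110) rather than a 10xxxxxx continuation byte, so the sequence is invalid.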

Natulux
Experienced Solver
Posts: 95
Joined: Thu Aug 03, 2017 12:20 pm

Re: wxWidgets 3.1.1 and german letters

Postby Natulux » Fri May 04, 2018 7:03 am

Thanks for checking, PB. It seems I have some issue with my setup here. Good to know it's not a general error that slipped in.
Since I haven't used Visual Studio 2017 much yet, I can't rule out some setting I don't know about.

I can only say that this problem never occurred to me with VS2008 or VS2015 before, so I'm guessing it is some odd interaction between VS2017 and wxWidgets 3.1.1 in my installation.

Manolo, PB: using the _() macro would indeed be a good design choice, at the latest once I support multiple languages. Unfortunately, that is not the case yet, and I need to get going with what I have.

Next step: compiling with VS2015

[EDIT]: Didn't help. Both VS2015 and VS2017 have the same display problem with the new wxWidgets version.
I also checked out the current master from GitHub, which is again a little newer than the release I downloaded. Same issue.

I guess I'll stick with wxWidgets 3.1.0 for now and check back once in a while to see whether something has changed.
Strange, though.

Thanks
Natu

Natulux
Experienced Solver
Posts: 95
Joined: Thu Aug 03, 2017 12:20 pm

Re: wxWidgets 3.1.1 and german letters

Postby Natulux » Fri May 04, 2018 9:04 am

doublemax wrote:No, don't change any project settings. I was just talking about the encoding the particular source file is saved with. VS never uses UTF-8 by default (I don't know about VS2017 though), so I assume you created it in a different editor?

Take that particular line of code, put it into the "minimal" sample in the samples folder and build it, I'm sure the text will appear correctly.


I re-read your answers and compared the encoding of the source files in the old installation (ANSI) with the encoding in the new wxWidgets installation (UTF-8 without BOM). So you were right; I just didn't realize this could be the cause at all. ;-)

But... even the sample files of wxWidgets 3.1.1 and 3.1.1+ (GitHub master) are encoded in UTF-8 without BOM, even before I build them.

All the best
Natu

[EDIT]:
Actually, VS2017 seems to have changed the default encoding to UTF-8 (or even UTF-16 for new projects), as other users have reported before:
https://developercommunity.visualstudio.com/content/problem/169566/visual-studio-2017-creates-utf-16-source-code-file.html
Last edited by Natulux on Fri May 04, 2018 11:12 am, edited 1 time in total.

PB
Part Of The Furniture
Posts: 1519
Joined: Sun Jan 03, 2010 5:45 pm

Re: wxWidgets 3.1.1 and german letters

Postby PB » Fri May 04, 2018 10:51 am

Natulux wrote:But... even the sample files of wxWidgets 3.1.1 and 3.1.1+ (github master) are encoded in UTF-8 without BOM, even before I build it.


What do you mean? All 7-bit ASCII characters are valid UTF-8 characters, and I do not think the wxWidgets samples use any character literals outside this range. What made you believe they are UTF-8 files?

Natulux
Experienced Solver
Posts: 95
Joined: Thu Aug 03, 2017 12:20 pm

Re: wxWidgets 3.1.1 and german letters

Postby Natulux » Fri May 04, 2018 11:14 am

PB wrote:
Natulux wrote:But... even the sample files of wxWidgets 3.1.1 and 3.1.1+ (github master) are encoded in UTF-8 without BOM, even before I build it.


What do you mean? All 7-bit ASCII characters are valid UTF-8 characters, and I do not think the wxWidgets samples use any character literals outside this range. What made you believe they are UTF-8 files?


I opened minimal.cpp in Notepad++ and checked the encoding. Comparing wxWidgets 3.1.0 and 3.1.1, Notepad++ shows UTF-8 without BOM for the new files, while my old files show ANSI.

[EDIT]: Mind you, they are UTF-8 right from the get-go. I downloaded "wxWidgets-3.1.1.zip" from
https://github.com/wxWidgets/wxWidgets/releases/tag/v3.1.1
and checked minimal.cpp from the samples with Notepad++: it is shown as UTF-8 without BOM.

PB
Part Of The Furniture
Posts: 1519
Joined: Sun Jan 03, 2010 5:45 pm

Re: wxWidgets 3.1.1 and german letters

Postby PB » Fri May 04, 2018 11:44 am

What I was saying: a file without a BOM that contains only 7-bit ASCII is a valid UTF-8 file. However, it is still also an ANSI file. It would stop being plain ANSI text once it contained a multibyte UTF-8 sequence, i.e. its content would then be interpreted differently as UTF-8 and as ANSI.

For example I just created a simple text file in Notepad (Windows 7) containing "abcdef" and saved it with ANSI encoding. When I opened this file in Notepad++ (v6.8.8), it has UTF-8 without BOM checked in the Encoding menu.

OTOH, when I added Ä to the file, Notepad++ reported it as ANSI: the single byte 0xC4 is not a valid UTF-8 sequence on its own, so the safe guess in the absence of a BOM is ANSI.
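The guessing described here can be sketched as follows. This is a simplified heuristic in the same spirit, not Notepad++'s actual detection code, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Guess { Utf8WithBom, Utf8WithoutBom, Ansi };

// Simplified UTF-8 validity check: every lead byte must be followed by the
// right number of continuation bytes (10xxxxxx). Overlong 3-/4-byte forms
// and surrogates are not rejected, which is enough for this illustration.
bool looks_like_utf8(const std::string& s) {
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        if (b < 0x80) len = 1;
        else if ((b & 0xE0) == 0xC0 && b >= 0xC2) len = 2;
        else if ((b & 0xF0) == 0xE0) len = 3;
        else if ((b & 0xF8) == 0xF0 && b <= 0xF4) len = 4;
        if (len == 0 || i + len > s.size()) return false;
        for (std::size_t k = 1; k < len; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        i += len;
    }
    return true;
}

// A BOM decides; otherwise valid UTF-8 is reported as "UTF-8 without BOM"
// and anything else falls back to ANSI.
Guess guess_encoding(const std::string& content) {
    if (content.compare(0, 3, "\xEF\xBB\xBF") == 0) return Guess::Utf8WithBom;
    return looks_like_utf8(content) ? Guess::Utf8WithoutBom : Guess::Ansi;
}
```

A pure-ASCII file such as "abcdef" is reported as UTF-8 without BOM even if it was saved as ANSI, which matches what was seen for minimal.cpp.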

The commands in the first part of the Encoding menu in Notepad++ say "Encode xxx", not "Encoding is xxx": you can select any of them to tell Notepad++ to interpret the file as if it used that encoding. When opening a file without a BOM, Notepad++ preselects one of these encodings by guessing from the content, as described above. The commands in the other part of the menu convert between encodings...

I hope this makes it a bit clearer...

Manolo
Ultimate wxWidgets Guru
Posts: 558
Joined: Mon Apr 30, 2012 11:07 pm

Re: wxWidgets 3.1.1 and german letters

Postby Manolo » Fri May 04, 2018 5:52 pm

Just to clarify:
UTF-8 was designed so that its first 128 characters are the same as in ASCII: a byte with the most significant bit clear means the same thing in both encodings. The most significant bit is what differs. UTF-8 uses it as a "control bit" marking bytes that belong to a multibyte sequence, while ASCII itself is a 7-bit code and the 8-bit "extended ASCII" code pages use the upper half for extra characters for the current locale.

The wxWidgets sources use only ASCII, even in the "internat" sample. Because ASCII is identical to the first 128 UTF-8 characters, Notepad++ and other editors may recognize such files as UTF-8.
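The role of the most significant bit, as a small encoder sketch in standard C++ (the function name is hypothetical, and it assumes a valid Unicode scalar value as input: at most U+10FFFF and not a surrogate).

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8.
// Code points below 0x80 are emitted unchanged (top bit 0), so ASCII text is
// byte-for-byte identical in both encodings. Everything else becomes a lead
// byte (110xxxxx, 1110xxxx or 11110xxx) followed by continuation bytes
// (10xxxxxx), so every byte of a multibyte sequence has its top bit set.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

U+00C4 (Ä) becomes 0xC3 0x84, two bytes with the top bit set, whereas in code page 1252 the same letter is the single byte 0xC4.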

Have you tried prepending "L" as I suggested?

Natulux
Experienced Solver
Posts: 95
Joined: Thu Aug 03, 2017 12:20 pm

Re: wxWidgets 3.1.1 and german letters

Postby Natulux » Wed May 16, 2018 2:14 pm

Sorry, guys, I lost track of this a bit.

Thanks for your explanations. I see that there is more to know about this than I can appreciate. ;-)

PB wrote:The commands in the first part of the Encoding menu in Notepad++ say "Encode xxx" and not "Encoding is xxx", you can select any of them to tell the Notepad++ to interpret the file as if it was using this encoding.

I see your point. Judging the encoding that way doesn't seem to be reliable, then.

Manolo wrote:Have you tried prepending "L" as I suggested?

Yes, I tried those macros; in fact, all of them I could remember:

Code: Select all

L
_T()
wxT()

None of them worked. Only converting the string worked so far:

Code: Select all

wxString("Ääää", wxConvAuto())


There must be some change compared to wxWidgets versions before 3.1.1, because all of its predecessors work for me without any changes or macros.

Best
Natu

