doublemax wrote: Tue Jul 09, 2019 11:19 am
"Jürgen" becomes "JÃ¼rgen", which to my knowledge means that I applied UTF-8 encoding to that string twice.
No. "Ã¼" is the UTF-8 encoded version of "ü".
Yes, it is. But if I encode this UTF-8 string again with ToUTF8, the "ü" turns into an even longer run of junk characters, and the third time it grows again. So the "junk" increases every time I encode that multibyte character again. And I wonder whether "JÃ¼rgen" is supposed to be UTF-8, or whether it is rather a UTF-8 string that was encoded as UTF-8 a second time.
Because if it really is UTF-8, it is neither readable nor accepted by my web server, which expects UTF-8 but can't read "JÃ¼rgen".
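A minimal sketch in plain standard C++ (no wxWidgets; the helper names are hypothetical) of why the junk grows: encoding "ü" (U+00FC) yields the two bytes 0xC3 0xBC. If those bytes are later treated as two characters of their own (here assuming Latin-1, where each byte becomes the code point with the same value) and encoded again, every byte above 0x7F turns into two bytes. Each extra round therefore doubles the non-ASCII part, whereas correct UTF-8 needs exactly one encoding step.

```cpp
#include <cassert>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Encode a sequence of code points as UTF-8 bytes.
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) append_utf8(out, cp);
    return out;
}

// The mistake (assumption: Latin-1 interpretation): each UTF-8 *byte* is
// turned back into a character of its own instead of being decoded.
std::u32string bytes_as_latin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out += static_cast<char32_t>(b);
    return out;
}
```

Under that assumption, the byte length of "Jürgen" goes 7, 9, 13 over three rounds. A Windows code page such as Windows-1252 maps some bytes (for example 0x83) to different characters than Latin-1, so the exact junk and its length may differ, but it still grows with every round.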
doublemax wrote: Tue Jul 09, 2019 11:19 am
I didn't read the whole thread on Google Groups, but I think it's about string literals, which is a different issue.
Ah yes, that would make a lot of sense. String literals should use the system locale.
doublemax wrote: Tue Jul 09, 2019 11:19 am
How can I make sure I send and receive UTF-8, and what encoding does my wxString have if I don't specify one? Can I check that?
On the receiving side, just try to decode the byte array using wxString::FromUTF8. If the resulting string is not empty, there's a very high chance that it was UTF-8 encoded.
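The check described above can be sketched in plain C++ as a well-formedness test following the UTF-8 rules of RFC 3629 (a hypothetical helper, not part of wxWidgets). It can only tell you that the bytes *could* be UTF-8: Latin-1 text with accented letters is usually rejected, because a lone byte such as 0xFC cannot start a valid sequence, but pure ASCII is valid in both encodings and double-encoded text is itself perfectly valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Returns true if `bytes` is well-formed UTF-8: valid lead and continuation
// bytes, no truncated sequences, no overlong forms, no UTF-16 surrogates,
// nothing above U+10FFFF.
bool is_valid_utf8(const std::string& bytes) {
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        char32_t cp;
        if (c < 0x80) { ++i; continue; }                     // ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;               // stray continuation or invalid lead byte
        if (n - i < len) return false;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_for_len[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len;
    }
    return true;
}
```

For example, "J\xC3\xBCrgen" passes and the Latin-1 bytes "J\xFCrgen" fail, while the double-encoded "J\xC3\x83\xC2\xBCrgen" also passes, so a successful check is a strong hint rather than proof.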
Maybe this post helps:
viewtopic.php?t=19565:
wxString str(buf,wxConvUTF8);
client->Write(str.c_str(),str.Length());
This code does not convert your string to UTF-8. It creates a wxString from a buffer of UTF-8 encoded data.
You'd need something like this:
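wxString s(wxT("üöäÜÖÄ"));
// convert to a NUL-terminated buffer of UTF-8 encoded bytes
wxCharBuffer buf = s.mb_str(wxConvUTF8);
// send the bytes, including the trailing NUL
client->Write(buf.data(), strlen(buf.data()) + 1);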
'+1' to include the trailing 0-byte.
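Why the byte count matters here: wxString::Length() counts characters, while Write() expects a number of bytes, and in UTF-8 every non-ASCII character takes more than one byte. A small standard-C++ sketch (hypothetical helper names) that computes the payload size for the example string:

```cpp
#include <cstddef>
#include <string>

// Number of bytes one Unicode code point occupies in UTF-8.
std::size_t utf8_length(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Bytes needed to send `text` as UTF-8, optionally including the
// trailing 0 byte (the "+1" in the snippet above).
std::size_t utf8_payload_size(const std::u32string& text, bool with_nul) {
    std::size_t bytes = 0;
    for (char32_t cp : text) bytes += utf8_length(cp);
    return bytes + (with_nul ? 1 : 0);
}
```

For "üöäÜÖÄ" that gives 6 characters but 12 bytes, or 13 with the trailing 0 byte, so passing a character count as the length would cut the data short.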
So the wxConvUTF8 argument in the wxString constructor means READING a UTF-8 encoded buffer, while the same argument in .mb_str() means WRITING in that encoding? And wxString::ToUTF8() should fulfill the same purpose, right?
The receiving web server is written in PHP, so I can't pass the data to FromUTF8. It isn't even developed by me :-/
You should look at wxString as a "black box" that stores a string in a Unicode-aware way (always assuming a Unicode build of wxWidgets). You should not worry about how wxString stores its data internally.
But when sending strings over a network, you might want to convert them into a "standard" format that any other computer can understand, even if it runs a different operating system on a different CPU. Converting the string to UTF-8 is one way of doing that.
So for sending over a network, you convert your string to UTF-8, and on the receiving side you create a wxString from the UTF-8 data.
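The rule above boils down to exactly one encode on the sending side and exactly one decode on the receiving side. A minimal standard-C++ sketch of both halves (hypothetical function names, playing roughly the roles that mb_str(wxConvUTF8)/ToUTF8 and FromUTF8 play in wxWidgets) shows that decoding what was encoded gives back the original text unchanged:

```cpp
#include <cstddef>
#include <string>

// Sender side: code points -> UTF-8 bytes, done exactly once.
std::string encode_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Receiver side: UTF-8 bytes -> code points, done exactly once.
// Like the FromUTF8 behaviour described earlier in the thread, this
// returns an empty string if the input is not well-formed UTF-8.
std::u32string decode_utf8(const std::string& bytes) {
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        // Sequence length from the lead byte; 0 means "invalid lead byte".
        std::size_t len = c < 0x80 ? 1
                        : (c & 0xE0) == 0xC0 ? 2
                        : (c & 0xF0) == 0xE0 ? 3
                        : (c & 0xF8) == 0xF0 ? 4 : 0;
        if (len == 0 || bytes.size() - i < len) return {};
        char32_t cp = (len == 1) ? c : (c & (0x7F >> len));
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return {};  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates and values past U+10FFFF.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return {};
        out += cp;
        i += len;
    }
    return out;
}
```

Encoding already-encoded bytes a second time, as in the Jürgen example, breaks this round trip: the receiver decodes only once and is left with the once-encoded junk.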
In my experience, the encoding I use does matter (see my Jürgen example above, which actually changes the contents of the black box). According to this, if I just pass along a wxString, say as a REST POST body, would it be encoded with my system locale? And if I send it with mb_str(wxConvUTF8), can I be sure that it is UTF-8 as desired? But then how come I can convert a string to UTF-8 multiple times and it visibly changes each time?
Best
Natu