But if the file has only 1 byte per character and it writes 4 bytes for the word "test", it will lose information and I can't have a file that is portable across several languages.
This is a little funky to explain if you're not already familiar with it.
UTF-8 characters are variable size. This means that one byte (one char in the encoded buffer) does not necessarily equate to one character. In UTF-8, ASCII characters (such as a-z, A-Z, 0-9, etc.) are all represented as 1 byte each, but other characters may need two, three or even four bytes. wxString usually hides all this from you, so you don't have to worry about it. However, when serializing to/from a file, it's important to know the difference.
Rather than attempt to explain exactly how UTF-8 works, I'll point to the Wikipedia article on UTF-8, which outlines the basic idea.
Now, it's not really all that important to understand all that. What IS important to understand is that 8 bytes does not necessarily mean 8 characters. Depending on the characters, it could be as few as 2.
UTF-8 is capable of representing any Unicode character, so you don't have to worry about losing information when storing strings as UTF-8. It retains all information just fine.
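To make the bytes-versus-characters point concrete, here is a minimal sketch in plain standard C++ (no wxWidgets, so it can be compiled on its own). The function name utf8_char_count is made up for this illustration, and it assumes the input is already valid UTF-8; it does no validation.

```cpp
#include <cstddef>
#include <string>

// Count the characters (Unicode code points) in a UTF-8 encoded byte string.
// In UTF-8, continuation bytes always look like 10xxxxxx in binary; every
// other byte starts a new character. Assumes the input is valid UTF-8.
std::size_t utf8_char_count(const std::string& bytes)
{
    std::size_t count = 0;
    for (unsigned char b : bytes) {
        if ((b & 0xC0) != 0x80)  // not a continuation byte
            ++count;
    }
    return count;
}
```

For a pure ASCII word like "test" the byte count and the character count are both 4. For a string made of two 4-byte characters, std::string::size() reports 8 bytes while utf8_char_count reports 2.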
As an example, for the word "colecção" this doesn't work. It creates an empty string on the line: wxString str(buf,wxConvUTF8,len);
Strange, it works fine for me. Here's a quick and sloppy test program I whipped up. This program outputs the expected "colecção" text in the messagebox in Unicode Build, but outputs nonsense in ANSI build (possibly because wxMessageBox can't print Unicode chars when not in Unicode build?).
class App : public wxApp
char* buf = new char[10];  // 10 bytes: the UTF-8 length of "colecção"
"test.bin" is a file which just contains "colecção" in UTF-8 encoding:
63 6F 6C 65 63 C3 A7 C3 A3 6F
Since it's 10 bytes long, I just used a fixed value of 10 in my code rather than trying to determine the length some other way. Note the difference here: it's 10 bytes even though it's only 8 characters. When writing your string to the file, make sure you write how many bytes long the string is, and not how many characters. Either that, or just use a null terminator.
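Here's a rough sketch of the "write the byte count" approach, again in plain standard C++ rather than wxWidgets so it stands alone. The names write_utf8, read_utf8 and save_and_reload are invented for this example, and the 4-byte length prefix is stored in the machine's native byte order, which is an assumption you'd want to revisit if the file has to move between different architectures.

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <ostream>
#include <string>

// Write a UTF-8 string as a 4-byte BYTE count followed by the raw bytes.
// The count is the number of bytes, not the number of characters.
void write_utf8(std::ostream& out, const std::string& utf8)
{
    const std::uint32_t byteCount = static_cast<std::uint32_t>(utf8.size());
    out.write(reinterpret_cast<const char*>(&byteCount), sizeof byteCount);
    out.write(utf8.data(), byteCount);
}

// Read back a string written by write_utf8.
std::string read_utf8(std::istream& in)
{
    std::uint32_t byteCount = 0;
    in.read(reinterpret_cast<char*>(&byteCount), sizeof byteCount);
    std::string utf8(byteCount, '\0');
    if (byteCount > 0)
        in.read(&utf8[0], byteCount);
    return utf8;
}

// Round trip through a file opened in binary mode.
std::string save_and_reload(const std::string& path, const std::string& utf8)
{
    {
        std::ofstream out(path, std::ios::binary);
        write_utf8(out, utf8);
    }  // file is flushed and closed here
    std::ifstream in(path, std::ios::binary);
    return read_utf8(in);
}
```

With wxWidgets, the bytes you read back this way are what you would hand to the wxString constructor together with wxConvUTF8 and the byte count, as in the line quoted earlier in the thread.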