writing/reading uft-8 from special chars to a file (Topic is solved)

mael15
Super wx Problem Solver
Posts: 449
Joined: Fri May 22, 2009 8:52 am
Location: Bremen, Germany

writing/reading uft-8 from special chars to a file

Post by mael15 » Tue Jul 14, 2020 8:45 am

I am struggling to write a string containing special characters to a file as UTF-8 and read it back correctly. The problems start when the UTF-8 string supposedly has a length of 8, but 12 bytes are written to the file.

Code:

wxString testStr(wxT("abcdêü"));
wxString utf8Str = testStr.ToUTF8();
wxFile uft8File;
wxString filePath = wxGetUserHome() + wxT("\\Desktop\\uft8File.txt");

// write file
bool isOk = uft8File.Create(filePath, true);
isOk &= uft8File.Open(filePath, wxFile::read_write);

wxUint8 strLen = utf8Str.Length();	// 8 ?!?
uft8File.Write(&strLen, 1);
uft8File.Write(utf8Str);		// 12 bytes are written

uft8File.Close();

// read file
isOk &= uft8File.Open(filePath, wxFile::read);
char buf[20];
uft8File.Read(buf, 1);
strLen = buf[0];
uft8File.Read(buf, strLen);
wxString fromFile(buf[0], file.Length());
wxString fromUtf8 = wxString::FromUTF8(fromFile);
uft8File.Close();

The long-term goal is to write strings with special characters into an XML structure. This test is just meant to help me understand how special characters are handled in UTF-8.

PB
Part Of The Furniture
Posts: 2566
Joined: Sun Jan 03, 2010 5:45 pm

Re: writing/reading uft-8 from special chars to a file

Post by PB » Tue Jul 14, 2020 10:54 am

I think your code has multiple issues, like this one:

Code:

wxString testStr(wxT("abcdêü"));
wxString utf8Str = testStr.ToUTF8();

wxString always uses the platform's native Unicode encoding internally (e.g., UTF-16 on MSW), so the above does not make any sense. The same goes for this:

Code:

wxUint8 strLen = utf8Str.Length();	// 8 ?!?

and for other lines. You can create a wxString from UTF-8, and you can get a UTF-8 encoded char buffer from a wxString, but there is no such thing as a UTF-8 encoded wxString.
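
A possible explanation for the 8 vs. 12 mismatch in the original post, sketched in plain C++ without wxWidgets: assigning the UTF-8 char buffer back to a wxString may reinterpret each of the 8 bytes as one character of the current 8-bit code page, and writing that string out as UTF-8 then encodes the four non-ASCII characters a second time. This is an assumption about which conversions are applied, not something confirmed in this thread. Latin-1 stands in for the code page, and the function names are made up for illustration.

```cpp
#include <string>

// Hypothetical helper: encode code points below U+0800 as UTF-8.
// That range is enough here, because every character involved is below U+0100.
std::string EncodeSmallCodePointsAsUtf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text)
    {
        if (cp < 0x80)
        {
            out += static_cast<char>(cp);                  // ASCII: 1 byte
        }
        else
        {
            out += static_cast<char>(0xC0 | (cp >> 6));    // lead byte
            out += static_cast<char>(0x80 | (cp & 0x3F));  // trail byte
        }
    }
    return out;
}

// Hypothetical helper: misread raw bytes as Latin-1 text,
// so that every byte becomes exactly one character.
std::u32string DecodeAsLatin1(const std::string& bytes)
{
    std::u32string out;
    for (unsigned char b : bytes)
        out += static_cast<char32_t>(b);
    return out;
}
```

With U"abcd\u00EA\u00FC" ("abcdêü", 6 characters) as input, the first encoding gives 8 bytes, misreading them gives 8 characters, and encoding those again gives 12 bytes, which matches the numbers in the question.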

If you want to write XML, I would recommend using an XML library; wxWidgets has wxXML, which is supposed to work seamlessly with wxString without you having to care about character encoding.

If you want to use a plain text file, use wxTextFile and make sure that you pass wxConvUTF8 to its Open and Write methods.

If you want to use binary files, see here for an example how to store a variable-length UTF-8-encoded wxString:
viewtopic.php?f=1&t=47221&p=199141#p199131
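
For the binary-file case, the key point is that the length prefix has to count the UTF-8 bytes, not the characters of the wxString. A minimal sketch in plain C++ follows; it is not the code from the linked post. std::string stands in for the UTF-8 buffer you would get from wxString::ToUTF8(), and the function names and the 4-byte little-endian prefix are choices made only for this example.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Write a UTF-8 byte string preceded by its length in BYTES (not characters).
bool WriteLengthPrefixed(const std::string& path, const std::string& utf8)
{
    if (utf8.size() > 0xFFFFFFFFu)
        return false;  // does not fit into the 4-byte prefix

    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;

    // Store the length byte by byte (little-endian) so that the file
    // layout does not depend on the endianness of the machine.
    const std::uint32_t len = static_cast<std::uint32_t>(utf8.size());
    unsigned char prefix[4];
    for (int i = 0; i < 4; ++i)
        prefix[i] = static_cast<unsigned char>((len >> (8 * i)) & 0xFF);

    out.write(reinterpret_cast<const char*>(prefix), 4);
    out.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
    out.flush();
    return out.good();
}

// Read the string back: first the 4-byte length, then exactly that many bytes.
bool ReadLengthPrefixed(const std::string& path, std::string& utf8)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;

    unsigned char prefix[4];
    if (!in.read(reinterpret_cast<char*>(prefix), 4))
        return false;

    std::uint32_t len = 0;
    for (int i = 0; i < 4; ++i)
        len |= static_cast<std::uint32_t>(prefix[i]) << (8 * i);

    std::string bytes(len, '\0');
    if (len > 0 && !in.read(&bytes[0], static_cast<std::streamsize>(len)))
        return false;  // file is shorter than the prefix claims

    utf8.swap(bytes);
    return true;
}
```

In a wxWidgets program, the bytes read back this way would then be turned into a wxString with wxString::FromUTF8(). A real file format would probably also want an upper bound on the length it accepts.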

PB
Part Of The Furniture
Posts: 2566
Joined: Sun Jan 03, 2010 5:45 pm

Re: writing/reading uft-8 from special chars to a file

Post by PB » Tue Jul 14, 2020 1:44 pm

I found some code which I may have already posted here; perhaps it could help you understand how to deal with UTF-8:

Code:

#include <wx/wx.h>
#include <wx/ffile.h>

class MyApp : public wxApp
{
public:
    bool OnInit() override
    {
        // nihongo in kanji
        const char* UTF8literal = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e";
        const wxString filePath = "uft8File.txt";

        wxString outStr = wxString::FromUTF8(UTF8literal);
        wxString inStr;
        wxFFile  utf8File;
        size_t   utf8FileSize = 0;

        // create and write a string to file
        if  ( !utf8File.Open(filePath, "w")
               || !utf8File.Write(outStr, wxConvUTF8) )
        {
            return false;
        }
        utf8File.Close();

        // read a string from file
        if  ( !utf8File.Open(filePath, "r")
               || !utf8File.ReadAll(&inStr, wxConvUTF8) )
        {
            return false;
        }
        utf8FileSize = static_cast<size_t>(utf8File.Length());
        utf8File.Close();

        wxLogMessage("outStr: '%s' (length %zu, size in bytes %zu)\n"
                     "inStr: '%s' (length %zu, size in bytes %zu)\n"
                     "UTF-8 string literal size in bytes: %zu\n"
                     "utf8File size in bytes: %zu",
                     outStr, outStr.size(), outStr.size() * sizeof(wxStringCharType),
                     inStr, inStr.size(), inStr.size() * sizeof(wxStringCharType),
                     outStr.ToUTF8().length(),
                     utf8FileSize);

        return true;
    }
};

wxIMPLEMENT_APP(MyApp);

[Attachment: utf8file.png]

Just a reminder: a UTF-16 encoded wxString (used e.g. on MSW, macOS, or Qt) does not work properly with characters outside the Basic Multilingual Plane (i.e., those not fitting into a 16-bit wchar_t), since it does not support surrogate pairs.
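
To illustrate what a surrogate pair is: in UTF-16, a code point outside the Basic Multilingual Plane takes two 16-bit code units, so the number of code units no longer equals the number of characters. Here is a small sketch in plain C++, using std::u16string rather than wxString; CountCodePoints is a made-up helper and assumes well-formed UTF-16.

```cpp
#include <cstddef>
#include <string>

// Count code points in a UTF-16 string by not counting the low (trailing)
// half of each surrogate pair. Assumes the input is well-formed UTF-16.
std::size_t CountCodePoints(const std::u16string& s)
{
    std::size_t count = 0;
    for (char16_t unit : s)
    {
        const bool isLowSurrogate = unit >= 0xDC00 && unit <= 0xDFFF;
        if (!isLowSurrogate)
            ++count;
    }
    return count;
}
```

For the kanji string from the example above, u"\u65E5\u672C\u8A9E", both size() and the code point count are 3. For U+1F600, written as u"\U0001F600", size() is 2 but the code point count is 1.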

mael15
Super wx Problem Solver
Posts: 449
Joined: Fri May 22, 2009 8:52 am
Location: Bremen, Germany

Re: writing/reading uft-8 from special chars to a file

Post by mael15 » Thu Jul 16, 2020 5:45 pm

That was really helpful, thanks! I ran some simple tests; it took some time, but it works now. I use libxml2 but had some old, unnecessarily complicated details to clean up. It is surprisingly complicated how strings are converted, and how to save them, read them back and reverse the conversion.
