Arabic encoding in char* to wxString

Posted: Tue May 27, 2008 2:24 pm
by Rakan

I am developing a server application that will receive English/Arabic text.

When I receive the buffer, I convert it to UTF-8 using either of the following:
wxString str = wxString::FromUTF8(buff);
wxString str(buff,wxConvUTF8);

Both ways produce an empty string object. I am unable to figure out what's wrong, as I thought my Unicode build of wxWidgets would solve such problems.

How can I store Unicode Arabic text from a char* into a wxString?


Posted: Tue May 27, 2008 2:43 pm
by doublemax
The lines you posted create a wxString *from* a UTF-8 encoded buffer. If the string is empty, the buffer probably didn't contain valid UTF-8.

That's what you should check first.
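
One way to run that check is to walk the received bytes before handing them to wxString. The following is only a sketch in plain standard C++; isValidUtf8 is a hypothetical helper name and is not part of wxWidgets.

```cpp
#include <cstddef>

// Returns true if the first len bytes of data form well-formed UTF-8.
// Hypothetical helper for illustration; not a wxWidgets function.
bool isValidUtf8(const unsigned char* data, std::size_t len)
{
    // smallest code point that legitimately needs 1, 2, 3 or 4 bytes
    static const unsigned long minCp[4] = { 0, 0x80, 0x800, 0x10000 };

    std::size_t i = 0;
    while (i < len)
    {
        unsigned char b = data[i];
        std::size_t extra;   // number of continuation bytes expected
        unsigned long cp;    // code point decoded so far

        if (b < 0x80)                { extra = 0; cp = b; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
        else return false;   // stray continuation byte or invalid lead byte

        if (extra > len - i - 1) return false;      // sequence is cut off

        for (std::size_t k = 1; k <= extra; ++k)
        {
            unsigned char c = data[i + k];
            if ((c & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        // reject overlong forms, surrogates and values above U+10FFFF
        if (cp < minCp[extra]) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        if (cp > 0x10FFFF) return false;

        i += extra + 1;
    }
    return true;
}
```

Called on the bytes 0xD8 0xB1 0xD8 0xAD (two Arabic letters) it returns true; called on 0xD9 0xD8 0xB1 it returns false, because the lead byte 0xD9 is not followed by a continuation byte.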

Posted: Tue May 27, 2008 2:52 pm
by Rakan
I tried sending English text and the code shows the text received, but when sending Arabic text it doesn't:

wxUint32 len = 0;
char* buf = new char[1024];

// read size (the Read() call itself is not part of this snippet)
len = client->LastCount();
if (len > 0 && len < 1024)
{
    // terminate the received bytes so they can be read as a C string
    buf[len] = '\0';
    wxString str(buf, wxConvUTF8);
    txtMsgs->AppendText(_("Text is: ") + str + _("\n"));

    // second attempt on the same bytes, using FromUTF8
    wxString str2 = wxString::FromUTF8(buf);
    txtMsgs->AppendText(_("Data received: ") + str2);
}
// buf was allocated with new[], so it must be released with delete[]
delete[] buf;

Posted: Tue May 27, 2008 3:04 pm
by doublemax
The interesting part of the code would be where you convert your string to UTF-8 and send it.

Posted: Tue May 27, 2008 3:38 pm
by Rakan

wxString str(buf,wxConvUTF8);
Nothing sent back either.

I have added this code to see the contents of the buffer:

for (int x = 0; x < len; x++)
{
    a.Printf(_("%d\n"), buf[x]);
    txtMsgs->AppendText(a);
}

Here is the code's output:

37777777731 37777777730 37777777661 37777777730 37777777655 37777777730 37777777650 37777777730 37777777647 15 12
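
The values in this dump appear to be sign-extended octal (37777777731 octal is 0xFFFFFFD9), which is what happens when a plain char of 0x80 or above is passed to a printf-style function without a cast. A sketch in plain standard C++ of a dump that avoids this, with the posted values decoded back to bytes; hexDump and postedBytes are hypothetical names, and reading the values as octal is an assumption.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats each byte as two hex digits. Casting to unsigned char first keeps
// bytes >= 0x80 from being sign-extended into values such as 0xFFFFFFD9.
// Hypothetical helper for illustration; not a wxWidgets function.
std::string hexDump(const char* buf, std::size_t len)
{
    std::string out;
    char tmp[4];
    for (std::size_t i = 0; i < len; ++i)
    {
        unsigned value = static_cast<unsigned char>(buf[i]);
        std::snprintf(tmp, sizeof tmp, "%02x", value);
        if (i != 0) out += ' ';
        out += tmp;
    }
    return out;
}

// The values from the post, read as octal (an assumption) and reduced to
// their low 8 bits to recover the original bytes.
std::string postedBytes()
{
    const unsigned long dump[] = {
        037777777731UL, 037777777730UL, 037777777661UL, 037777777730UL,
        037777777655UL, 037777777730UL, 037777777650UL, 037777777730UL,
        037777777647UL, 015UL, 012UL
    };
    std::string bytes;
    for (unsigned long v : dump)
        bytes += static_cast<char>(v & 0xFF);
    return bytes;
}
```

Passing the result of postedBytes() to hexDump gives d9 d8 b1 d8 ad d8 a8 d8 a7 0d 0a. The first byte, 0xD9, is a two-byte lead byte, but the byte after it (0xD8) is another lead byte rather than a continuation byte, so the buffer is not valid UTF-8. The remaining pairs are well-formed Arabic letters followed by CR LF, which suggests a byte was lost somewhere before the data reached the server.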

Posted: Tue May 27, 2008 3:58 pm
by doublemax

wxString str(buf,wxConvUTF8);
This code does not convert your string to UTF-8. It creates a wxString from a buffer with UTF-8 encoded data.

You'd need something like this:

wxString s(wxT("üöäÜÖÄ"));
wxCharBuffer buf=s.mb_str(wxConvUTF8);
client->Write(buf.data(), strlen(buf.data()) + 1);
The '+1' is there to include the trailing 0-byte.

Posted: Tue May 27, 2008 8:49 pm
by Rakan
Hello doublemax

Thank you for your contribution, but your code deals with a constant char* while my code uses a variable. You converted the constant char* to a wxString, but in my case I am converting a variable's value to a wxString, which is my main problem: I was unable to convert a char* to a wxString while preserving the encoding in the first place.

Can you replace your constant value with a char* variable?

Posted: Tue May 27, 2008 10:05 pm
by doublemax
The literal string I created was just an example. You can convert any wxString to UTF-8 this way.

For example, you could also take the value from a wxTextCtrl:

// assumes wxTextCtrl *m_textctrl;
// take any wxString
wxString s=m_textctrl->GetValue();

// convert it to a utf-8 encoded bytestream
wxCharBuffer buf=s.mb_str(wxConvUTF8);

// send it over a network
client->Write(buf.data(), strlen(buf.data()) + 1);
However, I'm not really sure you understand what the conversion actually does and why it is needed (sorry if I'm wrong).

You should look at wxString as a "black box" that stores a string in a Unicode-aware way (always assuming a Unicode build of wxWidgets). You should not worry about how wxString stores its data internally.

But when sending strings over a network, you might want to convert them into a "standard" format that any other computer can understand, even if it runs a different operating system on a different CPU. Converting the string to UTF-8 is one way of doing that.

So for sending over a network, you convert your string to UTF-8, and on the receiving side you create a wxString from the UTF-8 data.

E.g. the literal string I used before: "üöäÜÖÄ"
When converted to UTF-8, it would be represented by the following bytes:

unsigned char utf8buf[]={
  0xc3, 0xbc,
  0xc3, 0xb6,
  0xc3, 0xa4,
  0xc3, 0x9c,
  0xc3, 0x96,
  0xc3, 0x84,
  0x00   // terminating 0-byte so the buffer can be read as a C string
};

// create wxString from utf8 buffer:
 wxString s((const char*)utf8buf, wxConvUTF8);
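
To see where those bytes come from, here is a rough sketch of the UTF-8 encoding rule in plain standard C++, without wxWidgets. encodeUtf8 and umlautExample are hypothetical names, and the sketch does not reject surrogates or values above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Encodes Unicode code points as UTF-8 bytes.
// Hypothetical helper for illustration; input is assumed to be valid.
std::string encodeUtf8(const unsigned long* cps, std::size_t count)
{
    std::string out;
    for (std::size_t i = 0; i < count; ++i)
    {
        unsigned long cp = cps[i];
        if (cp < 0x80)
        {
            out += static_cast<char>(cp);                          // 1 byte
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));            // 2 bytes
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));           // 3 bytes
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));           // 4 bytes
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// The code points of "üöäÜÖÄ": U+00FC U+00F6 U+00E4 U+00DC U+00D6 U+00C4.
std::string umlautExample()
{
    const unsigned long cps[] = { 0xFC, 0xF6, 0xE4, 0xDC, 0xD6, 0xC4 };
    return encodeUtf8(cps, sizeof cps / sizeof cps[0]);
}
```

umlautExample() yields the twelve bytes c3 bc c3 b6 c3 a4 c3 9c c3 96 c3 84. The same rule maps an Arabic letter such as U+0645 to the two bytes d9 85, so every Arabic letter in this range takes a lead byte of 0xD8 or 0xD9 followed by one continuation byte.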

Posted: Fri May 30, 2008 12:36 pm
by Rakan
Thanks doublemax,

It turned out that using telnet to connect to the server was the problem. I tried netcat and the Arabic text was received correctly.

Thanks a lot