Arabic encoding in char* to wxString
Posted: Tue May 27, 2008 2:24 pm
by Rakan
Hello,
I am developing a server application that will receive English/Arabic text.
When I receive the buffer, I convert it to UTF-8 using
wxString str = wxString::FromUTF8(buff);
or
wxString str(buff,wxConvUTF8);
Both ways produce an empty string object. I can't figure out what's wrong, because I thought my Unicode build of wxWidgets would solve such problems.
What can I do to store a char* holding Unicode Arabic text in a wxString?
Thanks,
Rakan
Posted: Tue May 27, 2008 2:43 pm
by doublemax
The lines you posted create a wxString *from* a UTF-8 encoded buffer. If the string is empty, the buffer probably didn't contain valid UTF-8.
That's what you should check first.
Posted: Tue May 27, 2008 2:52 pm
by Rakan
I tried sending English text and the code shows the text received, while Arabic text doesn't show up.
Code: Select all
wxUint32 len = 0; // LastCount() returns wxUint32; an unsigned char wraps above 255
char* buf = new char[1024];
// peek at up to 1023 bytes, leaving room for the terminating '\0'
client->Peek(buf, 1023);
len = client->LastCount();
if(len > 0)
{
    buf[len] = '\0';
    //make sure the string ends with \n to process it;
    wxString str(buf, wxConvUTF8);
    txtMsgs->AppendText(_("Text is: ") + str + wxT("\n"));
    if(str.EndsWith(wxT("\n")))
    {
        // free the 1024-byte peek buffer (arrays need delete[], not delete)
        delete[] buf;
        // allocate exactly the received length, plus one byte for the '\0'
        buf = new char[len + 1];
        client->Read(buf, len);
        wxUint32 got = client->LastCount();
        buf[got] = '\0';
        wxString line = wxString::FromUTF8(buf);
        // echo the raw bytes back to the client
        client->Write(buf, got);
        txtMsgs->AppendText(_("Data received: ") + line);
        //ParseProtocol(line);
    }
}
// release whichever buffer is still allocated
delete[] buf;
Posted: Tue May 27, 2008 3:04 pm
by doublemax
The interesting part of the code would be where you convert your string to UTF-8 and send it.
Posted: Tue May 27, 2008 3:38 pm
by Rakan
Code: Select all
wxString str(buf,wxConvUTF8);
client->Write(str.c_str(),str.Length());
Nothing is sent back either.
I have added this code to see the contents of the buffer:
Code: Select all
for (int x = 0; x < len; x++)
{
    a.Printf(_("%d\n"), buf[x]);
    txtMsgs->AppendText(a);
}
Here is the code's output:
Code: Select all
37777777731 37777777730 37777777661 37777777730 37777777655 37777777730 37777777650 37777777730 37777777647 15 12
Posted: Tue May 27, 2008 3:58 pm
by doublemax
Code: Select all
wxString str(buf,wxConvUTF8);
client->Write(str.c_str(),str.Length());
This code does not convert your string to UTF-8. It creates a wxString from a buffer containing UTF-8 encoded data.
You'd need something like this:
Code: Select all
wxString s(wxT("üöäÜÖÄ"));
wxCharBuffer buf=s.mb_str(wxConvUTF8);
client->Write(buf.data(), strlen(buf.data())+1);
The '+1' includes the trailing 0-byte.
Posted: Tue May 27, 2008 8:49 pm
by Rakan
Hello doublemax,
Thank you for your contribution, but your code deals with a constant char* while mine uses a variable. You converted a constant string to a wxString, but in my case I need to convert the value of a variable, and that is my main problem: I was unable to convert a char* to a wxString while preserving the encoding in the first place.
Can you replace your constant value with a char* variable?
Posted: Tue May 27, 2008 10:05 pm
by doublemax
The literal string I created was just an example. You can convert any wxString to UTF-8 this way.
For example, you could also take the value from a wxTextCtrl:
Code: Select all
// assumes wxTextCtrl *m_textctrl;
// take any wxString
wxString s=m_textctrl->GetValue();
// convert it to a utf-8 encoded bytestream
wxCharBuffer buf=s.mb_str(wxConvUTF8);
// send it over a network
client->Write(buf.data(), strlen(buf.data())+1);
However, I'm not really sure you understand what the conversion actually does and why it is needed (sorry if I'm wrong).
You should look at wxString as a "black box" that stores a string in a Unicode-aware way (always assuming a Unicode build of wxWidgets). You should not worry about how wxString stores its data internally.
But when sending strings over a network, you might want to convert them into a "standard" format that any other computer can understand, even if it runs a different operating system on a different CPU. Converting the string to UTF-8 is one way of doing that.
So for sending over a network, you convert your string to UTF-8, and on the receiving side you create a wxString from the UTF-8 data.
For example, take the literal string I used before: "üöäÜÖÄ".
When converted to UTF-8, it is represented by the following bytes:
Code: Select all
unsigned char utf8buf[]={
0xc3, 0xbc,
0xc3, 0xb6,
0xc3, 0xa4,
0xc3, 0x9c,
0xc3, 0x96,
0xc3, 0x84,
0x00
};
// create wxString from utf8 buffer:
wxString s((const char*)utf8buf, wxConvUTF8);
http://en.wikipedia.org/wiki/Utf8
Posted: Fri May 30, 2008 12:36 pm
by Rakan
Thanks doublemax,
It turned out that using telnet to connect to the server was the problem. I tried netcat and the Arabic text was received correctly.
Thanks a lot