Arabic encoding in char* to wxString Topic is solved

Rakan
Earned a small fee
Posts: 18
Joined: Wed Feb 27, 2008 11:00 pm

Post by Rakan » Tue May 27, 2008 2:24 pm

Hello,

I am developing a server application that will receive English/Arabic text.

When I receive the buffer, I convert it to UTF-8 using
wxString str = wxString::FromUTF8(buff);
or
wxString str(buff,wxConvUTF8);

Both ways produce an empty string object. I am unable to figure out what's wrong, as I thought my Unicode build of wxWidgets would handle such problems.

What can I do to store Unicode Arabic text from a char* in a wxString?

Thanks,
Rakan

doublemax
Moderator
Posts: 14187
Joined: Fri Apr 21, 2006 8:03 pm
Location: $FCE2

Post by doublemax » Tue May 27, 2008 2:43 pm

The lines you posted create a wxString *from* a UTF-8 encoded buffer. If the string is empty, the buffer probably didn't contain valid UTF-8.

That's what you should check first.
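
One way to run that check is to walk the received bytes and confirm that every lead byte is followed by the right number of continuation bytes. The following is only a minimal sketch in plain C++, independent of wxWidgets; the name `isValidUtf8` is hypothetical and not part of any library.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if the first `len` bytes of `data`
// form well-formed UTF-8.
bool isValidUtf8(const unsigned char* data, std::size_t len)
{
    std::size_t i = 0;
    while (i < len)
    {
        const unsigned char lead = data[i];
        std::size_t extra = 0;   // continuation bytes expected after the lead
        unsigned long cp = 0;    // code point being decoded
        unsigned long min = 0;   // smallest code point allowed for this length

        if (lead < 0x80)                { extra = 0; cp = lead;        min = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
        else return false;       // stray continuation byte or invalid lead byte

        if (extra > len - i - 1) return false;      // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = data[i + k];
            if ((c & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min) return false;                      // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
        if (cp > 0x10FFFF) return false;                 // beyond Unicode

        i += extra + 1;
    }
    return true;
}
```

Calling something like this on the raw buffer before constructing the wxString would show whether the data or the conversion is at fault.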
Use the source, Luke!

Post by Rakan » Tue May 27, 2008 2:52 pm

I tried sending English text and the code shows the text received, but when I send Arabic text it doesn't.

Code: Select all

wxUint32 len = 0;               // LastCount() can exceed 255, so not unsigned char
char* buf = new char[1025];     // 1024 bytes of data plus a terminating '\0'

// peek at the pending data without removing it from the socket
client->Peek(buf, 1024);
len = client->LastCount();
if (len > 0)
{
    buf[len] = '\0';
    // make sure the string ends with \n before processing it
    wxString str(buf, wxConvUTF8);
    txtMsgs->AppendText(_("Text is: ") + str + _("\n"));
    if (str.EndsWith(_("\n")))
    {
        // release the peek buffer (array form of delete)
        delete[] buf;
        // allocate a buffer sized to the data received, plus the terminator
        buf = new char[len + 1];
        client->Read(buf, len);
        buf[client->LastCount()] = '\0';
        wxString received = wxString::FromUTF8(buf);
        // echo the raw bytes back to the client
        client->Write(buf, client->LastCount());
        delete[] buf;
        txtMsgs->AppendText(_("Data received: ") + received);
        //ParseProtocol(received);
    }
}
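
The "ends with \n" check in the code above depends on a whole line arriving in one chunk. An alternative is to collect the raw bytes and convert only complete lines, which is safe for UTF-8 because the byte 0x0A never occurs inside a multi-byte sequence. A rough sketch in plain C++ follows; the class `LineBuffer` is hypothetical and is not a wxWidgets class.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: collects raw bytes and hands back complete lines.
class LineBuffer
{
public:
    // Append a chunk exactly as it came off the socket.
    void append(const char* data, std::size_t len)
    {
        m_pending.append(data, len);
    }

    // If a full line is buffered, move it (without the "\n" or "\r\n")
    // into `line` and return true; otherwise leave `line` untouched.
    bool nextLine(std::string& line)
    {
        const std::string::size_type pos = m_pending.find('\n');
        if (pos == std::string::npos)
            return false;

        line.assign(m_pending, 0, pos);
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);   // drop the CR of a CRLF pair
        m_pending.erase(0, pos + 1);
        return true;
    }

private:
    std::string m_pending;   // bytes received but not yet returned
};
```

Each line returned this way could then be passed to wxString::FromUTF8(line.c_str()), so the conversion never sees half of a character.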

Post by doublemax » Tue May 27, 2008 3:04 pm

The interesting part of the code would be where you convert your string to UTF-8 and send it.

Post by Rakan » Tue May 27, 2008 3:38 pm

Code: Select all

wxString str(buf,wxConvUTF8);
client->Write(str.c_str(),str.Length());
Nothing is sent back either.

I have added this code to see the contents of the buffer:

Code: Select all

 for (int x = 0; x < len; x++){ a.Printf(_("%d\n"), buf[x]); txtMsgs->AppendText(a);}
Here is the code's output:

Code: Select all

37777777731 37777777730 37777777661 37777777730 37777777655 37777777730 37777777650 37777777730 37777777647 15 12
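
The values in this dump look like the octal rendering of sign-extended char values (the kind of output a "%o" format gives for a negative char) rather than what "%d" would print. Assuming that reading is right, the low 8 bits of each number are the actual byte. A small sketch in plain C++ that recovers them; the function name `octalDumpToHex` is hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: interpret each token as an octal number, keep the
// low 8 bits, and return the bytes as space-separated hex.
std::string octalDumpToHex(const char* const* tokens, std::size_t count)
{
    std::string out;
    for (std::size_t i = 0; i < count; ++i)
    {
        const unsigned long value = std::strtoul(tokens[i], nullptr, 8);
        char hex[4];
        std::snprintf(hex, sizeof hex, "%02X",
                      static_cast<unsigned>(value & 0xFF));
        if (!out.empty())
            out += ' ';
        out += hex;
    }
    return out;
}
```

Applied to the eleven values above, this gives D9 D8 B1 D8 AD D8 A8 D8 A7 0D 0A. The D8 xx pairs are well-formed two-byte sequences for Arabic letters (U+0631, U+062D, U+0628, U+0627) and 0D 0A is CR LF, but the leading D9 is a lead byte that is followed by another lead byte (D8) instead of a continuation byte in the range 80 to BF. So the buffer is not valid UTF-8, which would explain the empty wxString and is consistent with a byte being lost before it reached the server.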

Post by doublemax » Tue May 27, 2008 3:58 pm

Code: Select all

wxString str(buf,wxConvUTF8);
client->Write(str.c_str(),str.Length());
This code does not convert your string to UTF-8. It creates a wxString from a buffer with UTF-8 encoded data.

You'd need something like this:

Code: Select all

wxString s(wxT("üöäÜÖÄ"));
wxCharBuffer buf=s.mb_str(wxConvUTF8);
client->Write(buf.data(), strlen(buf.data())+1);
The '+1' is there to include the trailing 0 byte.

Post by Rakan » Tue May 27, 2008 8:49 pm

Hello doublemax

Thank you for your contribution, but your code deals with a constant char*, while mine is a variable. You converted a constant char* to a wxString, but in my case I am converting a variable's value to a wxString, which is my main problem: I was unable to convert the char* to a wxString while preserving the encoding in the first place.

Can you replace your constant value with a char* variable?

Post by doublemax » Tue May 27, 2008 10:05 pm

The literal string I created was just an example. You can convert any wxString to UTF-8 this way.

For example, you could also take the value from a wxTextCtrl:

Code: Select all

// assumes wxTextCtrl *m_textctrl;
// take any wxString
wxString s=m_textctrl->GetValue();

// convert it to a utf-8 encoded bytestream
wxCharBuffer buf=s.mb_str(wxConvUTF8);

// send it over a network
client->Write(buf.data(), strlen(buf.data())+1);
However, I'm not really sure you understand what the conversion actually does and why it is needed (sorry if I'm wrong).

You should look at wxString as a "black box" that stores a string in a Unicode-aware way (assuming you are using a Unicode build of wxWidgets). You should not worry about how wxString stores its data internally.

But when sending strings over a network, you might want to convert them into a "standard" format that any other computer can understand, even if it runs a different operating system on a different CPU. Converting the string to UTF-8 is one way of doing that.

So for sending over a network, you convert your string to UTF-8, and on the receiving side you create a wxString from the UTF-8 data.

E.g. the literal string I used before: "üöäÜÖÄ"
When converted to UTF-8, it is represented by the following bytes:

Code: Select all

unsigned char utf8buf[]={
  0xc3, 0xbc,
  0xc3, 0xb6, 
  0xc3, 0xa4, 
  0xc3, 0x9c,
  0xc3, 0x96, 
  0xc3, 0x84,
  0x00
};

// create wxString from utf8 buffer:
 wxString s((const char*)utf8buf, wxConvUTF8);
http://en.wikipedia.org/wiki/Utf8
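
To see where byte pairs like 0xc3 0xbc come from, here is a small sketch in plain C++ of the UTF-8 encoding rule itself. It is only an illustration: the function `appendUtf8` is hypothetical, and in wxWidgets code the conversion is done by mb_str(wxConvUTF8) as shown earlier.

```cpp
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one code point.
// Assumes cp is a valid Unicode scalar value (<= 0x10FFFF, not a surrogate).
void appendUtf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80)
    {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

For example, ü is U+00FC, which falls in the two-byte range and encodes as 0xc3 0xbc, matching the first pair in the table above. Arabic letters are in the same two-byte range, with lead bytes 0xd8 to 0xdb.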

Post by Rakan » Fri May 30, 2008 12:36 pm

Thanks doublemax,

It turned out that using telnet to connect to the server was the problem. I tried netcat and the Arabic text was received correctly.

Thanks a lot