Conversion to UTF8 multibyte (Topic is solved)

chris
I live to help wx-kind
Posts: 150
Joined: Fri Oct 08, 2004 2:05 pm
Location: Europe

Conversion to UTF8 multibyte

Post by chris » Sun Dec 11, 2005 1:16 pm

Hi all,

I need to convert a wxString to UTF8 multibyte, and vice versa, regardless of the input encoding or the way wx was compiled (Unicode or ANSI mode).
Having tried several things that all failed, I really need someone to hold my hand with this. :wink:

The last try looks like this:

```cpp
const char * wxStringToUTF8MB(const wxString &str){
#if wxUSE_UNICODE
    return wxConvUTF8.cWX2MB(str.c_str());
#else
    return wxConvUTF8.cWC2MB(wxConvLocal.cWX2WC(str.c_str()));
#endif
}

wxString UTF8MBTowxString(const char *str){

    wxString ret;

#if wxUSE_UNICODE
    wxWCharBuffer buffer(wxConvUTF8.cMB2WX(str));
#else
    wxCharBuffer buffer(wxConvLocal.cWC2WX(wxConvUTF8.cMB2WC(str)));
#endif
    ret=buffer;
    return ret;
}
```

The problem is that wxStringToUTF8MB() always returns an empty char array, in both Unicode and ANSI builds.
I think I have stared too long at the various topics and documentation on Unicode issues and am missing the obvious.
Can you guys help me?

TIA, Chris
this->signature=NULL;

leio
Can't get richer than this
Posts: 802
Joined: Mon Dec 27, 2004 10:46 am
Location: Estonia, Tallinn

Post by leio » Sun Dec 11, 2005 2:53 pm

The wxGTK_CONV macro in the wxGTK sources converts a wxString to a UTF-8 multibyte string suitable for passing to GTK+.
The wxGTK_CONV_BACK macro in the wxGTK sources converts a multibyte UTF-8 string obtained from GTK+ back to a wxString.

The code in include/wx/gtk/private.h is as follows:

```cpp
#ifdef __WXGTK20__
#if wxUSE_UNICODE
    #define wxGTK_CONV(s) wxConvUTF8.cWX2MB(s)
    #define wxGTK_CONV_BACK(s) wxConvUTF8.cMB2WX(s)
#else
    #define wxGTK_CONV(s) wxConvUTF8.cWC2MB( wxConvLocal.cWX2WC(s) )
    #define wxGTK_CONV_BACK(s)  wxConvLocal.cWC2WX( (wxConvUTF8.cMB2WC( s ) ) )
#endif
#else
    #define wxGTK_CONV(s) s.c_str()
    #define wxGTK_CONV_BACK(s) s
#endif
```

So you might want this:

```cpp
const char * wxStringToUTF8MB(const wxString &str){
#if wxUSE_UNICODE
    return wxConvUTF8.cWX2MB(str);
#else
    return wxConvUTF8.cWC2MB( wxConvLocal.cWX2WC(str) );
#endif
}
```

Not sure if the return type can be const. If it can, then it'd be nice as const ;)
Compilers: gcc-3.3.6, gcc-3.4.5, gcc-4.0.2, gcc-4.1.0 and MSVC6
OS's: Gentoo Linux, WinXP; WX: CVS HEAD

Project Manager of wxMUD - http://wxmud.sf.net/
Developer of wxGTK;
gtk+ port maintainer of OMGUI - http://www.omgui.org/

chris

Post by chris » Tue Dec 13, 2005 10:35 am

Hi leio,

Thanks for your reply.

This is essentially what I was doing (though without the explicit call to c_str()), but unfortunately it still doesn't work when wrapped in a function.
Using the macros you posted, everything is fine.
The problem seems to be that wxConvUTF8 returns a wxCharBuffer. That temporary object goes out of scope when the function returns, so the returned char* points to memory that is no longer valid; in my case it was always zeroed.

So if anybody else tries to do something like this:

Use the macros posted by leio,
or, if you must use functions, copy the result into a buffer you allocated yourself.
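
To make the lifetime problem concrete, here is a minimal standalone sketch. It does not use wxWidgets: `CharBuffer` and `convert()` are hypothetical stand-ins for wxCharBuffer and wxConvUTF8.cWX2MB(), assumed only to behave like "a function that returns an owning temporary". `toUtf8Copy()` is likewise an illustrative name, showing the "copy the result" fix.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for wxCharBuffer: owns a heap copy of the bytes
// and frees it in its destructor.
class CharBuffer {
public:
    explicit CharBuffer(const char* s) : data_(new char[std::strlen(s) + 1]) {
        std::strcpy(data_, s);
    }
    CharBuffer(const CharBuffer&) = delete;
    CharBuffer& operator=(const CharBuffer&) = delete;
    CharBuffer(CharBuffer&& other) noexcept : data_(other.data_) {
        other.data_ = nullptr;
    }
    ~CharBuffer() { delete[] data_; }

    const char* data() const { return data_; }
    // Implicit conversion, which is what lets the broken version compile.
    operator const char*() const { return data_; }

private:
    char* data_;
};

// Hypothetical stand-in for the converter call: returns an owning temporary.
CharBuffer convert(const std::string& s) { return CharBuffer(s.c_str()); }

// BROKEN (the pattern from this thread): the temporary CharBuffer is
// destroyed when the function returns, so the pointer dangles.
// const char* toUtf8Dangling(const std::string& s) { return convert(s); }

// Safe: keep the buffer alive in a local, then copy the bytes into a
// std::string that the caller owns.
std::string toUtf8Copy(const std::string& s) {
    const CharBuffer buf = convert(s);
    return std::string(buf.data());
}

// Usage: the returned std::string stays valid after the call.
inline void usageExample() {
    const std::string utf8 = toUtf8Copy("hello");
    assert(utf8 == "hello");
}
```

The macros work for the same reason: they expand at the call site, so the temporary lives until the end of the full expression in which the converted string is used.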

Thanks, Chris
Last edited by chris on Tue Dec 13, 2005 2:35 pm, edited 1 time in total.

jb_coder
Super wx Problem Solver
Posts: 267
Joined: Mon Oct 18, 2004 10:55 am

Post by jb_coder » Tue Dec 13, 2005 12:37 pm

Here are some links that might be of use:

"wxMBConv classes overview": http://www.wxwidgets.org/manuals/2.6.2/ ... onvclasses

wxMBConv: http://www.wxwidgets.org/manuals/2.6.2/wx_wxmbconv.html

The important thing to notice is that the wxMBConv documentation for cMB2WC states that it allocates "a temporary wxCharBuffer to hold the result". Two possible solutions are the following:

1) Assign the wxCharBuffer returned from cMB2WC to a local variable rather than trying to act directly on the buffer returned.
2) Use MB2WC directly and provide your own buffer.
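
Option 2 can be sketched without wxWidgets as a two-pass "measure, then convert" call into a caller-owned buffer. The helper below is hypothetical, not a wxWidgets function: `encodeUtf8()` only assumes a convention where a null output pointer means "return the required length", and `toUtf8()` is an illustrative wrapper. It encodes UTF-32 input and does not validate surrogates or out-of-range code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: encodes `in` as UTF-8.
// If `out` is null, nothing is written and only the required byte count is
// returned. Otherwise at most `outSize` bytes are written. The return value
// is always the number of bytes the full encoding needs (no terminating NUL).
std::size_t encodeUtf8(char* out, std::size_t outSize, const std::u32string& in) {
    std::size_t n = 0;
    auto put = [&](unsigned int b) {
        if (out && n < outSize) out[n] = static_cast<char>(b);
        ++n;
    };
    for (char32_t c : in) {
        if (c < 0x80) {                       // 1 byte: ASCII
            put(c);
        } else if (c < 0x800) {               // 2 bytes
            put(0xC0 | (c >> 6));
            put(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {             // 3 bytes
            put(0xE0 | (c >> 12));
            put(0x80 | ((c >> 6) & 0x3F));
            put(0x80 | (c & 0x3F));
        } else {                              // 4 bytes
            put(0xF0 | (c >> 18));
            put(0x80 | ((c >> 12) & 0x3F));
            put(0x80 | ((c >> 6) & 0x3F));
            put(0x80 | (c & 0x3F));
        }
    }
    return n;
}

// The buffer is owned by this function, so no temporary can disappear
// underneath the caller; the result is returned by value.
std::string toUtf8(const std::u32string& in) {
    const std::size_t len = encodeUtf8(nullptr, 0, in);  // pass 1: measure
    std::vector<char> buf(len);                          // caller-owned buffer
    encodeUtf8(buf.data(), buf.size(), in);              // pass 2: convert
    return std::string(buf.begin(), buf.end());
}
```

Option 1 needs no extra code beyond a named local: keep the returned buffer object in a variable for as long as the char* taken from it is in use.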

If you need any sample code, here are some links to the wxCode CVS where UTF-8 conversion is used:

wxSQLite3:
http://cvs.sourceforge.net/viewcvs.py/w ... &view=auto

DatabaseLayer:
http://cvs.sourceforge.net/viewcvs.py/w ... &view=auto

wxSpellChecker:
http://cvs.sourceforge.net/viewcvs.py/w ... &view=auto

Just search in those pages for UTF-8 and you should be able to find it.

chris

Post by chris » Tue Dec 13, 2005 2:33 pm

Thanks for your reply jb_coder.

Once I figured out that I was working with a temporary buffer, it was clear how to continue from there.
It's a shame that I missed that passage in the docs; I thought I had read them carefully :wink:

I'll give you the Accepted Answer points because leio already sits on a load of them, so you have a better use for them I guess :D

Thanks again to you two for your help,

Chris
