Must I use UNICODE in my code?

Post by RobertWebb » Sat Nov 11, 2017 4:10 am

As I understand it wxWidgets supports both wide-char and narrow-char versions of functions (2-byte and 1-byte chars).

I thought this meant that my code could be either, but it doesn't seem to be the case.

I am porting old code which heavily relies on single-byte chars, but even when I compile with "General->Character Set" set to "Use Multi-Byte Character Set" (Visual Studio 2015), TCHAR is still defined as a wide char, breaking lots of my code.

Some of the code will only break at runtime too, making it hard to track down reliably, eg:

Code:

wxString str;
char name[80] = "Bob";
str.Printf("%s", name);

Is it possible for my code to be compiled so that single-byte chars is the standard, and work in examples like above?

By the way, I notice the docs say that wxPrintf() can handle char* arguments, but it doesn't seem to be the case. Using it above also crashes.

From http://docs.wxwidgets.org/3.1/classwx_string.html:

wxPrintf() replaces both printf() and wprintf() and accepts wxString objects, results of c_str() calls but also char* and wchar_t* strings directly

Post by PB » Sat Nov 11, 2017 7:15 am

RobertWebb wrote:As I understand it wxWidgets supports both wide-char and narrow-char versions of functions (2-byte and 1-byte chars).

wxWidgets supports Unicode and ANSI builds but you have to choose one. The size of the char in the Unicode builds depends on the platform, e.g. Windows has 2 bytes per character, Unix has 4 - this is based on the C++ standard implementation (sizeof(wchar_t)). Then there is a UTF-8 build but that one should not concern you...
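
The platform dependence of wchar_t described above can be checked directly. A minimal sketch in plain C++ (no wxWidgets needed); the helper names `wide_char_bytes` and `print_wide_char_size` are made up for illustration:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not part of wxWidgets): size in bytes of one
// wchar_t unit on the platform this is compiled for.
constexpr std::size_t wide_char_bytes() { return sizeof(wchar_t); }

// sizeof(char) is 1 by definition in C++, whatever character set is selected.
static_assert(sizeof(char) == 1, "char is always one byte");

// Prints the size; typically 2 with MSVC on Windows and 4 with GCC/Clang
// on Linux, but the value is implementation-defined.
void print_wide_char_size()
{
    std::printf("sizeof(wchar_t) = %zu\n", wide_char_bytes());
}
```

Calling `print_wide_char_size()` from a test program on each target platform shows which width a Unicode build would use there.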

RobertWebb wrote: I am porting old code which heavily relies on single-byte chars, but even when I compile with "General->Character Set" set to "Use Multi-Byte Character Set" (Visual Studio 2015), TCHAR is still defined as a wide char, breaking lots of my code.

MBCS is not Unicode. If you want to use Unicode, you need to select "Use Unicode Character Set".

RobertWebb wrote: Some of the code will only break at runtime too, making it hard to track down reliably, eg:
Code:
wxString str;
char name[80] = "Bob";
str.Printf("%s", name);

Such code normally works. Must be because of the wrong build.

RobertWebb wrote:Is it possible for my code to be compiled so that single-byte chars is the standard, and work in examples like above?

Yes, see the answers above.

Post by RobertWebb » Sat Nov 11, 2017 8:26 am

PB wrote:
RobertWebb wrote:As I understand it wxWidgets supports both wide-char and narrow-char versions of functions (2-byte and 1-byte chars).
wxWidgets supports Unicode and ANSI builds but you have to choose one. The size of the char in the Unicode builds depends on the platform, e.g. Windows has 2 bytes per character, Unix has 4 - this is based on the C++ standard implementation (sizeof(wchar_t)). Then there is a UTF-8 build but that one should not concern you...

Actually a UTF-8 build is exactly what I want, because it means 1-byte chars and is best for i18n.

When you say "choose one", do you mean when building wxWidgets itself, or just projects using it? From what I can tell, wxWidgets itself is now ALWAYS built with Unicode (as of version 3.0), and I think that's the problem. It means TCHAR is always 2 bytes, regardless of my own project settings. So any old code that isn't compatible with wide chars breaks.

From setup.h:

Code:

// These settings are obsolete: the library is always built in Unicode mode
// now
...
// wxUSE_WCHAR_T is required by wxWidgets now, don't change.

PB wrote:
RobertWebb wrote: I am porting old code which heavily relies on single-byte chars, but even when I compile with "General->Character Set" set to "Use Multi-Byte Character Set" (Visual Studio 2015), TCHAR is still defined as a wide char, breaking lots of my code.
MBCS is not Unicode, if you want to use Unicode, you need to select "Use Unicode character set"

No, I DON'T want to use Unicode, or rather, I don't mind if wxWidgets supports both Unicode and 1-byte versions of its functions, but I want single-byte chars to be the default in my own code. That's why I chose the multi-byte option, because it means single-byte chars, not 2-byte chars like Unicode.

PB wrote:
RobertWebb wrote: Some of the code will only break at runtime too, making it hard to track down reliably, eg:
Code:
wxString str;
char name[80] = "Bob";
str.Printf("%s", name);
Such code normally works. Must be because of the wrong build.

Does it? But how? It has to guess the types of arguments from variable argument lists, so if wxWidgets is always compiled for Unicode now, then I presume it has to guess that the argument is a wide char string, hence the crash when passing a char*.
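
On the "But how?" question: as far as I know, wxWidgets 3.x does not forward Printf() arguments through a raw C `...` list. Its vararg functions are generated through templates that see each argument's static type and normalize it before formatting, which is consistent with the wxPrintf() documentation quoted earlier in the thread. The following is a minimal plain-C++ sketch of that idea, not wxWidgets source; `to_wide` and `join_args` are hypothetical names, and the byte-by-byte widening is only valid for ASCII text.

```cpp
#include <cassert>
#include <string>

// Illustration only: NOT wxWidgets code. Overloads let the compiler pick
// the conversion from the argument's static type, which a C-style "..."
// parameter list cannot do.
inline std::wstring to_wide(const wchar_t* s)
{
    assert(s != nullptr);
    return std::wstring(s);
}

inline std::wstring to_wide(const char* s)
{
    assert(s != nullptr);
    // Widen byte by byte; adequate for the ASCII text in this example only.
    std::wstring out;
    for (; *s != '\0'; ++s)
        out += static_cast<wchar_t>(static_cast<unsigned char>(*s));
    return out;
}

// End of recursion: no arguments left.
inline std::wstring join_args() { return std::wstring(); }

// Type-aware stand-in for a "%s"-only Printf: each argument is normalized
// to a wide string before being appended, so char* and wchar_t* both work.
template <typename First, typename... Rest>
std::wstring join_args(First first, Rest... rest)
{
    return to_wide(first) + join_args(rest...);
}
```

With this, `join_args("Bob", L" & ", "Al")` yields the wide string `L"Bob & Al"`, whichever mix of `char*` and `wchar_t*` is passed. A crash in the real Printf() with a plain `char*` therefore points to a build or configuration problem rather than to argument guessing.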

Post by PB » Sat Nov 11, 2017 9:09 am

RobertWebb wrote:Actually a UTF-8 build is exactly what I want, because it means 1-byte chars and is best for i18n.

On Windows, it is very unlikely you want to use the UTF-8 build, but perhaps it may work, I don't know. All I know is that it is not used. If you want i18n, you basically just wrap all the string literals in _(), then create and translate the .PO file.

RobertWebb wrote:When you say "choose one", do you mean when building wxWidgets itself, or just projects using it?

Obviously, for such an important thing, the user code settings must match the settings that the libraries it uses were built with.

While wxWidgets versions after 2.8 default to Unicode builds, you can override the setting and build the library in ANSI mode. You need to set the mode in the IDE for the whole library or pass the option on the command line if using NMake. I believe that ANSI mode is rarely used, so you may run into issues when building it. AFAIK such issues are still being fixed when encountered (new code is not tested in ANSI builds).

RobertWebb wrote: I am porting old code which heavily relies on single-byte chars, but even when I compile with "General->Character Set" set to "Use Multi-Byte Character Set" (Visual Studio 2015), TCHAR is still defined as a wide char, breaking lots of my code.

Do not use MBCS, this is a thing of the past (and I believe that MBCS with MSVC = DBCS, i.e., sizeof(char) = 2). Use Unicode if possible, if not go for ANSI (where you need to make sure none of the Unicode or DBCS/MBCS related defines are defined in any of the code you build).

Post by RobertWebb » Sat Nov 11, 2017 12:53 pm

I think I've solved enough of this now. The main reason for concern was that wxString::Printf() was crashing with char* arguments, but this has been fixed by removing some ancient inherited properties relating to Visual Leak Detector, as described in this post: viewtopic.php?f=19&t=44010&p=180384#p180384. Aside from that, I don't think I can avoid updating the remaining code to handle TCHAR = 2 bytes.

PB wrote:<I assume you understand the difference between a char (a C++ type) and a character (actual representation of a letter, number etc.), sometimes it seems you do not...>

Yes but unfortunately I am trying to port some very old code in which the two were used synonymously.
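
The char versus character distinction matters most with UTF-8, where one character can occupy several chars. A minimal sketch in plain C++11, assuming valid UTF-8 input; `byte_count` and `code_point_count` are hypothetical helper names, not wxWidgets functions:

```cpp
#include <cstddef>

// Hypothetical helper: number of chars (bytes) before the terminating NUL.
constexpr std::size_t byte_count(const char* s)
{
    return *s == '\0' ? 0 : 1 + byte_count(s + 1);
}

// Hypothetical helper: number of Unicode code points in a UTF-8 string.
// Continuation bytes have the bit pattern 10xxxxxx, so every byte that
// does not match that pattern starts a new code point.
constexpr std::size_t code_point_count(const char* s)
{
    return *s == '\0'
        ? 0
        : ((static_cast<unsigned char>(*s) & 0xC0) != 0x80 ? 1 : 0)
              + code_point_count(s + 1);
}
```

For the UTF-8 encoding of "café", written as `"caf\xC3\xA9"`, `byte_count` gives 5 while `code_point_count` gives 4. Legacy code that treats the two numbers as the same will miscount as soon as non-ASCII text appears.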

PB wrote: On Windows, it is very unlikely you want to use the UTF-8 build, but perhaps it may work, I don't know. All I know is that it is not used. If you want i18n, you basically just wrap all the string literals in _(), then create and translate the .PO file.

Yep this is the intention. The .PO files will use UTF-8, but I guess that doesn't mean it's a UTF-8 build as such. The appeal was just that I presume that would mean TCHAR gets defined as one byte. Nevermind.

PB wrote: While wxWidgets versions after 2.8 default to Unicode builds, you can override the setting and build the library in ANSI mode. You need to set the mode in the IDE for the whole library or pass the option on the command line if using NMake. I believe that ANSI mode is rarely used, so you may run into issues when building it. AFAIK such issues are still being fixed when encountered (new code is not tested in ANSI builds).

Oh, OK, but it really seems to recommend against doing this. I don't want to go against the flow that much, so I'll just get it to work with Unicode, but thanks for the tip.

PB wrote: Do not use MBCS, this is a thing of the past (and I believe that MBCS with MSVC = DBCS, i.e., sizeof(char) = 2). Use Unicode if possible, if not go for ANSI (where you need to make sure none of the Unicode or DBCS/MBCS related defines are defined in any of the code you build).

MBCS definitely sets sizeof(TCHAR) = 1, not 2. Unicode sets it to 2. sizeof(char) is always 1 of course (presume that was a typo). I'll go with Unicode though (despite the 1000 errors it currently gives me).

To be honest, I'm not sure what you mean by going with ANSI. VS only gives two choices, Unicode or MBCS.

Thanks for your help. Getting rid of Visual Leak Detector has fixed Printf(), so that should make the transition to Unicode a bit easier.

Post by PB » Sat Nov 11, 2017 3:31 pm

RobertWebb wrote:To be honest, I'm not sure what you mean by going with ANSI. VS only gives two choices, Unicode or MBCS.

It does not. The third option is, as I wrote in my previous post, to define neither _UNICODE nor _MBCS. I believe that is what I called ANSI, and it is the mode wxWidgets defaulted to in its pre-2.9 versions. See e.g. here how the defines in MSVS affect the code: https://msdn.microsoft.com/en-us/library/c426s321.aspx
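
The effect of those defines can be sketched as follows. This mimics the selection that `<tchar.h>` performs, under hypothetical names (`DemoTChar`, `DEMO_TEXT`) so that it also compiles outside Windows. It assumes, in line with the MSDN page linked above, that `_UNICODE` selects wchar_t and that both the `_MBCS` case and the neither-defined case leave the type as char.

```cpp
#include <cstddef>

// Sketch of the generic-text mapping. DemoTChar and DEMO_TEXT are
// hypothetical stand-ins; the real header defines TCHAR and _T()/_TEXT().
#if defined(_UNICODE)
typedef wchar_t DemoTChar;        // "Use Unicode Character Set"
#define DEMO_TEXT(s) L##s
#else
typedef char DemoTChar;           // _MBCS defined, or neither macro defined
#define DEMO_TEXT(s) s
#endif

// Size in bytes of one DemoTChar element.
constexpr std::size_t demo_tchar_bytes() { return sizeof(DemoTChar); }

// Size in bytes of the literal "Bob" including its terminator:
// 4 elements times the element size chosen above.
constexpr std::size_t demo_name_bytes() { return sizeof(DEMO_TEXT("Bob")); }
```

Because the mapping is decided by the preprocessor, any header that defines `_UNICODE` before `<tchar.h>` is included changes the result for the whole translation unit, regardless of the Character Set chosen in the project properties.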

But if you can, certainly go with Unicode. Sticking to ANSI is a desperate measure when porting legacy code is not an option.
