Non-Unicode/MBCS wxWidgets build for Visual Studio [How we solved it]

gasko
In need of some credit
Posts: 2
Joined: Mon Apr 13, 2020 11:14 am


Post by gasko » Mon Apr 13, 2020 11:36 am

Hi all

We have a large legacy project which uses "Multi-Byte Character Set" i.e. ANSI char options instead of Unicode and therefore needed to build wxWidgets for MBCS as well.

There were lots of issues in doing this, such as the following linker error:
- unresolved external symbol wxAppConsoleBase::OnAssert

This is how we built non-Unicode wxWidgets successfully for Visual Studio 2019 on Windows 10:
1. Download and install wxWidgets and open the installation folder (referred to as WXWIN from now on)
2. Go to WXWIN\include\wx\msw\setup.h and change "wxUSE_UNICODE 1" to "wxUSE_UNICODE 0"
3. Go to WXWIN\build\msw and open wx_vc16.sln (or the version of your choice)
4. BEFORE batch building, highlight all projects in the solution > right-click > properties > Configuration Properties > Advanced and set "Character Set" to "Use Multi-Byte Character Set" for ALL CONFIGURATIONS
5. Then go to C/C++ > Preprocessor > Preprocessor Definitions for EACH project and remove "_UNICODE" manually for ALL CONFIGURATIONS. Not all of the projects have this define set so you may have to select them in small groups or do them individually to ensure you remove it for all.
6. You can then batch build as normal: go to Build > Batch Build and then "Select All" and then "Build"
7. When you include wxWidgets, ensure your project is also set to non-Unicode

Step 5 was the root of our problems. Even though we set wxUSE_UNICODE to 0, this macro seems to change only the "overall" configuration for wxWidgets, not the configuration of each sub-project/library. Further, the _UNICODE define seemingly overrode this option anyway, as can be seen in the code in "platform.h", which switches BOTH wxUSE_UNICODE and _UNICODE on if EITHER option is on.

Hope this helps someone!

Regards,
Gary

PB
Part Of The Furniture
Posts: 2466
Joined: Sun Jan 03, 2010 5:45 pm


Post by PB » Mon Apr 13, 2020 1:44 pm

Thanks for the info.

If you are not afraid of the command line, an MBCS build of wxWidgets can easily be made with nmake, run from WXWIN\build\msw in a Visual Studio developer command prompt:

nmake -f makefile.vc UNICODE=0 CPPFLAGS="/D_MBCS"

I assume it would be similarly easy to generate MSVS projects with the bundled CMake files: just untick wxUSE_UNICODE, define _MBCS, generate the project, and build it in MSVS.

BTW, I am not sure about your formulation "Multi-Byte Character Set", i.e. ANSI char options. I believe that Unicode, ANSI, and MBCS are three different character encodings in MSVC:
  1. ANSI: the original C encoding, every character is exactly one byte, neither _MBCS nor _UNICODE is defined, the string comparison function would be strcmp(). This was the default wxWidgets encoding until wxWidgets 2.9.
  2. Unicode: characters are wchar_t-based (UTF-16 strings), _UNICODE is defined, the string comparison function would be wcscmp(). This is the default wxWidgets encoding since wxWidgets 2.9.
  3. MBCS: was usually used with CJK languages, characters are 1 or 2 bytes long (MSVC supports only DBCS), _MBCS is defined, the string comparison function would be _mbscmp().

RobertWebb
Earned a small fee
Posts: 23
Joined: Sun Oct 29, 2017 11:14 am


Post by RobertWebb » Sun Jun 21, 2020 5:16 am

Many thanks for this!

I don't understand why wxWidgets is trying to force UNICODE upon us. Isn't UTF-8 the preferred encoding? And isn't UTF-8 based on single-width characters? Why would you want double-width characters when using UTF-8? MBCS seems the better option, but it's being deprecated everywhere. Why?

The wxWidgets Visual Studio projects don't just have the UNICODE option selected, they ALSO define _UNICODE explicitly everywhere. WHY? The setting already does this automatically. All this does is make it impossible to switch to MBCS without hitting a tonne of errors. This is true both in the wxWidgets code itself AND in their sample projects.

Step 5 is difficult because even after I edited it out from every configuration combination I could think of, I still had 16 occurrences of _UNICODE in EVERY .vcxproj file! I ended up doing a search and replace from ";_UNICODE" to "" in ALL .vcxproj files. wxWidgets should remove these permanently as they serve no purpose, even in a UNICODE build.

Looking at your step 2, I see in setup.h that it has this comment:

// These settings are obsolete: the library is always built in Unicode mode
// now, only set wxUSE_UNICODE to 0 to compile legacy code in ANSI mode if
// absolutely necessary -- updating it is strongly recommended as the ANSI mode
// will disappear completely in future wxWidgets releases.

Again, WHY??? The choice is not between UNICODE and ANSI; there's also MBCS, which is different again, although both ANSI and MBCS use single-width characters. ANSI is the option you get when choosing to set neither UNICODE nor MBCS. MBCS is still important, and as I said above, it's surely the preferred method when using UTF-8? At least that's my understanding.

It is not a simple matter to change a code base from MBCS to UNICODE. It can mean thousands of errors.

When I heard a few years ago that wxWidgets would have a single build which supported both single and double width chars, I thought great! That means we can compile with either UNICODE or MBCS and it will just work. But no, it means you can no longer compile for MBCS.

This is bad!

Should I give up on migrating my old MFC projects to wxWidgets because it won't be usable for MBCS code soon?

stahta01
Super wx Problem Solver
Posts: 365
Joined: Fri Nov 03, 2006 2:00 pm


Post by stahta01 » Sun Jun 21, 2020 7:40 am

https://stackoverflow.com/questions/329 ... on-windows

UTF-8 cannot be supported through MBCS, because MBCS assumes at most two bytes per character.

I wonder how long MBCS will be supported, officially by Windows 10, or unofficially by wxWidgets in your case.

If Windows 10 drops MBCS support, then having a wxWidgets port might be a great idea for you.

But I have no idea why you decided on the MFC to wxWidgets migration.

Tim S.

PB
Part Of The Furniture
Posts: 2466
Joined: Sun Jan 03, 2010 5:45 pm


Post by PB » Sun Jun 21, 2020 10:37 am

I think it is all very simple but I also think you are confusing several things.
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
I don't understand why wxWidgets is trying to force UNICODE upon us.
When you select "Use Unicode character set" for "Character set" in MSVS, MSVS automatically adds both the _UNICODE and UNICODE preprocessor definitions; you can try it yourself (do not forget to click Apply after changing the character set).
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
Isn't UTF-8 the preferred encoding?
No, Unicode in C++ is based on wchar_t size, which is 2 bytes on Windows, i.e., the Unicode variant is UTF-16 (with surrogate pairs, a character takes either 2 or 4 bytes). On Linux wchar_t is 4 bytes and the Unicode variant there is UTF-32.
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
And isn't UTF-8 based on single-width characters? Why would you want double-width characters when using UTF-8?
In UTF-8 a character can take between 1 and 4 bytes, so it is a variable-width encoding. The only fixed-width encodings are ANSI (1 byte) and UTF-32 (4 bytes).
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
MBCS seems the better option, but it's being deprecated everywhere. Why?
How exactly is MBCS better than Unicode? MBCS had its issues and so was superseded by Unicode by everyone, including your MFC. AFAIK, wxWidgets never used MBCS/DBCS, it used either ANSI (default till 2.9) or Unicode (default since 2.9) encodings. Are you trying to say you had MBCS/DBCS-based wxString in some wxWidgets versions?
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
Step 5 is difficult because even after I edited it out from every configuration combination I could think of, I still had 16 occurrences of _UNICODE in EVERY .vcxproj file! I ended up doing a search and replace from ";_UNICODE" to "" in ALL .vcxproj files. wxWidgets should remove these permanently as they serve no purpose, even in a UNICODE build.
Well, had you done what I described in my previous post, you could have saved yourself a lot of time and effort. See the explanation above for why _UNICODE is defined.
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
MBCS is still important, and as I said above, it's surely the preferred method when using UTF-8? At least that's my understanding.
You must be confusing things. MBCS (MSVS used its subset, DBCS) and UTF-8 are both variable-width character encodings, but they are not the same: the latter is Unicode, the former is not, and in the MS implementation its range is limited to 65k.
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
When I heard a few years ago that wxWidgets would have a single build which supported both single and double width chars, I thought great! That means we can compile with either UNICODE or MBCS and it will just work. But no, it means you can no longer compile for MBCS.
You must have confused something again here. As I wrote above, wxWidgets never used MBCS, so AFAIK its support for it has not changed: there was none to begin with (I have just grepped the v2.8 codebase for "DBCS" and "MBCS" and found nothing).
And as shown in previous posts you can still use MBCS if you absolutely have to, it is just that Unicode is default now so you need to change the default build settings.
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
Should I give up on migrating my old MFC projects to wxWidgets because it won't be usable for MBCS code soon?
wxWidgets non-Unicode build support will likely be removed after 3.2 (i.e., in stable branch 3.4), see here: https://github.com/wxWidgets/wxWidgets/ ... -612930925

As for migrating MFC projects to wxWidgets, that is on you, considering pros and cons of all available MFC alternatives. I wonder if any of them will still support non-Unicode encodings in 5+ years...

RobertWebb
Earned a small fee
Posts: 23
Joined: Sun Oct 29, 2017 11:14 am


Post by RobertWebb » Sun Jun 21, 2020 1:54 pm

PB wrote:
Sun Jun 21, 2020 10:37 am
I think it is all very simple but I also think you are confusing several things.
Thanks for the reply, and I probably am confused on some of this. I see now I was loose in my terminology though which confused things further.

For my purposes, I think of MBCS as the option I choose to compile for single-byte TCHARs, as opposed to UNICODE which is used for double-byte TCHARs. My old code unfortunately uses mostly char rather than TCHAR, strcpy() rather than _tcscpy() etc., so it's not easy to convert (and man, why did they come up with such ugly names for everything when going multi-byte?!). I don't actually use the encoding part of MBCS at all, so maybe I should be talking about ANSI? The point is I don't care about the encoding because I've only supported English. One reason for moving to wxWidgets is that they make i18n orders of magnitude easier to support than single-byte MFC, as well as being portable to platforms other than Windows.
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
I don't understand why wxWidgets is trying to force UNICODE upon us.
When you select "Use Unicode character set" for "Character set" in MSVS, MSVS automatically adds both the _UNICODE and UNICODE preprocessor definitions; you can try it yourself (do not forget to click Apply after changing the character set).
Yes, choosing the Unicode option will add _UNICODE to the FINAL preprocessor definitions, but NOT to "Properties->C/C++->Preprocessor->Preprocessor Definitions". There is, of course, no need to add it there manually. But almost all the wxWidgets projects have had it manually added there. That means even when you switch to MBCS, _UNICODE is still defined. There's no reason for it to be set there.
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
Isn't UTF-8 the preferred encoding?
No, Unicode in C++ is based on wchar_t size, which is 2 bytes on Windows, i.e., the Unicode variant is UTF-16 (with surrogate pairs, a character takes either 2 or 4 bytes). On Linux wchar_t is 4 bytes and the Unicode variant there is UTF-32.
Ah, sorry I forgot about UTF-16 etc.
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
And isn't UTF-8 based on single-width characters? Why would you want double-width characters when using UTF-8?
In UTF-8 a character can take between 1 and 4 bytes, so it is a variable-width encoding. The only fixed-width encodings are ANSI (1 byte) and UTF-32 (4 bytes).
This was me being too loose with terminology. When I said character I meant TCHARs. Yes I knew UTF-8 was variable width, but built from single-width TCHARs (what's the right term for those sub-characters?).

What I don't understand is why wide chars were ever necessary. Single-width TCHARs were enough to use UTF-8 encoding and represent any set of characters from any language. AND they would generally take up less space. Was it just so code could rely on each character having a constant width?
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
MBCS seems the better option, but it's being deprecated everywhere. Why?
How exactly is MBCS better than Unicode? MBCS had its issues and so was superseded by Unicode by everyone, including your MFC. AFAIK, wxWidgets never used MBCS/DBCS, it used either ANSI (default till 2.9) or Unicode (default since 2.9) encodings. Are you trying to say you had MBCS/DBCS-based wxString in some wxWidgets versions?
My thinking was that UTF-8 is built from single-width TCHARs, so compiling to single-width made the most sense.
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
Step 5 is difficult because even after I edited it out from every configuration combination I could think of, I still had 16 occurrences of _UNICODE in EVERY .vcxproj file! I ended up doing a search and replace from ";_UNICODE" to "" in ALL .vcxproj files. wxWidgets should remove these permanently as they serve no purpose, even in a UNICODE build.
Well, had you done what I described in my previous post, you could have saved yourself a lot of time and effort. See the explanation above for why _UNICODE is defined.
See my answer above re _UNICODE being defined. And I do prefer to be able to build in a full development environment rather than just using the command line.
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
When I heard a few years ago that wxWidgets would have a single build which supported both single and double width chars, I thought great! That means we can compile with either UNICODE or MBCS and it will just work. But no, it means you can no longer compile for MBCS.
You must have confused something again here. As I wrote above, wxWidgets never used MBCS, so AFAIK its support for it has not changed: there was none to begin with (I have just grepped the v2.8 codebase for "DBCS" and "MBCS" and found nothing).
And as shown in previous posts you can still use MBCS if you absolutely have to, it is just that Unicode is default now so you need to change the default build settings.
Sorry, my loose terminology again. It may not have supported the MBCS encoding, but all I was concerned with was getting single-width TCHARs.
As for migrating MFC projects to wxWidgets, that is on you, considering pros and cons of all available MFC alternatives. I wonder if any of them will still support non-Unicode encodings in 5+ years...
Anything other than Unicode is already deprecated for MFC, so it will disappear eventually. That's one reason for going to wxWidgets, provided they don't do the same. But it seems to be the way everything is going. Refactoring to keep code alive takes up so much time :(

PB
Part Of The Furniture
Posts: 2466
Joined: Sun Jan 03, 2010 5:45 pm


Post by PB » Sun Jun 21, 2020 2:45 pm

RobertWebb wrote:
Sun Jun 21, 2020 1:54 pm
For my purposes, I think of MBCS as the option I choose to compile for single-byte TCHARs, as opposed to UNICODE which is used for double-byte TCHARs. My old code unfortunately uses mostly char rather than TCHAR, strcpy() rather than _tcscpy() etc., so it's not easy to convert (and man, why did they come up with such ugly names for everything when going multi-byte?!). I don't actually use the encoding part of MBCS at all, so maybe I should be talking about ANSI? The point is I don't care about the encoding because I've only supported English. One reason for moving to wxWidgets is that they make i18n orders of magnitude easier to support than single-byte MFC, as well as being portable to platforms other than Windows.
I am sorry but I still do not understand. You realize that MB in MBCS stands for "multi byte" so why do you keep saying it is single-byte? Based on what you write, you did not use MBCS, you used ANSI where each character is a single byte and different character sets are achieved by switching the character code page which is very problematic and limited.

Regarding TCHAR: TCHAR evaluates to char in an ANSI build and to wchar_t in a Unicode build; that is all there is to it. I remember that using char instead of TCHAR and the _t*() functions was considered bad practice in MFC even back in the late 1990s when I was using it, at the latest around the time MSVC 6 was released (1998), when mainstream Windows did not even support Unicode (XP was the first mainstream desktop Windows to do that, in 2001). I think this is a typical example of code rot, where you did not update your code in 20+ years and it caught up with you...
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
Yes, choosing the Unicode option will add _UNICODE to the FINAL preprocessor definitions, but NOT to "Properties->C/C++->Preprocessor->Preprocessor Definitions". There is, of course, no need to add it there manually.
This is not true: both symbols are added to the final preprocessor definitions, you can see that in "Command line" tab. You may be confused because they are not added into the Preprocessor definitions field but they are inherited from the project settings. You can see that in the Preprocessor definitions dialog which shows both manually added and inherited preprocessor definitions.
See here for explanation why both symbols must be defined: https://devblogs.microsoft.com/oldnewthing/?p=40643


To conclude, I do not mean to preach but I think you should read up on character encodings to really understand what they mean and how they work. See e.g. the list in my previous post for starters: viewtopic.php?f=19&t=47025&p=199462#p198025

RobertWebb
Earned a small fee
Posts: 23
Joined: Sun Oct 29, 2017 11:14 am


Post by RobertWebb » Mon Jun 22, 2020 3:16 pm

PB wrote:
Sun Jun 21, 2020 2:45 pm
I am sorry but I still do not understand. You realize that MB in MBCS stands for "multi byte" so why do you keep saying it is single-byte?

Sorry, my loose terminology again maybe, but selecting the MBCS option sets TCHAR to char, that's what I mean by it using single-byte chars.
Regarding TCHAR: TCHAR evaluates to char in ANSI build and wchar_t in Unicode build
Right, and to char in MBCS builds.

I have obviously used MBCS for some historic reason and maybe I could have just set that option to "Not set", which I believe implies ANSI. They've been pretty much equivalent for my purposes thus far (coding for English and ignoring multi-byte codings). Either way, I think ANSI and MBCS are equally deprecated.
I remember that using char instead of TCHAR and _t*() functions was considered bad practice in MFC even back in late 1990s when I was using it, at latest around the time MSVC 6 was released (1998) and the mainstream Windows did not even support Unicode (XP was the first mainstream desktop Windows to do that in 2001).
I don't think we were ever taught about anything other than char when I was at uni in the early 90s, and the places I worked never cared about it. I know it's been around for ages though.

The other motivation was probably that it's just so damn ugly! Readable: char and strcpy(). Gibberish: _TCHAR and _tcscpy(). At least to me. It seems every time they add something to C++ they try to make it uglier. I've been working mostly in C# and Unity in the last few years and wow, it's so much nicer. But I want to maintain my personal programs that I started 20 years ago in C++ and still work on and sell today. But I've dug myself into many holes and need to dig back out before I can release a new version.
I think this is a typical example of code rot, where you did not update your code in 20+ years and it caught up with you...
Actually I updated it heaps over 20 years, and it's already been a lot of work to keep up. But still char -> TCHAR and 32 -> 64 bit are huge changes that I haven't managed yet. I'm not sure if MFC -> wxWidgets is manageable at all, but I'd like to do that too if I can. Also looking to use it for a new project, but will be re-using some previous code, so still have to modernise that.
RobertWebb wrote:
Sun Jun 21, 2020 5:16 am
Yes, choosing the Unicode option will add _UNICODE to the FINAL preprocessor definitions, but NOT to "Properties->C/C++->Preprocessor->Preprocessor Definitions". There is, of course, no need to add it there manually.
This is not true: both symbols are added to the final preprocessor definitions, you can see that in "Command line" tab. You may be confused because they are not added into the Preprocessor definitions field but they are inherited from the project settings. You can see that in the Preprocessor definitions dialog which shows both manually added and inherited preprocessor definitions.
See here for explanation why both symbols must be defined: https://devblogs.microsoft.com/oldnewthing/?p=40643
We seem to be getting confused here. I made no mention of the UNICODE symbol. Yes that is added too, along with _UNICODE.

Here's the issue, and it's just about the _UNICODE symbol. In the Visual Studio options, setting "Character Set" to "Use Unicode Character Set" will mean _UNICODE is defined. This is good; it is what we want in that case. However, the _UNICODE symbol is ALSO being set explicitly, a second time, elsewhere in the project settings, in "C/C++->Preprocessor->Preprocessor Definitions". This is not necessary because it will be defined anyway. Setting "Character Set" to "...Unicode..." did not put it there. It's been added manually and separately, so it ends up being defined twice.

So if you change "Character Set" to "Multi-byte Character Set", it will define _MBCS instead of _UNICODE, but _UNICODE is STILL set in the C++ section, so things break. The _UNICODE symbol should not be set explicitly in any of the C++ settings. It will already be implicitly set by the "Character Set" setting.

I'm confused about a lot of things, but I don't think this is one of them :)
To conclude, I do not mean to preach but I think you should read up on character encodings to really understand what they mean and how they work. See e.g. the list in my previous post for starters: viewtopic.php?f=19&t=47025&p=199462#p198025
Thanks, you've actually been very patient! It just seems like I have to spend all my spare time keeping up with where I already was for no functional gain, so it gets frustrating. Maybe this discussion will inspire me to make some progress!

PB
Part Of The Furniture
Posts: 2466
Joined: Sun Jan 03, 2010 5:45 pm


Post by PB » Mon Jun 22, 2020 3:55 pm

I finally see what you meant by defining _UNICODE manually. This may be an artifact of how wxWidgets MSVS projects are generated.

I still think that using nmake or CMake(GUI) to change UNICODE-related build settings is much easier, and I am certainly no command line guru.

Well, good luck with your projects, whichever GUI library you end up with.

ONEEYEMAN
Part Of The Furniture
Posts: 4235
Joined: Sat Apr 16, 2005 7:22 am
Location: USA, Ukraine


Post by ONEEYEMAN » Mon Jun 22, 2020 5:59 pm

Hi,
If you don't know: MFC and wxWidgets can co-exist. Take a look at the MFC sample provided with wxWidgets, and also at its project, so you understand how it's built.

Thank you.
