Problems with find() function with "weird" characters on Debian and not on Win7

Parduz

Post by Parduz »

(Using wxWidgets 3.0.4)

This sample code shows my current problem:

Code: Select all

// ============================================================================
// declarations
// ============================================================================

// ----------------------------------------------------------------------------
// headers
// ----------------------------------------------------------------------------

// For compilers that support precompilation, includes "wx/wx.h".
#include "wx/wxprec.h"

#ifdef __BORLANDC__
    #pragma hdrstop
#endif

// for all others, include the necessary headers (this file is usually all you
// need because it includes almost all "standard" wxWidgets headers)
#ifndef WX_PRECOMP
    #include "wx/wx.h"
#endif

#include <wx/app.h>
#include <wx/cmdline.h>

// ============================================================================
// implementation
// ============================================================================
int main(int argc, char **argv)
{
    wxApp::CheckBuildOptions(WX_BUILD_OPTIONS_SIGNATURE, "program");

    wxInitializer initializer;
    if ( !initializer )
    {
        fprintf(stderr, "Failed to initialize the wxWidgets library, aborting.");
        return -1;
    }


	unsigned char s1[20] = {'[',0xA7,']','0','1','2','3','4','5','6','7','8','9',0x30,'[','~',']',0x0D,0x0A,0x00};
	wxString str1(s1,20);

	wxPrintf ("Position of [0xA7] in str1 is %d\n", str1.Find("[§]") );
	wxPrintf ("Position of [~] in str1 is %d\n", str1.Find("[~]") );
	wxPrintf ("Lenght of str1 is %d\n", str1.length() );
	wxPrintf ("Size of str1 is %d\n", str1.size() );
	wxPrintf ("wxStrlen of s1 is %d\n", wxStrlen(s1) );
	wxPrintf ("\n");

	wxPrintf ("Printing - [§] %d - results in a segmentation fault error on my BBB with Debian\n", str1.Find("[§]") );
    return 0;
}


This is the output on Windows:
Position of [0xA7] in str1 is 0
Position of [~] in str1 is 14
Lenght of str1 is 20
Size of str1 is 20
wxStrlen of s1 is 19

Printing - [º] 0 - results in a segmentation fault error on my BBB with Debian

And this is the output on my BeagleBone Black with a Debian image:
Position of [0xA7] in str1 is 0
Position of [~] in str1 is -1
Lenght of str1 is 0
Size of str1 is 0
wxStrlen of s1 is 0

Segmentation fault

The char array represents what I receive over the serial port from an old device. It may contain any kind of non-ASCII characters, and each data packet in the transmission stream always begins with "[§]" and ends with "[~]\r\n". There is nothing I can do about that other than handle it as it is.

I borrowed a function from a C++Builder (2006) program that parses the data packets using the Borland equivalent of the find() function. Porting it was just a matter of syntax and, since I develop on Windows, it worked well there, but when I built it on Debian everything was messed up.

My questions are:
1) Is it expected to get different results on the two OSes when I use the same code?
2) What should I do to make it work on both OSes?
doublemax

Re: Problems with find() function with "weird" characters on Debian and not on Win7

Post by doublemax »

Code: Select all

wxString str1(s1,20);

This is the crucial line, because it uses the local encoding to convert from 8-bit data to Unicode. Under Linux this will typically be UTF-8, but the byte sequence is not valid UTF-8 (a lone 0xA7 is a continuation byte and cannot start a character), so you end up with an empty string.
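To see why the conversion fails, here is a minimal sketch in plain C++ (no wxWidgets involved) that checks whether a buffer is structurally valid UTF-8. The function name `looksLikeUtf8` is made up for illustration, and the check is deliberately simplified: it only looks at lead and continuation byte patterns and does not reject overlong encodings or surrogates. It is not what wxWidgets does internally, just a way to inspect the bytes.

```cpp
#include <cstddef>

// Hypothetical helper (not a wxWidgets API): returns true if the buffer
// follows the UTF-8 lead-byte / continuation-byte structure.
// Simplified: overlong encodings and surrogate code points are not rejected.
bool looksLikeUtf8(const unsigned char* data, std::size_t len)
{
    std::size_t i = 0;
    while (i < len)
    {
        const unsigned char b = data[i];
        std::size_t extra = 0;                      // continuation bytes expected

        if (b < 0x80)                extra = 0;     // 0xxxxxxx: plain ASCII
        else if ((b & 0xE0) == 0xC0) extra = 1;     // 110xxxxx: 2-byte sequence
        else if ((b & 0xF0) == 0xE0) extra = 2;     // 1110xxxx: 3-byte sequence
        else if ((b & 0xF8) == 0xF0) extra = 3;     // 11110xxx: 4-byte sequence
        else return false;                          // stray continuation or invalid lead byte

        if (extra > len - i - 1)
            return false;                           // sequence is cut off at the end

        for (std::size_t k = 1; k <= extra; ++k)
        {
            if ((data[i + k] & 0xC0) != 0x80)       // must be 10xxxxxx
                return false;
        }
        i += extra + 1;
    }
    return true;
}
```

Called on the `s1` array from the first post, this returns false at the second byte: 0xA7 is 10100111 in binary, which matches the continuation pattern 10xxxxxx without any lead byte before it. The same character encoded as UTF-8 would be the two bytes 0xC2 0xA7.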

You can:

1) If you know the encoding of the incoming data, pass a matching wxMBConv instance to the wxString constructor.

2) Use wxString::From8BitData to create a string with the bytes as they are. This may sound promising, but it ignores the proper encoding and can lead to strange errors later, depending on what you are going to do with the wxString.

3) Use a std::string, which is a "dumb" container and doesn't care about Unicode. In this case you should also use std::string operations for any further processing.

Which option is best for you is hard to tell.
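As a rough illustration of option 3, the sketch below does the packet search on a std::string, which treats the data as raw bytes on every platform. The helper name `extractPacket` is hypothetical, and it assumes the framing described in the first post (start marker "[§]" as the single byte 0xA7 in brackets, end marker "[~]\r\n"). The marker is written with a byte escape so the result does not depend on the encoding of the source file.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies the payload of the first complete packet in
// `stream` into `payload`, i.e. the bytes between "[\xA7]" and "[~]\r\n".
// Returns false if no complete packet is found.
bool extractPacket(const std::string& stream, std::string& payload)
{
    static const std::string kStart("[\xA7]");   // 0xA7 as a raw byte, not a Unicode character
    static const std::string kEnd("[~]\r\n");

    const std::size_t start = stream.find(kStart);
    if (start == std::string::npos)
        return false;                            // no start marker yet

    const std::size_t bodyBegin = start + kStart.size();
    const std::size_t end = stream.find(kEnd, bodyBegin);
    if (end == std::string::npos)
        return false;                            // packet not complete yet

    payload = stream.substr(bodyBegin, end - bodyBegin);
    return true;
}
```

When building the std::string from the receive buffer, pass the length explicitly, for example `std::string(reinterpret_cast<const char*>(s1), sizeof s1)`, so that embedded 0x00 bytes do not cut the data short. With the `s1` array from the first post, find() then reports the start marker at position 0 and "[~]" at position 14 regardless of the locale.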