Problem with custom lexer for wxStyledTextCtrl and unicode characters

Zaskar
In need of some credit
Posts: 5
Joined: Thu Jul 03, 2008 2:35 pm

Post by Zaskar » Mon Apr 20, 2020 2:11 pm

I wrote a custom lexer for a wxStyledTextCtrl derived class using the wxSTC_LEX_CONTAINER/EVT_STC_STYLENEEDED method. For setting the styles, the algorithm has to analyze the string that wxStyledTextCtrl::GetLine returns, and use wxStyledTextCtrl::StartStyling and wxStyledTextCtrl::SetStyling.

The problem is that positions in the wxString that GetLine returns count "characters", while positions in StartStyling and SetStyling seem to count "bytes". So, when there are multi-byte characters, the two don't match. I couldn't find an easy and fast way to convert a position in the wxString into an offset for wxStyledTextCtrl. Is there any way to do it? Is this something missing in the API? What am I missing?

At the moment, I'm using a workaround based on wxStyledTextCtrl::FindColumn to guess when two positions in the editor are shown as a single character. But it seems too complicated and very inefficient for something that should be much more straightforward.

Here's the workaround:

Code:

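	// Build a table mapping character index within the line -> document position.
	int pbeg = wxStyledTextCtrl::PositionFromLine(line);
	int pend = wxStyledTextCtrl::GetLineEndPosition(line);
	static std::vector<int> vpos; vpos.clear(); vpos.push_back(pbeg);
	// Walk the columns; a new position marks the start of the next character.
	for(int p,col=0; (p=FindColumn(line,col))!=pend; ++col) 
		if (p!=vpos.back()) vpos.push_back(p);
	// Style characters [p0,p1) of the line, given as character indices.
	auto MySetStyle = [&](int p0, int p1, int style) {
		p0 = p0<static_cast<int>(vpos.size()) ? vpos[p0] : pend;
		p1 = p1<static_cast<int>(vpos.size()) ? vpos[p1] : pend;
		wxStyledTextCtrl::StartStyling(p0);
		wxStyledTextCtrl::SetStyling(p1-p0,style);
	};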

New Pagodi
Super wx Problem Solver
Posts: 352
Joined: Tue Jun 20, 2006 6:47 pm

Re: Problem with custom lexer for wxStyledTextCtrl and unicode characters

Post by New Pagodi » Mon Apr 20, 2020 7:14 pm

There are a number of "raw" methods supplied with wxStyledTextCtrl that allow you to get information from the control without converting from/to wxString. I think in particular GetLineRaw would be helpful here.

Re: Problem with custom lexer for wxStyledTextCtrl and unicode characters

Post by Zaskar » Mon Apr 20, 2020 8:18 pm

The problem is matching the positions. Suppose I have a line with two characters, one that is one byte long and another that is two bytes long. The regular string would then report a length of 2, and the raw one a size of 3. What I need to know is whether it's 1+2 or 2+1, because the position of the second character would be PositionFromLine(x) plus either 1 or 2.
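
One way to answer the 1+2 versus 2+1 question is to look at the raw UTF-8 bytes themselves, since the lead byte of each character encodes how many bytes that character occupies. The following is a minimal standalone sketch with no wxWidgets involved. `Utf8CharLength` and `CharByteOffsets` are hypothetical helper names, and the sketch assumes the buffer is valid UTF-8, as it should be when the control uses the UTF-8 code page.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Number of bytes in the UTF-8 sequence that starts with this lead byte.
// Assumes valid UTF-8; an unexpected byte is treated as a 1-byte character.
inline std::size_t Utf8CharLength(unsigned char lead)
{
    if (lead < 0x80) return 1;           // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    return 1;
}

// Byte offset (relative to the start of the line) of every character,
// plus one final entry holding the total byte length.
inline std::vector<std::size_t> CharByteOffsets(const std::string& utf8)
{
    std::vector<std::size_t> offsets;
    std::size_t i = 0;
    while (i < utf8.size()) {
        offsets.push_back(i);
        i += Utf8CharLength(static_cast<unsigned char>(utf8[i]));
    }
    offsets.push_back(utf8.size());
    return offsets;
}
```

For "a" followed by "é" (bytes 61 C3 A9) the offsets are 0, 1, 3; for "é" followed by "a" they are 0, 2, 3. Adding an offset to PositionFromLine(x) would give the document position of that character. Note that these offsets count Unicode code points, which may not match wxString indices for characters outside the Basic Multilingual Plane on platforms where wchar_t is 16 bits.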

Re: Problem with custom lexer for wxStyledTextCtrl and unicode characters

Post by New Pagodi » Mon Apr 20, 2020 9:21 pm

Matching positions is not easy. An example of how to do this is given by the SurfaceImpl::MeasureWidths method defined in src/stc/PlatWX.cpp in the wxWidgets source.

I assume you already have a function that styles a line in wxString form, i.e. something like this:

Code:

void StylewxString(const wxString& s, std::vector<char>& wxStyles)
{
    ...
}
Then you can use that function and adapt the code from the measure widths method to style a line like this:

Code:

void <whateverclass>::StyleLine(int lineNo)
{
    wxCharBuffer buf = stc->GetLineRaw(lineNo);
    int len = buf.length();

    wxString str = wxString::FromUTF8(buf.data(), len);

    std::vector<char> utf8Styles, wxStyles;
    
    utf8Styles.reserve(len);
    
    StylewxString(str,wxStyles);

#if wxUSE_UNICODE
    // Map the styles back to the UTF-8 input string
    size_t utf8i = 0;
    for (size_t wxi = 0; wxi < str.size(); ++wxi) {
        wxUniChar c = str[wxi];

#if SIZEOF_WCHAR_T == 2
        // For surrogate pairs, use the style of the trail surrogate
        // for all four UTF-8 bytes of the character
        if (c >= 0xD800 && c < 0xE000 && wxi + 1 < str.size()) {
            ++wxi;
            utf8Styles.push_back(wxStyles[wxi]);
            utf8Styles.push_back(wxStyles[wxi]);
            utf8Styles.push_back(wxStyles[wxi]);
            utf8Styles.push_back(wxStyles[wxi]);
            continue;
        }
#endif

        utf8Styles.push_back(wxStyles[wxi]);
        if (c >= 0x80)
            utf8Styles.push_back(wxStyles[wxi]);
        if (c >= 0x800)
            utf8Styles.push_back(wxStyles[wxi]);
        if (c >= 0x10000)
            utf8Styles.push_back(wxStyles[wxi]);
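    }
#endif // wxUSE_UNICODE

    // utf8Styles must hold exactly one style byte per UTF-8 byte;
    // if the sizes differ, something has gone wrong, so skip styling.
    if (utf8Styles.size() != static_cast<size_t>(len))
        return;

    stc->StartStyling(stc->PositionFromLine(lineNo));
    stc->SetStyleBytes(len, utf8Styles.data());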
}
The idea is to map the styles for the wxString to an array of bytes matching the UTF-8 string, and then pass that array to the SetStyleBytes method. All of this is completely untested, so there may well be a few things that need to be corrected.
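
The byte-expansion step can be tried in isolation, without a control. Here is a minimal sketch, assuming the line is already available as Unicode code points (a `std::u32string`, standing in for the wxString) with one style per code point. `ExpandStylesToUtf8Bytes` is a hypothetical name, not part of wxWidgets.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Repeat each code point's style once per byte of its UTF-8 encoding.
// styles[i] is the style of codePoints[i]; both must have the same length.
inline std::vector<char> ExpandStylesToUtf8Bytes(const std::u32string& codePoints,
                                                 const std::vector<char>& styles)
{
    std::vector<char> byteStyles;
    if (codePoints.size() != styles.size())
        return byteStyles; // mismatched input: return nothing rather than guess

    for (std::size_t i = 0; i < codePoints.size(); ++i) {
        const char32_t c = codePoints[i];
        std::size_t n = 1;            // U+0000..U+007F
        if (c >= 0x80)    n = 2;      // U+0080..U+07FF
        if (c >= 0x800)   n = 3;      // U+0800..U+FFFF
        if (c >= 0x10000) n = 4;      // U+10000..U+10FFFF
        byteStyles.insert(byteStyles.end(), n, styles[i]);
    }
    return byteStyles;
}
```

With styles 1, 2, 3 for the characters "a", "é", "€", the result is 1, 2, 2, 3, 3, 3: six entries, matching the six bytes of the UTF-8 text. An array like this is what would be handed to SetStyleBytes.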

Actually, this isn't even the whole story, since none of this accounts for Unicode combining characters. Unicode can sometimes be a pain to deal with.

Re: Problem with custom lexer for wxStyledTextCtrl and unicode characters

Post by Zaskar » Mon Apr 27, 2020 11:43 pm

While searching for a solution to another problem, I found the solution to this one: wxStyledTextCtrl::PositionAfter and wxStyledTextCtrl::PositionBefore take the code page into account. It's not exactly what I was looking for, but I think it will allow me to write a much better workaround.
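
A workaround built on PositionAfter might look like the sketch below. It is written as a template so that it relies only on a `PositionAfter(int)` member, which wxStyledTextCtrl provides. `BuildCharPositions` and the `FakeCtrl` stand-in are hypothetical names used only for illustration, and the sketch has not been tried against a real control.

```cpp
#include <string>
#include <vector>

// Collect the document position of every character in [lineStart, lineEnd),
// followed by lineEnd itself, by stepping with ctrl.PositionAfter().
template <class Ctrl>
std::vector<int> BuildCharPositions(Ctrl& ctrl, int lineStart, int lineEnd)
{
    std::vector<int> positions;
    int pos = lineStart;
    while (pos < lineEnd) {
        positions.push_back(pos);
        const int next = ctrl.PositionAfter(pos);
        if (next <= pos) break;   // no progress: stop instead of looping forever
        pos = next;
    }
    positions.push_back(lineEnd);
    return positions;
}

// Stand-in for the control so the sketch runs without wxWidgets: it steps
// over UTF-8 continuation bytes (10xxxxxx) in an in-memory buffer.
struct FakeCtrl {
    std::string text;
    int PositionAfter(int pos) const
    {
        const int size = static_cast<int>(text.size());
        if (pos >= size) return size;
        ++pos;
        while (pos < size && (static_cast<unsigned char>(text[pos]) & 0xC0) == 0x80)
            ++pos;
        return pos;
    }
};
```

Inside a wxStyledTextCtrl-derived class, the call would presumably be BuildCharPositions(*this, PositionFromLine(line), GetLineEndPosition(line)), and the result would play the role of vpos in the FindColumn workaround from the first post. The entries count characters as the control sees them, which may differ from wxString indices for characters outside the Basic Multilingual Plane on platforms where wchar_t is 16 bits.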
