regex select

In this forum you can discuss database related issues which can be wxWidgets related, but also generic in nature.
Post Reply
Wolfgang
I live to help wx-kind
I live to help wx-kind
Posts: 180
Joined: Mon Jan 28, 2019 8:22 am

regex select

Post by Wolfgang »

Hello

I would need a select, where it does not see a difference in hebrew if the vowel points are different, or more important if there are any sentence markers in hebrew in. The text is in unicode in a field with name wort, and a table called bibel.
יֵּ֥לֶךְ
יֵּ֣לֶךְ
both should be found and not just on, small difference in the sentence markers here.
Please help
utelle
Moderator
Moderator
Posts: 1125
Joined: Tue Jul 05, 2005 10:00 pm
Location: Cologne, Germany
Contact:

Re: regex select

Post by utelle »

Wolfgang wrote: Sat Apr 11, 2020 8:35 pm I would need a select, where it does not see a difference in hebrew if the vowel points are different, or more important if there are any sentence markers in hebrew in. The text is in unicode in a field with name wort, and a table called bibel.
יֵּ֥לֶךְ
יֵּ֣לֶךְ
both should be found and not just on, small difference in the sentence markers here.
Please help
If you are using SQLite3 and wxSQLite3, then the situation is as follows:

SQLite3 defines an operator REGEXP. However, this operator is undefined by default. Your application has to provide a user-defined function implementing this operator. Fortunately, wxSQLite3 provides an implementation of the REGEXP operator using wxWidgets' implementation for regular expressions. Unfortunately this information is missing from the documentation. However, take a look at wxsqlite3.h (at the bottom). Your application just has to create a scalar function with the name regexp using method wxSQLite3Database::CreateFunction.
Wolfgang
I live to help wx-kind
I live to help wx-kind
Posts: 180
Joined: Mon Jan 28, 2019 8:22 am

Re: regex select

Post by Wolfgang »

Not even sure if regex is the only way. Just need a way to find it.

Have found something for java, which might be useful in a way?

Code: Select all

public static String addTildeOptions(String searchText) {
    return searchText.toLowerCase()
                     .replaceAll("[aáàäâã]", "\\[aáàäâã\\]")
                     .replaceAll("[eéèëê]", "\\[eéèëê\\]")
                     .replaceAll("[iíìî]", "\\[iíìî\\]")
                     .replaceAll("[oóòöôõ]", "\\[oóòöôõ\\]")
                     .replaceAll("[uúùüû]", "\\[uúùüû\\]")
                     .replace("*", "[*]")
                     .replace("?", "[?]");
}

Code: Select all

SELECT * from table WHERE lower(column) GLOB "*addTildeOptions(searchText)*"
But for hebrew I will need other letters for it, that what it makes more difficult, and at the moment I'm not even sure if the sentence markers are not part of the character, as they are not possible to delete seperately in libre office writer.
But could something similar help.

And anyway you regexp explenation was a bit to complicated for me.
Wolfgang
I live to help wx-kind
I live to help wx-kind
Posts: 180
Joined: Mon Jan 28, 2019 8:22 am

Re: regex select

Post by Wolfgang »

Try to convert the java code for c++

Code: Select all

wxString addTildeOptions(wxString searchText) {
wxRegEx mynew(_T("[aáàäâã]"),wxRE_BASIC);

                     mynew.ReplaceAll(&searchText, _T("\\[aáàäâã\\]"))
           //          searchText.ReplaceAll("[eéèëê]", "\\[eéèëê\\]")
             //        searchText.replaceAll("[iíìî]", "\\[iíìî\\]")
               //      searchText.replaceAll("[oóòöôõ]", "\\[oóòöôõ\\]")
                 //    searchText.replaceAll("[uúùüû]", "\\[uúùüû\\]")
                     searchText.replace("*", "[*]")
                     searchText.replace("?", "[?]");
                         return searchText.LowerCase()
}
But this gives me:
error: expected ';' before 'searchText'
utelle
Moderator
Moderator
Posts: 1125
Joined: Tue Jul 05, 2005 10:00 pm
Location: Cologne, Germany
Contact:

Re: regex select

Post by utelle »

Wolfgang wrote: Sun Apr 12, 2020 5:02 am Not even sure if regex is the only way. Just need a way to find it.
Probably regular expression are not the only way. However, I don't know enough about languages like Hebrew to be able to give you further advice.
Wolfgang wrote: Sun Apr 12, 2020 5:02 am

Code: Select all

SELECT * from table WHERE lower(column) GLOB "*addTildeOptions(searchText)*"
But for hebrew I will need other letters for it, that what it makes more difficult, and at the moment I'm not even sure if the sentence markers are not part of the character, as they are not possible to delete seperately in libre office writer.
But could something similar help.
wxSQLite3 allows to implement user-defined SQL functions. So, you could implement functions like addTildeOptions to solve your original problem.
Wolfgang wrote: Sun Apr 12, 2020 5:02 am And anyway you regexp explenation was a bit to complicated for me.
Well, you asked for regular expressions. And I told you that SQLite doesn't support the REGEXP operator right out of the box, but that it can be implemented in your application. The minimal sample coming with wxSQLite3 even shows you how to do it.
utelle
Moderator
Moderator
Posts: 1125
Joined: Tue Jul 05, 2005 10:00 pm
Location: Cologne, Germany
Contact:

Re: regex select

Post by utelle »

Wolfgang wrote: Sun Apr 12, 2020 7:31 am Try to convert the java code for c++

Code: Select all

wxString addTildeOptions(wxString searchText) {
wxRegEx mynew(_T("[aáàäâã]"),wxRE_BASIC);

                     mynew.ReplaceAll(&searchText, _T("\\[aáàäâã\\]"))
           //          searchText.ReplaceAll("[eéèëê]", "\\[eéèëê\\]")
             //        searchText.replaceAll("[iíìî]", "\\[iíìî\\]")
               //      searchText.replaceAll("[oóòöôõ]", "\\[oóòöôõ\\]")
                 //    searchText.replaceAll("[uúùüû]", "\\[uúùüû\\]")
                     searchText.replace("*", "[*]")
                     searchText.replace("?", "[?]");
                         return searchText.LowerCase()
}
But this gives me:
error: expected ';' before 'searchText'
The signature of method replaceAll in Java differs from that of method ReplaceAll in C++/wxWidgets. Especially, the method replaceAll in Java is a method of the string class; this is not the case in wxWidgets. Therefore you can't convert the code in the way you did. You will have to invoke method ReplaceAll of mynew several times. And each line of invocation has to be terminated with a semicolon (as all C++ statements have to be terminated that way).
Wolfgang
I live to help wx-kind
I live to help wx-kind
Posts: 180
Joined: Mon Jan 28, 2019 8:22 am

Re: regex select

Post by Wolfgang »

Code: Select all

  
wxRegEx mynew(_T("[aáàäāâăã]"),wxRE_BASIC);
mynew.ReplaceAll(&searchText, _T("\\[aáàăäâāã\\]"));
wxRegEx mynew2(_T("[eéèëēĕệê]"),wxRE_BASIC);
mynew2.ReplaceAll(&searchText, _T("\\[eéèëēĕėệê\\]"));
wxRegEx mynew3(_T("[iíìî]"),wxRE_BASIC);
mynew3.ReplaceAll(&searchText, _T("\\[iíìî\\]"));
wxRegEx mynew4(_T("[oóòöôŏōõ]"),wxRE_BASIC);
mynew4.ReplaceAll(&searchText, _T("\\[oóòöôŏōõ\\]"));
wxRegEx mynew5(_T("[uúùüû]"),wxRE_BASIC);
mynew5.ReplaceAll(&searchText, _T("\\[uúùüû\\]"));
wxRegEx mynew6(_T("[sśŝ]"),wxRE_BASIC);
mynew6.ReplaceAll(&searchText, _T("\\[sśŝ\\]"));

                     searchText.Replace("*", "[*]");
                //     searchText.Replace("ħ", "[?]"); [b](should be ignored by the select, will not be in the searchtext)[/b]
                    // searchText.Replace("ė", "[?]"); [b](should be ignored by the select)[/b]
                    //searchText.Replace("ţ", "[?]"); [b](should be ignored by the select)[/b]
                    searchText.Replace("?", "[?]");
                     searchText.LowerCase();

Code: Select all

SELECT * from table WHERE lower(column) GLOB "*addTildeOptions(searchText)*"
Like this I managed to get it running for specialised characters. But the problem I have now I would have a couple of characters which should be ignored by the select all together (no matter how often or when they appear), for that I did not find a way with glob.

Is there a way, or only with regex. As I did not work at all with regex up to now. I would need help there how to make the right expression.
utelle
Moderator
Moderator
Posts: 1125
Joined: Tue Jul 05, 2005 10:00 pm
Location: Cologne, Germany
Contact:

Re: regex select

Post by utelle »

Wolfgang wrote: Thu Apr 16, 2020 2:46 pm

Code: Select all

SELECT * from table WHERE lower(column) GLOB "*addTildeOptions(searchText)*"
Like this I managed to get it running for specialised characters. But the problem I have now I would have a couple of characters which should be ignored by the select all together (no matter how often or when they appear), for that I did not find a way with glob.

Is there a way, or only with regex. As I did not work at all with regex up to now. I would need help there how to make the right expression.
I have to admit that I can't help you in defining the right pattern to solve your problem. However, the GLOB operator is certainly not enough. I'm pretty sure an elaborate regular expression could be a solution.

For sure, you already know what you can accomplish with regular expressions in wxWidgets. However, since you deal with non-ASCII text, you may consider to use the ICU extension of SQLite, which also implements the REGEXP operator. A pre-compiled version of the SQLite shell with ICU extension is available here. So, you could test and experiment without putting too much effort into programming. Documentation of the ICU regular expression syntax can be found here.
Post Reply