wxRegEx ([\\s\\S]*) problem Topic is solved

eetnev
In need of some credit
Posts: 6
Joined: Sun Dec 17, 2017 9:36 pm

Post by eetnev »

Hi,

I need to match the following with wxRegEx:

<tag>
.
.
.
</tag>

Normally I'd use "<tag>([\\s\\S]*)</tag>", but the manual states that \\S is illegal inside brackets (I'm not sure whether that is C++-related or specific to wxWidgets). The content between the tags can be anything. Can someone suggest an alternative?
doublemax
Moderator
Posts: 19115
Joined: Fri Apr 21, 2006 8:03 pm
Location: $FCE2

Post by doublemax »

Code: Select all

[\\s\\S]
What's that supposed to do? This matches all chars that are either space (\s) or not space (\S). Which means it matches everything.
Use the source, Luke!

Post by eetnev »

doublemax wrote: This matches all chars that are either space (\s) or not space (\S). Which means it matches everything.
Exactly. I need to match everything between <tag>...</tag> including whitespace. I recall using this pattern successfully in Java.

Post by doublemax »

eetnev wrote: I need to match everything between <tag>...</tag> including whitespace.
In that case I would use:

Code: Select all

"<tag>(.*?)</tag>"
The '?' makes the match "non-greedy", so if you have multiple <tag></tag> pairs, it won't match everything between the first <tag> and the last </tag>.
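To make the greedy versus non-greedy difference concrete, here is a minimal sketch. It deliberately uses `std::regex` (default ECMAScript grammar) as a stand-in so that it compiles without wxWidgets; it is not wxRegEx, and `firstCapture` is a hypothetical helper name. The input is kept on one line because `.` in the ECMAScript grammar does not match newlines.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of wxWidgets): returns capture group 1 of the
// first match of `pattern` in `text`, or an empty string if nothing matches.
std::string firstCapture(const std::string& text, const std::string& pattern)
{
    const std::regex rx(pattern);  // ECMAScript grammar by default
    std::smatch m;
    if (std::regex_search(text, m, rx) && m.size() > 1)
        return m[1].str();
    return std::string();
}
```

With the input `<tag>a</tag> x <tag>b</tag>`, the greedy pattern `<tag>(.*)</tag>` captures `a</tag> x <tag>b`, while the non-greedy `<tag>(.*?)</tag>` captures just `a`.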

Post by eetnev »

Thanks, that's a good point. I think the biggest problem is the newlines that might exist within the content. The easiest solution would be to remove all whitespace with ReplaceAll() and then try to match, but I have to keep the content in its original form.

Anyway, I decided to extract the content manually by looping over the lines and checking each one for <tag> and </tag>.
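For readers who want a regex-free route, here is one possible sketch of manual extraction. It is not the poster's actual code: instead of a line-by-line loop it uses `std::string::find`, which keeps the content (newlines included) exactly as it appears. `extractBetween` and its parameters are hypothetical names chosen for illustration.

```cpp
#include <string>

// Hypothetical helper: returns the text between the first `open` delimiter and
// the next `close` delimiter. `found` is set to false if either is missing.
std::string extractBetween(const std::string& text,
                           const std::string& open,
                           const std::string& close,
                           bool& found)
{
    found = false;

    const std::string::size_type openPos = text.find(open);
    if (openPos == std::string::npos)
        return std::string();

    // Content starts right after the opening delimiter.
    const std::string::size_type contentStart = openPos + open.size();
    const std::string::size_type closePos = text.find(close, contentStart);
    if (closePos == std::string::npos)
        return std::string();

    found = true;
    return text.substr(contentStart, closePos - contentStart);
}
```

Calling it with "<tag>" and "</tag>" returns the untouched content of the first pair; the separate `found` flag distinguishes an empty element from a missing one.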
eranon
Can't get richer than this
Posts: 867
Joined: Sun May 13, 2012 11:42 pm
Location: France

Post by eranon »

If newlines may occur between the opening and closing tags, you could simply add \n to the regular expression:

Code: Select all

<tag>(.|\n)*?<\/tag>
But frankly, any regex is hard to maintain, and a more robust approach would be to use a dedicated HTML parser (for example, an XPath-based one).
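The alternation idea can be tried out without wxWidgets. The sketch below uses `std::regex` as a stand-in (an assumption made for testability, not wxRegEx); its default ECMAScript grammar never lets `.` match a newline, so the explicit `\n` alternative is what makes multi-line content match. `tagContent` is a hypothetical helper. Note one adjustment: repeating a capturing group as in `(.|\n)*?` leaves only the last character in the group, so the sketch wraps a non-capturing alternation in an outer capturing group.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the content of the first <tag>...</tag> pair,
// newlines included, or an empty string if there is no match.
std::string tagContent(const std::string& text)
{
    // (?:.|\n) matches any character including newline; the outer group
    // captures the whole lazily matched run between the tags.
    static const std::regex rx("<tag>((?:.|\\n)*?)</tag>");
    std::smatch m;
    return std::regex_search(text, m, rx) ? m[1].str() : std::string();
}
```

Because this form backtracks once per character, it is best kept to short inputs.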
[Ind. dev. - wxWidgets 3.0/3.1 under "Win 7 64-bit, TDM64-GCC" + "OS X 10.9, LLVM Clang"]

Post by doublemax »

The "s" option can be used for "Non-newline-sensitive matching":
http://docs.wxwidgets.org/trunk/overvie ... metasyntax

Post by eranon »

OK, noted, doublemax! Then it would give something like:

Code: Select all

(?s)<tag>.*?</tag>
Or is it the default (it's indicated as "usual default" in the doc you pointed to, but eetnev seemed to say that newlines were still a problem)?

Post by doublemax »

The /s modifier is normally given after the regular expression like this:
https://stackoverflow.com/a/12667389

wxWidgets uses a flag value for it, but it's set to the "correct" value by default:
http://docs.wxwidgets.org/trunk/regex_8 ... d6be56b171

Maybe OP needs to show his code, because the following worked for me:

Code: Select all

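// Requires <wx/regex.h> and <wx/log.h>.
// wxRE_NEWLINE is not passed, so '.' also matches '\n'.
wxRegEx rx("<tag>(.*?)</tag>", wxRE_ADVANCED);
if (rx.IsValid())
{
  wxString text = wxT("jldjalkjs dlas jdl<tag>\n.\n.\n.\n</tag> lakdjals kj lskdlka sdlas");
  if (rx.Matches(text))
  {
    // Group 1 holds the lazily captured content between the tags.
    wxLogMessage("match: %s", rx.GetMatch(text, 1));
  }
  else
  {
    wxLogMessage("no match");
  }
}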

Post by eranon »

OK, understood ;)

In my previous message, I placed it at the beginning because of this sentence in the doc you pointed to:
An ARE may begin with embedded options: a sequence (?xyz) (where xyz is one or more alphabetic characters) specifies options affecting the rest of the RE. These supplement, and can override, any options specified by the application. The available option letters are:
Then, I tested in RegexBuddy against Tcl ARE syntax.
snap_0005210.png

Post by eetnev »

doublemax wrote: Maybe OP needs to show his code, because the following worked for me:

Code: Select all

wxRegEx rx("<tag>(.*?)</tag>", wxRE_ADVANCED);
if (rx.IsValid())
{
  wxString text = wxT("jldjalkjs dlas jdl<tag>\n.\n.\n.\n</tag> lakdjals kj lskdlka sdlas");
  if (rx.Matches(text))
  {
    wxLogMessage("match: %s", rx.GetMatch(text, 1) );
  }
  else
  {
    wxLogMessage("no match");
  }
}
Thanks, it works perfectly! To tell you the truth, I never gave it a try, as I was in the middle of something and didn't trust that "(.*?)" would capture newlines as well. Sorry for the inconvenience.