wxRegEx ([\\s\\S]*) problem Topic is solved
Hi,
I need to match the following with wxRegEx:
<tag>
.
.
.
</tag>
Normally I'd use "<tag>([\\s\\S]*)</tag>", but the manual states that \\S is illegal inside brackets (I'm not sure whether that is C++-related or specific to wxWidgets). The content between the tags can be anything. Can someone suggest an alternative?
Re: wxRegEx ([\\s\\S]*) problem
Code: Select all
[\\s\\S]
Use the source, Luke!
Re: wxRegEx ([\\s\\S]*) problem
doublemax wrote: This matches all chars that are either space (\s) or not space (\S). Which means it matches everything.
Exactly. I need to match everything between <tag>...</tag> including whitespace. I recall using this pattern successfully in Java.
Re: wxRegEx ([\\s\\S]*) problem
eetnev wrote: I need to match everything between <tag>...</tag> including whitespace.
In that case I would use:
Code: Select all
"<tag>(.*?)</tag>"
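As a rough illustration of greedy versus non-greedy matching, here is a minimal sketch that uses std::regex from the C++ standard library as a stand-in, not wxRegEx. The helper name first_capture is made up for this example, and [\s\S] is used in place of '.' because std::regex's default ECMAScript grammar does not let '.' match a newline.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of wxWidgets): returns the first capture
// group of `pattern` found in `text`, or an empty string if nothing matches.
std::string first_capture(const std::string& pattern, const std::string& text)
{
    const std::regex rx(pattern);  // default grammar: ECMAScript
    std::smatch m;
    if (std::regex_search(text, m, rx) && m.size() > 1)
    {
        return m[1].str();
    }
    return std::string();
}
```

With two <tag>...</tag> pairs in the input, the greedy form "<tag>([\\s\\S]*)</tag>" captures everything up to the last </tag>, while the non-greedy form "<tag>([\\s\\S]*?)</tag>" stops at the first one.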
Re: wxRegEx ([\\s\\S]*) problem
doublemax wrote: In that case I would use:
Code: Select all
"<tag>(.*?)</tag>"
The '?' makes the match "non-greedy", so if you have multiple <tag></tag> it won't match everything between the first <tag> and the last </tag>.
Thanks, that's a good point. I think the biggest problem is the newlines that might exist within the content. The easiest solution would be to remove all whitespace with ReplaceAll() and then try to match, but I have to keep the content in its original form.
Anyway I decided to extract the content manually by looping and checking a line for <tag> and </tag>.
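For reference, a minimal sketch of that kind of manual extraction. It is an assumption-laden illustration, not the code actually used here: it works on a plain std::string instead of wxString, searches the whole text with find() instead of looping line by line, and the helper name extract_between is invented for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies the text between the first `open` marker and
// the next `close` marker into `out`, newlines included. Returns false if
// either marker is missing, in which case `out` is left untouched.
bool extract_between(const std::string& text,
                     const std::string& open,
                     const std::string& close,
                     std::string& out)
{
    const std::size_t start = text.find(open);
    if (start == std::string::npos)
    {
        return false;
    }
    const std::size_t content = start + open.size();   // first char after <tag>
    const std::size_t end = text.find(close, content); // next </tag> after it
    if (end == std::string::npos)
    {
        return false;
    }
    out = text.substr(content, end - content);
    return true;
}
```

Because it stops at the first closing marker, this behaves like the non-greedy pattern and keeps the content in its original form.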
- eranon
- Can't get richer than this
- Posts: 867
- Joined: Sun May 13, 2012 11:42 pm
- Location: France
- Contact:
Re: wxRegEx ([\\s\\S]*) problem
If you may encounter newlines between the start and end tags, you could just add \n to the regular expression:
Code: Select all
<tag>(.|\n)*?<\/tag>
But frankly, any regex is hard to maintain, and a more solid way would be to go with a dedicated HTML parser (for example an XPath-based one).
[Ind. dev. - wxWidgets 3.0/3.1 under "Win 7 64-bit, TDM64-GCC" + "OS X 10.9, LLVM Clang"]
Re: wxRegEx ([\\s\\S]*) problem
The "s" option can be used for "Non-newline-sensitive matching":
http://docs.wxwidgets.org/trunk/overvie ... metasyntax
- eranon
Re: wxRegEx ([\\s\\S]*) problem
OK, seen, doublemax! Then it would give something like:
Code: Select all
(?s)<tag>.*?</tag>
Or is it the default (it's indicated as "usual default" in the doc you pointed to, but eetnev seemed to say that there was still the problem of newlines)?
Re: wxRegEx ([\\s\\S]*) problem
The /s modifier is normally given after the regular expression like this:
https://stackoverflow.com/a/12667389
wxWidgets uses a flag value for it, but it's set to the "correct" value by default:
http://docs.wxwidgets.org/trunk/regex_8 ... d6be56b171
Maybe OP needs to show his code, because the following worked for me:
Code: Select all
wxRegEx rx("<tag>(.*?)</tag>", wxRE_ADVANCED);
if (rx.IsValid())
{
    wxString text = wxT("jldjalkjs dlas jdl<tag>\n.\n.\n.\n</tag> lakdjals kj lskdlka sdlas");
    if (rx.Matches(text))
    {
        wxLogMessage("match: %s", rx.GetMatch(text, 1));
    }
    else
    {
        wxLogMessage("no match");
    }
}
- eranon
Re: wxRegEx ([\\s\\S]*) problem
OK, understood
In my previous message, I placed it at the beginning because of this sentence in the doc you pointed to:
"An ARE may begin with embedded options: a sequence (?xyz) (where xyz is one or more alphabetic characters) specifies options affecting the rest of the RE. These supplement, and can override, any options specified by the application. The available option letters are:"
Then, I tested in RegexBuddy against Tcl ARE syntax.
Re: wxRegEx ([\\s\\S]*) problem
doublemax wrote: Maybe OP needs to show his code, because the following worked for me:
Thanks, it works perfectly! To tell you the truth, I never gave it a try, as I was in the middle of something and didn't trust that "(.*?)" would capture newlines as well. Sorry for the inconvenience.