how to extract words ? Topic is solved

anonbeat
Earned a small fee
Posts: 21
Joined: Tue Jul 01, 2008 10:48 am

how to extract words ?

Post by anonbeat » Mon Nov 10, 2008 3:00 pm

Hello,
I need to extract words from a wxString, with a special case for the " character: the content between two " characters should be treated as a single word.
For example, the wxString One Two "Three Four" Five should be converted into:
One
Two
Three Four
Five

I've tried wxStringTokenizer without success.
Could you please let me know if there is an obvious way to do it?

Thanks in advance
Last edited by anonbeat on Wed Nov 12, 2008 7:26 am, edited 1 time in total.

Post by anonbeat » Tue Nov 11, 2008 8:59 am

For now I have this code that solves what I need.

Code:

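    wxArrayString Words;
    wxString SearchStr = SearchTextCtrl->GetLineText( 0 );
    size_t index, len;
    // Match one token plus the spaces around it. The quoted alternative comes
    // first so it also wins in regex engines that try alternatives in order.
    wxRegEx RegEx( wxT( " *(\"[^\"]*\"|[^ ]*) *" ) );
    while( SearchStr.Length() && RegEx.Matches( SearchStr ) )
    {
        RegEx.GetMatch( &index, &len );
        // Note: a quoted token is stored with its surrounding quotes.
        wxString Word = RegEx.GetMatch( SearchStr, 1 );
        if( !Word.IsEmpty() )
            Words.Add( Word );
        SearchStr = SearchStr.Mid( index + len ); // skip past the whole match
    }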
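
For comparison, here is a minimal sketch of the same regex idea using std::regex from the C++ standard library instead of wxRegEx, so it can be tried without wxWidgets. The function name extractWords is hypothetical. Unlike the code above, it uses two capture groups so that the quotes themselves are dropped from quoted phrases, and it assumes quotes are never escaped.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of wxWidgets): splits text on spaces while
// keeping "quoted phrases" together and dropping the quotes.
std::vector<std::string> extractWords(const std::string& text)
{
    // Group 1: contents of a quoted phrase. Group 2: a run of non-space characters.
    static const std::regex pattern(R"re("([^"]*)"|([^ ]+))re");

    std::vector<std::string> words;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end; it != end; ++it)
    {
        const std::smatch& m = *it;
        // Prefer the quoted contents when the first alternative matched.
        std::string word = m[1].matched ? m[1].str() : m[2].str();
        if (!word.empty())          // ignore empty phrases such as ""
            words.push_back(word);
    }
    return words;
}
```

Calling extractWords on the example string from the first post should give the four words One, Two, Three Four and Five.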

Auria
Site Admin
Posts: 6695
Joined: Thu Sep 28, 2006 12:23 am

Post by Auria » Tue Nov 11, 2008 3:16 pm


vsp
Knows some wx things
Posts: 35
Joined: Mon Feb 21, 2005 12:52 pm

Post by vsp » Wed Nov 12, 2008 3:56 am

You should use wxStringTokenizer to tokenize the string based on your choice of delimiter.

Code:

// wxStringTokenizer is declared in <wx/tokenzr.h>
wxStringTokenizer tkz(wxT("first:second:third:fourth"), wxT(":"));
while ( tkz.HasMoreTokens() )
{
    wxString token = tkz.GetNextToken();

    // process token here
}
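
A delimiter-based tokenizer can still cope with quoted phrases if it is applied in two passes: split on the " character first, so that every second piece is the inside of a quoted phrase, then split the remaining pieces on whitespace. The sketch below illustrates this with std::getline and std::istringstream standing in for wxStringTokenizer, so it compiles without wxWidgets. The name splitQuoted is hypothetical, and the sketch assumes quotes are balanced and never escaped.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: two-pass split, first on '"', then on whitespace.
std::vector<std::string> splitQuoted(const std::string& text)
{
    std::vector<std::string> words;
    std::istringstream pieces(text);
    std::string piece;
    bool insideQuotes = false;   // pieces alternate: outside, inside, outside, ...

    while (std::getline(pieces, piece, '"'))
    {
        if (insideQuotes)
        {
            // The whole piece was between two quotes: keep it as one word.
            if (!piece.empty())
                words.push_back(piece);
        }
        else
        {
            // Outside quotes: operator>> splits on any whitespace, not only spaces.
            std::istringstream unquoted(piece);
            std::string word;
            while (unquoted >> word)
                words.push_back(word);
        }
        insideQuotes = !insideQuotes;
    }
    return words;
}
```

With wxStringTokenizer the first pass would presumably need a mode that returns empty tokens, so that the inside/outside alternation is not thrown off when the string starts with a quote; that part is an assumption and is not shown here.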

Post by anonbeat » Wed Nov 12, 2008 7:24 am

What I need is to extract search words from an input text control. The user can type a single word, several words, or several words with some of them enclosed in " characters.
Words enclosed in " characters are treated as a literal and should be searched for exactly as typed, even if the literal contains spaces.
So the separator is the space, but spaces are allowed inside a pair of " characters.
If you know a better way to do this than mine, please let me know.

Thanks in advance

Grrr
Earned some good credits
Posts: 126
Joined: Fri Apr 11, 2008 8:48 am
Location: Netherlands

Post by Grrr » Wed Nov 12, 2008 10:43 am

I think using wxRegEx is a bit too much for this.

You could just loop through all the characters in the string with a for loop. Add characters to a temporary string until you reach a separator (space or quote), then store the temporary string as a search term. Keep a boolean to indicate whether you are inside quotes or not. If you are, spaces shouldn't count as separators. Continue until the end of the input string.

Note that repeatedly appending characters to a string is not very efficient, so you may want to use wxStringBuffer for better performance, or preallocate a large enough buffer with wxString::Alloc().
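
The loop described above could look roughly like the sketch below. It uses std::string and std::vector rather than wxString and wxArrayString so that it is self-contained; the name splitSearchTerms is hypothetical, and reserve() plays the role suggested for wxString::Alloc(). It assumes quotes are never escaped, and an unterminated quote simply runs to the end of the input.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: character-by-character split with an "inside quotes" flag.
std::vector<std::string> splitSearchTerms(const std::string& input)
{
    std::vector<std::string> terms;
    std::string current;
    current.reserve(input.size());   // preallocate once; a term is never longer than the input
    bool inQuotes = false;

    // Stores the temporary string as a search term, if it holds anything.
    auto flush = [&]() {
        if (!current.empty())
        {
            terms.push_back(current);
            current.clear();         // typically keeps the reserved capacity
        }
    };

    for (char ch : input)
    {
        if (ch == '"')
        {
            flush();                 // a quote always ends the current term
            inQuotes = !inQuotes;    // and toggles quoted mode
        }
        else if (ch == ' ' && !inQuotes)
        {
            flush();                 // a space separates terms only outside quotes
        }
        else
        {
            current += ch;
        }
    }
    flush();                         // store whatever is left at the end of the input
    return terms;
}
```

For the string from the first post this should yield One, Two, Three Four and Five, with the space inside the quoted phrase preserved.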
