Are there tests for alphanumeric but including extended latin unicode?

bsenftner
Experienced Solver
Posts: 85
Joined: Thu May 26, 2016 9:19 pm

Are there tests for alphanumeric but including extended latin unicode?

Post by bsenftner »

I have an international application, and I am adding Unicode handling now. The application stores people and place locations in wxStrings, along with "people list" names and "location list" names. I would like to restrict these end-user created "names" to "alphanumeric" characters plus the ASCII underscore. By "alphanumeric" I mean the letters of the extended Latin alphabet and digits: no spaces, no tabs, and no symbols (currency characters, copyright characters and so on are not allowed, nor are the non-standard characters that look too much like ASCII characters and are used to defraud people).

Is there a routine, set of routines or anything along these lines? Seems to be an ignored issue.
bsenftner
Experienced Solver
Posts: 85
Joined: Thu May 26, 2016 9:19 pm

Re: Are there tests for alphanumeric but including extended latin unicode?

Post by bsenftner »

No answers yet, but for interested people:

* American English has only upper case and lower case, but extended Latin also has "title case". Is that a second form of upper case reserved for use in formal names?

* Can anyone explain what "fold case" is? Is that a 4th form of a character, like upper, lower and "title" cases?

* This page here has a useful explainer of the situation for a developer: https://www.regular-expressions.info/unicode.html

Sorry if some of these are basic questions; I've only worked with "American English" and Asian languages in the past.
ONEEYEMAN
Part Of The Furniture
Posts: 7459
Joined: Sat Apr 16, 2005 7:22 am
Location: USA, Ukraine

Re: Are there tests for alphanumeric but including extended latin unicode?

Post by ONEEYEMAN »

Hi,
You basically have 2 choices

1. Create a custom validator
or
2. Catch the character typed with EVT_KEY_DOWN and filter the input there.

Also, for wxComboBox there is an option that lets the user type while filtering out the non-existent choices.

Thank you.
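
Whichever of the two approaches is used, the actual accept/reject rule can live in a plain function that the validator or the key handler calls. Below is a minimal sketch of that split, not wxWidgets code: `isAllowedChar` and `firstDisallowed` are hypothetical names, and the ASCII-only rule is just a placeholder for a real Unicode-aware test. A per-character check would suit a key handler, while a whole-string check is the part that also covers text arriving by other routes such as paste.

```cpp
#include <string>

// Hypothetical placeholder rule: ASCII letters, digits and underscore only.
// A real application would substitute its own Unicode-aware test here.
bool isAllowedChar(char32_t c) {
    return (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z') ||
           (c >= U'0' && c <= U'9') || c == U'_';
}

// Whole-string check: returns the index of the first disallowed
// character, or std::u32string::npos if every character is allowed.
std::u32string::size_type firstDisallowed(const std::u32string& text) {
    for (std::u32string::size_type i = 0; i < text.size(); ++i) {
        if (!isAllowedChar(text[i])) {
            return i;
        }
    }
    return std::u32string::npos;
}
```

Returning the position rather than a plain bool makes it possible to point the user at the offending character in an error message.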
doublemax
Moderator
Posts: 19116
Joined: Fri Apr 21, 2006 8:03 pm
Location: $FCE2

Re: Are there tests for alphanumeric but including extended latin unicode?

Post by doublemax »

wxWidgets has nothing suitable for this. There is wxIsalpha(c), but it only works with the current locale.

Usually you would need ICU for it, but it's a real beast, so maybe utf8proc is sufficient for your purpose. https://github.com/JuliaStrings/utf8proc

I haven't tested it, but I would try this function:

Code:

utf8proc_category_t utf8proc_category(utf8proc_int32_t codepoint);
And check if the return value is any of these:

Code:

UTF8PROC_CATEGORY_LU  = 1, /**< Letter, uppercase */
UTF8PROC_CATEGORY_LL  = 2, /**< Letter, lowercase */
UTF8PROC_CATEGORY_LT  = 3, /**< Letter, titlecase */
UTF8PROC_CATEGORY_LM  = 4, /**< Letter, modifier */
UTF8PROC_CATEGORY_LO  = 5, /**< Letter, other */
https://github.com/JuliaStrings/utf8pro ... utf8proc.h
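
One caveat with a pure category check: the letter categories cover letters from every script, so Cyrillic or Greek lookalikes of ASCII letters would pass as well. If only extended Latin is wanted, the check also has to restrict the code point range. Here is a minimal sketch of that idea. It does not call utf8proc; the helper names `isLatinLetter` and `isValidName` are hypothetical, and the ranges are a hand-picked assumption (Basic Latin letters, the letters in U+00C0..U+00FF, Latin Extended-A and Latin Extended-B), which is deliberately narrower than everything Unicode classifies as Latin.

```cpp
#include <string>

// Hypothetical helper: true for letters in a hand-picked set of Latin ranges.
bool isLatinLetter(char32_t c) {
    // Basic Latin letters.
    if ((c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')) {
        return true;
    }
    // Latin-1 Supplement letters, U+00C0..U+00FF, skipping the
    // multiplication sign (U+00D7) and the division sign (U+00F7).
    if (c >= 0x00C0 && c <= 0x00FF) {
        return c != 0x00D7 && c != 0x00F7;
    }
    // Latin Extended-A and Latin Extended-B, U+0100..U+024F.
    return c >= 0x0100 && c <= 0x024F;
}

// Hypothetical helper: a name is valid if it is non-empty and consists
// only of Latin letters (as defined above), ASCII digits and underscore.
bool isValidName(const std::u32string& name) {
    if (name.empty()) {
        return false;
    }
    for (char32_t c : name) {
        const bool isDigit = (c >= U'0' && c <= U'9');
        if (!isLatinLetter(c) && !isDigit && c != U'_') {
            return false;
        }
    }
    return true;
}
```

With this rule a name containing the Cyrillic letter U+0430 in place of a Latin "a" is rejected, even though a letter-category test alone would accept it. Names in Latin Extended Additional (for example Vietnamese) would need more ranges added.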
Use the source, Luke!