Experience with running ExifTool and Exiv2 as external command-line utilities to extract, display, or edit metadata, mainly from JPEGs, has persuaded me that what comes back from them is not always a UTF-8 string. This is especially true for images whose metadata includes, for my purposes, mainly names, locations, and descriptions in some of the more common European languages. Of particular interest is Germany, though I must at least consider some of the surrounding countries as well.
Some of these tests were done in the plain DOS shell, in a modified DOS shell supposedly able to handle UTF-8, and in a number of PowerShell variants. None, IIRC, worked even close to well with the images I tested.
In images where I have added or edited the data myself, that is less of a problem, since I have control over the encoding, though even there I am still learning.
Interfacing with ExifTool presents a specific problem: to avoid the delay of loading the executable for every call, it has a feature (-stay_open) that allows it to remain resident, but that makes it necessary to use code derived from the piped exec sample app, rather than a simple call to wxExecute, which gathers the data returned from the utility by itself. (Though, your comments here may explain some of the issues I had when I was using wxExecute early in development.)
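For reference, in -stay_open mode ExifTool reads its arguments from a file or pipe (via -@), one argument per line, with -execute terminating each batch, and it marks the end of each response with a "{ready}" line. The pipe plumbing itself is the wxWidgets-specific part, but the framing logic can be sketched in plain C++ (the function names here are my own, not ExifTool's):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build one command block for "exiftool -stay_open True -@ -":
// one argument per line, terminated by -execute.
std::string BuildCommand(const std::vector<std::string>& args)
{
    std::string block;
    for (const auto& a : args)
        block += a + "\n";
    block += "-execute\n";
    return block;
}

// ExifTool signals the end of each response with a line
// containing "{ready}"; keep reading the pipe until it appears.
bool ResponseComplete(const std::string& buf)
{
    return buf.find("{ready}") != std::string::npos;
}
```

With this framing, the resident process is driven by writing one BuildCommand() block to its stdin and accumulating stdout until ResponseComplete() is true.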
When I looked over the code from the piped frame example, I was concerned that the 4 KB stack-resident character buffer would not be sufficient for some of the lengthy output I was expecting, so I felt I had to manage that part differently from the example, which pumps the data directly (and without any conversion) to a text control. Even experimenting with code derived from that sample, using ExifTool as the invoked executable with a variety of options, showed some issues, though I cannot recall all the details without revisiting those tests for confirmation.
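One way around a fixed stack buffer is to keep the 4 KB chunk but accumulate into a std::string, which grows as needed. A minimal sketch, with a generic istream standing in for the process's output stream:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Read everything from a stream in fixed-size chunks,
// appending to a growing std::string instead of relying
// on a single stack buffer being large enough.
std::string ReadAll(std::istream& in)
{
    std::string out;
    char chunk[4096];
    while (in.read(chunk, sizeof(chunk)) || in.gcount() > 0)
        out.append(chunk, static_cast<size_t>(in.gcount()));
    return out;
}
```

The same pattern applies when polling a wxInputStream: read into a small buffer, append to the string, repeat until the stream is exhausted.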
Converting everything incoming to UTF-8 at the lowest level had me concerned because, unless I can assume that the application delivers all data in UTF-8 encoding, or can force the application to make that assumption correct, I will inevitably end up treating data as UTF-8 that was originally in some other encoding, most likely an ISO 8859 variant.
In the worst case, I will likely have to resort to a hex editor to confirm any suspicions.
At this stage, I have to accept your comment that "a wxString should only contain Unicode characters".
What had me confused was that the wxString documentation lists a number of functions under "wxString can be converted to:" and "Can be created from:".
Adapting to this new outlook will take some work, but it still leaves me with the question of how to handle potential input that is not in UTF-8 encoding after the conversion in the low-level routine. Do I need to verify that the incoming text is valid UTF-8 (and how), and raise an exception if it isn't? ...?
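One option, instead of raising an exception, is to check validity and fall back: wxString::FromUTF8 already returns an empty string when the bytes are not valid UTF-8, which can itself serve as the check. The idea can be sketched in plain C++ with a hand-rolled validator and an assumed ISO-8859-1 fallback (reasonable for German text, since Latin-1 bytes such as 0xE4 for 'ä' are never valid UTF-8 on their own):

```cpp
#include <cassert>
#include <string>

// Minimal UTF-8 structural check: verifies lead/continuation
// byte patterns (does not reject overlong forms or surrogates).
bool IsValidUtf8(const std::string& s)
{
    size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        int extra;
        if      (c < 0x80)           extra = 0;  // ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 2-byte sequence
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 3-byte sequence
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 4-byte sequence
        else return false;  // stray continuation or invalid lead byte
        if (i + extra >= s.size()) return false; // truncated sequence
        for (int k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;
        i += extra + 1;
    }
    return true;
}

// Fallback: reinterpret the bytes as ISO-8859-1 and re-encode as
// UTF-8 (every Latin-1 byte value maps to the same code point).
std::string Latin1ToUtf8(const std::string& s)
{
    std::string out;
    for (char ch : s) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

In wxWidgets terms, the equivalent would be: try wxString::FromUTF8 first, and if it comes back empty for non-empty input, construct the string with a Latin-1 converter instead. Whether ISO-8859-1 is the right fallback for your images is of course the assumption that a hex editor would have to confirm.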
In any case, I very much appreciate your time and explanation.