Problem (sometimes) reading utf-8 file
Hi,
In my code, I have a place where I read in an arbitrary, user-specified file. This works fine on ASCII files, but quietly does nothing on UTF-8 files that contain "funky" non-ASCII characters (e.g., an upside-down question mark in place of an apostrophe). If I strip out all of the funky characters, the file reads in fine. So I wrote a very simple program (posted below) to analyze this problem specifically, and... everything works. It reads in the "funky" file with no problems. I included all the same headers in the simple program as are in my real program, but otherwise made no wxWidgets calls. The actual calls for opening/reading the files are identical to my real program.
So, I'm confused. I'm thinking perhaps somewhere along the way in my real program, wxWidgets is doing something with the locale setting, but that's just a total guess. Any help much appreciated.
I've attached the simple program (that works, funky chars or no).
Re: Problem (sometimes) reading utf-8 file
Hi,
Can you reproduce the problem in the minimal sample?
If you can, could you post a diff to the minimal sample that reproduces the problem?
Thank you.
Re: Problem (sometimes) reading utf-8 file
Thank you for the quick response (on a weekend!).
No, I was unable to reproduce the "failed" (actually, it quietly does nothing) load of a utf-8 file in the minimal sample.
I should have included the actual code from my real app, but it is completely without context, so I wasn't sure whether it would confuse things. I'm including it below. This is the callback on a button to "extract" stuff from a file. Again, it works fine with pure ASCII files, but quietly doesn't load a UTF-8 file if it has non-ASCII characters (no problem if there are no non-ASCII characters). Yet the simple program I uploaded has no problem with the non-ASCII file...
--
void MyAddRemoveList::OnExtract(wxCommandEvent& event) {
    wxArrayInt id_to_extract;
    ostringstream ss;
    // Assumes the File list is single select.
    if (list->GetSelections(id_to_extract)) {
        if (list_t == list_types::files) {
            ifstream name(list->GetString(id_to_extract[0]));
            if (!name) {
                wxMessageBox("No file specified, or file empty.", "Error",
                             wxOK | wxICON_INFORMATION);
            }
            else {
                ss << name.rdbuf();
                MyAcceptIgnoreDlg(ss.str());
            }
        }
    }
}
Re: Problem (sometimes) reading utf-8 file
Hi,
Did you try to debug it?
Where did the failure occur?
Thank you.
Re: Problem (sometimes) reading utf-8 file
So, there was no actual failure. The following line simply returned an empty result: it skipped reading the file without reporting any error. Yet if I edit out the non-ASCII characters, it slurps the file in just fine. And, again, the simple program I included has no problem reading the file with the non-ASCII characters. It's only when I attempt the same read within my real wxWidgets app that it quietly does nothing.
ss << name.rdbuf();
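One way to check whether the read itself is really coming back empty is to separate "could not open" from "opened, but got N bytes". The following is a minimal sketch in plain C++ (no wxWidgets); `SlurpResult` and `slurp_file` are hypothetical names introduced only for illustration. The stream copies raw bytes and does no decoding, so a non-zero byte count here would point at a later step rather than at the read.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Result of trying to read a whole file. Hypothetical helper type.
struct SlurpResult {
    bool opened = false;   // false if the file could not be opened
    std::string bytes;     // raw file contents, no decoding applied
};

// Read the whole file as raw bytes. Binary mode keeps the bytes untouched.
SlurpResult slurp_file(const std::string& path) {
    SlurpResult result;
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return result;     // open failed: opened stays false
    }
    result.opened = true;
    std::ostringstream ss;
    ss << in.rdbuf();      // copies every byte; an empty file yields an empty string
    result.bytes = ss.str();
    return result;
}
```

Logging `result.bytes.size()` right after the call shows whether the non-ASCII file actually arrives in memory.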
Re: Problem (sometimes) reading utf-8 file
Hi,
Can you put the code that doesn't work inside the code that works, build and run it - what happens?
Also, what is your platform and wxWidgets version?
Thank you.
Re: Problem (sometimes) reading utf-8 file
I just tried that (putting the code from my real program into the simple program), and it still works - i.e. it still reads the file with the non-ascii characters with no problem.
I'm running wxWidgets 3.0 on Pop!_OS 22.04. I should have mentioned that I'm pretty new to both C++ and wxWidgets.
Re: Problem (sometimes) reading utf-8 file
In that case the problem is somewhere else. Try to create a small piece of code that reproduces the behaviour.
Re: Problem (sometimes) reading utf-8 file
If you get "funky" characters or "?", it's an encoding problem. And if you're sure that the input data is UTF-8, it means that no UTF-8 decoding took place. However this usually leads to some results that are easy to recognize once you've seen them a couple of times.
Code: Select all
Morgenröte -> MorgenrÃ¶te
The German umlaut "ö" was replaced by its UTF-8 encoding, which needs two bytes.
Is that what you're getting?
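To see where the two extra characters come from, here is a minimal sketch in plain C++. It deliberately makes the mistake being described: it treats every input byte as one Latin-1 character and re-encodes the result as UTF-8. The function name `latin1_to_utf8` is an assumption made for illustration and is not part of wxWidgets.

```cpp
#include <string>

// Re-encode a byte string as UTF-8 while (wrongly) treating each input byte
// as one Latin-1 character. Feeding it UTF-8 input reproduces the mojibake.
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte: 0xC2 or 0xC3
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

"ö" is stored in UTF-8 as the bytes C3 B6. Read as Latin-1 those are "Ã" and "¶", so the ten-character word comes out eleven characters long.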
Use the source, Luke!
Re: Problem (sometimes) reading utf-8 file
Thanks for the response.
Actually I'm working with two different examples - one file has the upside down question mark in place of an apostrophe, and one file has several actual utf-8 encoded fractions (i.e. not just number-slash-number like 1/2).
Both of these files can be read in successfully using the pared-down example that I posted here, but neither can be read by basically the same line of code in my real program. The read doesn't fail, it just quietly moves on, leaving an empty "ss.str()" (from "ss << name.rdbuf();").
I output the "uncorrected" files from my sample program, and the upside down question mark and the real fractions are there, as expected.
If I edit out or change these characters, they then are read in with no problem in my real program.
The only difference between the two programs is my real program is using wxWidgets and making lots of wx-related calls, whereas my example posted here is not. Well, I am also making some postgres calls (through libpqxx). Needless to say, I'm pretty lost here...
Re: Problem (sometimes) reading utf-8 file
You need to find the exact line where it fails. If you can't use a debugger to single-step through the code, put log outputs at crucial parts of the code.
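As a sketch of what such log output could look like, the helper below summarises a byte string in one line: total size, number of non-ASCII bytes, and a short hex dump. `describe_bytes` is a hypothetical name, and the choice of what to log is an assumption. Calling it right after the read and again just before the text is displayed should show at which step the data goes missing.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Summarise a byte string for a log line: total size, number of non-ASCII
// bytes, and a hex dump of at most max_dump leading bytes.
std::string describe_bytes(const std::string& label, const std::string& data,
                           std::size_t max_dump = 16) {
    std::size_t non_ascii = 0;
    for (unsigned char c : data) {
        if (c >= 0x80) ++non_ascii;
    }
    static const char hex[] = "0123456789abcdef";
    std::ostringstream line;
    line << label << ": " << data.size() << " bytes, "
         << non_ascii << " non-ASCII, hex:";
    for (std::size_t i = 0; i < data.size() && i < max_dump; ++i) {
        unsigned char c = static_cast<unsigned char>(data[i]);
        line << ' ' << hex[c >> 4] << hex[c & 0x0F];
    }
    return line.str();
}
```

For example, `std::cerr << describe_bytes("after read", ss.str()) << '\n';` (with `<iostream>` included) prints one such line to the terminal.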
Use the source, Luke!
Re: Problem (sometimes) reading utf-8 file
Not sure if anyone is still looking at this...
I have written a very simple wxWidgets app that exhibits the same issue as my main program. I'm uploading this along with the simple code I originally uploaded for this question (thankfully renamed). I'm also uploading makefiles for the two programs, and data files.
"badprog" will first read in the "gooddata" file and output the results, then attempt to read in the "baddata" file, but will quietly do nothing.
"goodprog" successfully reads in both of the files. The difference between the two is that while they're both compiled and linked with wxWidgets, badprog is an actual wxWidgets app, whereas goodprog makes no actual wxWidgets calls.
If anyone is so inclined, these are both tiny, should compile w/o problems, and the data file names are embedded, so all one need do is compile and run them. The two data files are identical, except "baddata" has an upside down question mark in place of an apostrophe, whereas "gooddata" has the apostrophe. (Note I had to add .txt to the makefiles and data files in order to upload them).
I'm sure this will end up not being the case, but at this point, this seems like a wxWidgets issue to me (probably me doing something wrong with wxWidgets?).
Here are the files I will upload:
badprog.cpp
badmake.txt
baddata.txt
(It seems I can only upload 3 files at a time, so I'll upload these in a followup post).
goodprog.cpp
goodmake.txt
gooddata.txt
Re: Problem (sometimes) reading utf-8 file
Here are the three "good" files.
Re: Problem (sometimes) reading utf-8 file
For me, "badprog" did not silently fail. It displayed the content of both text files, just not UTF-8 decoded.
Adding this also displays the "baddata" content correctly.
Code: Select all
wxString s(buff2.str().c_str(), wxConvUTF8);
wxMessageBox(s, "utf8-decoded");
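For readers unsure what "UTF-8 decoding" involves, here is a minimal sketch in plain C++ that turns UTF-8 bytes into Unicode code points and rejects malformed input. It is not how wxWidgets implements `wxConvUTF8`; `decode_utf8` is a hypothetical name used only to illustrate the step the snippet above asks the library to perform.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into Unicode code points. Returns false on malformed
// input (stray continuation byte, truncated sequence, overlong form,
// surrogate, or value above U+10FFFF); `out` may then be partially filled.
bool decode_utf8(const std::string& in, std::u32string& out) {
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;   // number of continuation bytes expected
        char32_t min_cp;     // smallest code point allowed for this length
        if (b0 < 0x80)                { cp = b0;        extra = 0; min_cp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; min_cp = 0x10000; }
        else return false;   // stray continuation byte or invalid lead byte
        if (in.size() - i <= extra) return false;   // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return true;
}
```

The upside-down question mark (bytes C2 BF) decodes to the single code point U+00BF, while a lone BF byte is rejected as malformed.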
Use the source, Luke!
Re: Problem (sometimes) reading utf-8 file
OK, thank you! Do you think my problem could be a versioning issue? I'm using wxWidgets 3.0.
I will use your suggestion. Thanks again.