Strange problems with à â etc. Topic is solved

Wanderer82
Ultimate wxWidgets Guru
Posts: 675
Joined: Tue Jul 26, 2016 2:00 pm

Strange problems with à â etc.

Post by Wanderer82 »

Hello

In my app I have a strange problem. I tried to use a text containing accented letters (with ^ or ` on them). I copied it into a wxString, and my app then generates an HTML file from it. I did this with three texts, all of which contained such accented letters. With two of them I got the correct accents in Edge, but with one of them I got a strange sign for à and ê. Interestingly, it didn't happen with é. I then tried to put "&agrave;" instead of à, but only for one à. The funny thing was that the other accented letters then also showed up correctly.

That was yesterday. Today I tried to reproduce the problem, but now all three texts are correct. I'm quite confused. Is this maybe something about the encoding of the HTML file (like UTF-8 etc.)? But then I wonder why it seems to work only by chance.

If I create a text file with the .html ending, what is the encoding usually and how could I influence it?

Thanks,
Thomas
doublemax
Moderator
Posts: 19160
Joined: Fri Apr 21, 2006 8:03 pm
Location: $FCE2

Re: Strange problems with à â etc.

Post by doublemax »

How exactly do you create and save the HTML file? IOW: Which encoding does it use?

Assuming it's UTF-8, the charset should be defined in the header:

Code: Select all

<!doctype html>
<html>
	<head>
		<meta charset="utf-8">
		<title>some title</title>
	</head>  
 
	<body>

	</body>
</html>
Use the source, Luke!
Wanderer82

Re: Strange problems with à â etc.

Post by Wanderer82 »

I don't do anything special when creating the file, just a file with the .html ending:

Code: Select all

wxString HtmlStringFinal = CreateHtmlString();

std::ofstream htmlFile;

htmlFile.open("C:\\users\\thomas\\documents\\programmprojekte\\birkenbihl-test\\testhtml.html");
htmlFile << HtmlStringFinal;
htmlFile.close();
But I'll try your suggestion right now. I only wonder why this only happens erratically.

EDIT:

I just tried your suggestion, but now it's even worse: now all the special letters are shown as a question mark in a tilted black rectangle.
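
The question mark in a tilted black rectangle is the Unicode replacement character, which browsers typically show for bytes that are not valid in the declared encoding. A minimal sketch of why that can happen, assuming the file was written in a single-byte encoding such as ISO-8859-1 (an assumption, the thread has not confirmed it at this point): à is the single byte 0xE0 there, while in UTF-8 it is the two bytes 0xC3 0xA0. The helper name `latin1ToUtf8` is made up for illustration and is not part of wxWidgets.

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Every Latin-1 byte maps to the Unicode code point with the same value,
// so bytes >= 0x80 always become a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string utf8;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            utf8 += static_cast<char>(c);                  // ASCII stays one byte
        } else {
            utf8 += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            utf8 += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return utf8;
}
```

For the Latin-1 byte 0xE0 (à) this yields 0xC3 0xA0. A lone 0xE0 in a file declared as UTF-8 is an incomplete sequence, which would match the symptom described above.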
doublemax@work
Super wx Problem Solver
Posts: 474
Joined: Wed Jul 29, 2020 6:06 pm
Location: NRW, Germany

Re: Strange problems with à â etc.

Post by doublemax@work »

Code: Select all

htmlFile << HtmlStringFinal
It's unclear to me what kind of conversion happens here.

Can you try this:

Code: Select all

#include <wx/file.h>

wxFile htmlFile("C:\\users\\thomas\\documents\\programmprojekte\\birkenbihl-test\\testhtml.html", wxFile::write);
if( htmlFile.IsOpened() ) {
  htmlFile.Write( HtmlStringFinal, wxConvUTF8 );
  htmlFile.Close();
}
If this doesn't work, please upload one of the generated HTML files.
Wanderer82

Re: Strange problems with à â etc.

Post by Wanderer82 »

Your suggestion works fine.

HtmlStringFinal is a wxString which is created in a function. This is how the content of HtmlStringFinal is created:

Code: Select all

wxString CreateHtmlString()
{
    wxString HtmlString;
    wxString ZeilenAbstandInPixel = "6";
    wxString AbsatzAbstandInPixel = "14";
    wxString Schriftart = "Calibri";
    wxString string_WortAbstandInPixel;
    string_WortAbstandInPixel << WortAbstandInPixel;
    wxString fontSize = "12";


    HtmlString = "<!doctype html><html><head><meta charset=\"utf-8\"><style type=\"text/css\">td{padding-right:14px;padding-bottom:" + ZeilenAbstandInPixel + "px;font-family:" + Schriftart + ";font-size:" + fontSize + "pt !important;white-space: nowrap;}";
    HtmlString = HtmlString + "table{padding-right:0;padding-bottom:" + AbsatzAbstandInPixel + "px;font-family:" + Schriftart + ";font-size:" + fontSize + "pt;}";
    HtmlString = HtmlString + "body{margin:0px;padding:0px;overflow:hidden;}</style></head><body>";

    int wordCount = 0;
    int int_AnzahlSpalten = 0;
    int int_FlexGridSizerCount = 0;

    while(wordCount < totalWords)
    {
        HtmlString = HtmlString + "<table><tr>";

        while(int_AnzahlSpalten < AnzahlSpaltenStaticTexts[int_FlexGridSizerCount])
        {
            HtmlString = HtmlString + "<td>" + separateWords[wordCount] + "</td>";
            int_AnzahlSpalten = int_AnzahlSpalten + 1;
            wordCount = wordCount + 1;
        }

        int_AnzahlSpalten = 0;
        wordCount = wordCount - AnzahlSpaltenStaticTexts[int_FlexGridSizerCount];

        HtmlString = HtmlString + "</tr><tr>";

        while(int_AnzahlSpalten < AnzahlSpaltenStaticTexts[int_FlexGridSizerCount])
        {
            HtmlString = HtmlString + "<td><b>" + HtmlData1.ComboBox[wordCount] + "</b></td>";
            int_AnzahlSpalten = int_AnzahlSpalten + 1;
            wordCount = wordCount + 1;
        }

        int_AnzahlSpalten = 0;
        int_FlexGridSizerCount = int_FlexGridSizerCount + 1;

        HtmlString = HtmlString + "</tr></table>";
    }

    HtmlString = HtmlString + "</body></html>";

    return HtmlString;
}
Before your suggestion, the string did not contain the parts <!doctype html> and <meta charset=\"utf-8\">. So how do I know what encoding the file is created with if I use my original version? But I guess your version is the safest one, right?
doublemax@work

Re: Strange problems with à â etc.

Post by doublemax@work »

I guess in your version the local encoding of your machine is used, probably "iso-8859-1". Try using that as value for the charset.

Using UTF-8 is safer though, as it can represent all Unicode characters.
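
On the question of how to tell which encoding a file was written in: the bytes alone cannot tell you for certain, but a common heuristic is to check whether they form valid UTF-8, because text with accented letters in a single-byte encoding almost never does. The following is only a sketch of that check; `looksLikeUtf8` is a hypothetical name, and the test is simplified.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the bytes are structurally valid UTF-8.
// Simplified: it checks lead and continuation bytes only and does not reject
// every overlong form or surrogate code point.
bool looksLikeUtf8(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;                        // expected continuation bytes
        if (c < 0x80)                    extra = 0;   // ASCII
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;   // 2-byte sequence
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;   // 3-byte sequence
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;   // 4-byte sequence
        else return false;                            // stray continuation or invalid lead byte

        if (extra > bytes.size() - i - 1) return false;  // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;    // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

Pure ASCII text passes this check in either encoding, so the result only says something once the file contains non-ASCII characters such as à.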
Wanderer82

Re: Strange problems with à â etc.

Post by Wanderer82 »

Hm okay, so is it advisable to always convert texts which could contain any of the "special" characters to UTF-8 or is this only important for html files?

Your version uses wxFile. I guess there is no real difference from using my version... except, I assume, that this conversion to UTF-8 only works with wxFile?
doublemax

Re: Strange problems with à â etc.

Post by doublemax »

Wanderer82 wrote: Tue Jan 31, 2023 9:12 pm Hm okay, so is it advisable to always convert texts which could contain any of the "special" characters to UTF-8 or is this only important for html files?
If it's possible that the text contains non-ascii characters, it should be saved as UTF-8. It almost guarantees that it can be read and displayed correctly on any platform.
Wanderer82 wrote: Tue Jan 31, 2023 9:12 pm Your version uses wxFile. I guess there is no real difference from using my version... except, I assume, that this conversion to UTF-8 only works with wxFile?
I used wxFile because I knew that it has no hidden internal functionality that might change the result. I don't know that about std::ofstream.
Use the source, Luke!
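
As a side note on std::ofstream: the stream itself writes whatever narrow bytes it is handed, so the encoding is decided at the point where the wxString is turned into bytes. A minimal sketch, assuming the text is already available as UTF-8 bytes in a std::string (with wxWidgets, wxString::utf8_str() is one way to obtain such bytes); the function name `writeUtf8File` is hypothetical.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the given bytes to a file unchanged.
// Returns true if the file could be opened and written.
bool writeUtf8File(const std::string& path, const std::string& utf8Text)
{
    // Binary mode keeps the stream from translating line endings on Windows.
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(utf8Text.data(), static_cast<std::streamsize>(utf8Text.size()));
    return static_cast<bool>(out.flush());
}
```

With this approach the explicit conversion step replaces the implicit one in `htmlFile << HtmlStringFinal`, so it no longer depends on the machine's local encoding.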
Wanderer82

Re: Strange problems with à â etc.

Post by Wanderer82 »

Alright.

Having had a look at wxFile, I noticed that there is an option "write_excl", which is said to be useful for opening files without being vulnerable to race conditions. I had this problem in another app where different users can write to or read from a database file. I ended up using a lockfile to make sure that two users aren't accessing the file at the same time. Now that I've seen this functionality of wxFile, I wonder if there is any catch. I mean, it seems so easy to use this option, so why would anyone ever use a more complicated lockfile solution? @doublemax, I know you already mentioned this in my earlier thread and somehow I just skipped over it.
doublemax

Re: Strange problems with à â etc.

Post by doublemax »

I'm definitely not a big fan of using a flat text file that multiple processes write to. It would be better to use a simple database like SQLite for this. But if you're using a text file, using the write_excl flag will make it more robust.
Use the source, Luke!
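
For illustration, the create-only-if-absent idea behind write_excl (which, as far as the wxFile documentation describes it, fails if the file already exists) is also available in standard C++17 through the "x" flag of std::fopen. Below is a rough sketch of using it for a lock file. `tryAcquireLock` and `releaseLock` are hypothetical names, and the sketch assumes a C library that supports the C11 "x" mode.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: tries to create the lock file atomically.
// Mode "wx" makes fopen fail if the file already exists.
bool tryAcquireLock(const std::string& lockPath)
{
    std::FILE* f = std::fopen(lockPath.c_str(), "wx");
    if (f == nullptr) return false;   // lock is held by someone else, or another error occurred
    std::fclose(f);
    return true;
}

// Hypothetical helper: releases the lock by deleting the lock file.
bool releaseLock(const std::string& lockPath)
{
    return std::remove(lockPath.c_str()) == 0;
}
```

Note that this only guards the creation of a file. It does not lock an existing data file, and a lock file left behind by a crashed process still has to be handled separately, which is one reason lockfile schemes tend to be more involved than a single flag.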