wxPdfDocument and Czech characters

erektus
In need of some credit
Posts: 3
Joined: Fri Jan 11, 2013 3:53 pm

wxPdfDocument and Czech characters

Post by erektus »

Hi guys,

I want to produce PDF output in my program, where I have to use Czech characters (ěščřžýáíéůú etc.), but I still can't get it to work. I tried a lot of variants without any success. Could anyone help me, please?

Here is my test code (as you can see, I tried a lot of different encodings, but all of them still give me the same output):

Code:

wxPdfFontManager* fontManager = wxPdfFontManager::GetFontManager();
wxPdfFont  mujfont     = fontManager->GetFont(wxT("Arial"),     wxPDF_FONTSTYLE_BOLD);

  AddPage();
  SetFont(mujfont, wxPDF_FONTSTYLE_REGULAR, 16);
  Cell(0,10,wxT("This is font  (WinAnsi encoding)ěščřžýáíé"));
  Ln(10);
  Cell(0,10,str);
  Ln(10);
  Cell(0,10,str2);
  Ln(10);
  mujfont.SetEncoding(wxT("cp-1251"));
  SetFont(mujfont, wxPDF_FONTSTYLE_REGULAR, 16);
  Cell(0,10,wxT("This is font (CP-1251 encoding)ěščřžýáíé"));
  Ln(10);
  Cell(0,10,cyrillicText);
  Ln(10);
  Cell(0,10,str);
  Ln(10);
  Cell(0,10,str2);
  Ln(10);
  mujfont.SetEncoding(wxT("Windows-1250"));
  SetFont(mujfont, wxPDF_FONTSTYLE_REGULAR, 16);
  Cell(0,10,wxT("This is font  (windows 1250 encoding)ěščřžýáíé"));
  Ln(10);
  Cell(0,10,cyrillicText);
  Ln(10);
  Cell(0,10,str);
  Ln(10);
  Cell(0,10,str2);
  Ln(10);
  mujfont.SetEncoding(wxT("UTF-8"));
  SetFont(mujfont, wxPDF_FONTSTYLE_REGULAR, 16);
  Cell(0,10,wxT("This is font (utf8 encoding)ěščřžýáíé"));
  Ln(10);
  Cell(0,10,cyrillicText);
  Ln(10);
  Cell(0,10,str);
  Ln(10);
  Cell(0,10,str2);
  Ln(10);
  mujfont.SetEncoding(wxT("iso-8859-2"));
  SetFont(mujfont, wxPDF_FONTSTYLE_REGULAR, 16);
  Cell(0,10,wxT("This is font (iso-8859-2 encoding)ěščřžýáíé"));
  Ln(10);
  Cell(0,10,cyrillicText);
  Ln(10);
  Cell(0,10,str);
  Ln(10);
  Cell(0,10,str2);
  Ln(10);

This is the output that I get :-(
This is font (WinAnsi encoding)?š??žýáíé
?????µ??
?›?????™???????-?-?©
This is font (CP-1251 encoding)?š??žýáíé
Tchaikovsky ?š??žýáíé- ????o?????
?????µ??
?›?????™???????-?-?©
This is font (windows 1250 encoding)?š??žýáíé
Tchaikovsky ?š??žýáíé- ????o?????
?????µ??
?›?????™???????-?-?©
This is font (utf8 encoding)?š??žýáíé
Tchaikovsky ?š??žýáíé- ????o?????
?????µ??
?›?????™???????-?-?©
This is font (iso-8859-2 encoding)?š??žýáíé
Tchaikovsky ?š??žýáíé- ????o?????
?????µ??
?›?????™???????-?-?©
Thanks a lot in advance.
Tomáš
utelle
Moderator
Posts: 1125
Joined: Tue Jul 05, 2005 10:00 pm
Location: Cologne, Germany

Re: wxPdfDocument and Czech characters

Post by utelle »

erektus wrote: I want to produce PDF output in my program, where I have to use Czech characters (ěščřžýáíéůú etc.), but I still can't get it to work. I tried a lot of variants without any success. Could anyone help me, please?
The answer depends (partially) on which version and which build type of wxWidgets you use. If you are using the ANSI build of wxWidgets 2.8.x, it gets quite complicated to output non-ANSI characters to PDF, as you have to handle the correct encoding yourself and may have to prepare font files to accomplish this. Therefore I strongly recommend using a Unicode build of wxWidgets to handle non-ANSI text.
erektus wrote: Here is my test code (as you can see, I tried a lot of different encodings, but all of them still give me the same output):
Just as a note if you used the ANSI build: the wxPdfDocument encoding feature is only supported for the Unicode builds of wxWidgets.
erektus wrote:

Code:

wxPdfFontManager* fontManager = wxPdfFontManager::GetFontManager();
wxPdfFont  mujfont     = fontManager->GetFont(wxT("Arial"),     wxPDF_FONTSTYLE_BOLD);

  AddPage();
  SetFont(mujfont, wxPDF_FONTSTYLE_REGULAR, 16);
  Cell(0,10,wxT("This is font  (WinAnsi encoding)ěščřžýáíé"));
Well, maybe the main problem is how you initialize your strings. In a Unicode build, wxString variables store Unicode characters. That is, you have to use the wxString constructor that takes an appropriate character conversion class to convert from the encoding of your C++ source file to Unicode. I have no idea in which encoding your source file is written: maybe codepage 1252 (which does not include all Czech characters) or maybe codepage 1250 (if that's the default codepage of your system). You have to tell the wxString constructor which encoding you are using!

Maybe the following code would work for you:

Code:

  wxCSConv conv_1250(wxT("cp-1250"));
  wxString text("ěščřžýáíé", conv_1250);
Change "cp-1250" to the codepage that is used for your C++ source code.
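
As a side illustration of why the converter has to match the source file's encoding, here is a minimal sketch in plain C++ (no wxWidgets involved; the function names are made up for this example). It counts how many characters the same bytes represent when read as a single-byte codepage such as cp-1250 and when read as UTF-8. If the two counts differ, a single-byte converter applied to UTF-8 source text will produce garbage characters.

```cpp
#include <cstddef>
#include <string>

// Character count if every byte is taken as one character, which is
// what any single-byte codepage (cp-1250, cp-1252, ...) assumes.
std::size_t CountAsSingleByte(const std::string& bytes) {
    return bytes.size();
}

// Character count if the bytes are read as UTF-8: continuation bytes
// (bit pattern 10xxxxxx) belong to the previous character and are skipped.
// Assumes well-formed UTF-8; no validation is done.
std::size_t CountAsUtf8(const std::string& bytes) {
    std::size_t count = 0;
    for (unsigned char c : bytes) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For a string literal containing only ASCII both functions agree; for Czech text saved as UTF-8 the single-byte count is larger, because each accented letter occupies two bytes.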

But this is only one part of the solution. The next required step is to use a font which contains glyphs for all characters you use in your text. If you do not register a font file explicitly, the font name "Arial" is equivalent to the PDF core font "Helvetica" (which definitely does not contain all Czech characters). That is, you have to explicitly register an appropriate font using one of the wxPdfFontManager methods (like RegisterFont, RegisterFontDirectory or RegisterSystemFonts).

Regards,

Ulrich

Re: wxPdfDocument and Czech characters

Post by erektus »

Hi Ulrich,

thanks for your response.

I'm using a Unicode build of wxWidgets 2.9.3 for Windows, and my source file encoding is UTF-8.

I wrote a print function in the same project, where I have no issues with character encoding, without any wxString conversion or font registration (probably needed only for PDF).

The printing code looks like this:

Code:

wxFont fontNadpis(18, wxFONTFAMILY_SWISS, wxNORMAL, wxBOLD, true, _("Arial"), wxFONTENCODING_DEFAULT);
dc->SetFont(fontNadpis);
dc->DrawText(wxT("ěščřžýáíé"),10,30);
and I have no issues with Czech characters in the Arial or Times New Roman fonts (I didn't test other fonts).

Could you please help me? I'm not sure how to use system font registration and so on (I do not want to distribute my exe file with another font file). The documentation is very brief.

The strange thing for me is that I used a few different encodings for fonts and wxStrings, but all my outputs were the same (see my previous post). What did I do wrong?

Thanks for help
Tomáš

Re: wxPdfDocument and Czech characters

Post by utelle »

erektus wrote: I'm using a Unicode build of wxWidgets 2.9.3 for Windows, and my source file encoding is UTF-8.
Good. This makes handling non-ANSI characters with wxPdfDocument a lot easier.
erektus wrote: I wrote a print function in the same project, where I have no issues with character encoding, without any wxString conversion or font registration (probably needed only for PDF).
For wxString constants in source code it's usually recommended to use a specific encoding for non-ANSI characters. Otherwise your program is not portable. Even another user using a Windows version with a different default encoding might experience difficulties.

Even if your source code encoding is UTF-8, the default wxWidgets build for Windows needs to convert string constants, because Windows internally uses UCS-2 (UTF-16). Take a look at the function twoEncodings in the tutorial7 sample that comes with wxPdfDocument to see how the Cyrillic string constant is defined using wxConvUTF8 in the wxString constructor. It could be that wxWidgets 2.9.x does some magic behind the scenes to convert strings, but I wouldn't rely on this magic.

Regarding font registering, I already explained that if you do not take special action, "Arial" refers to the PDF core font Helvetica. And the PDF core fonts only support western encodings; wxPdfDocument can't do anything about that.
erektus wrote: The printing code looks like this:

Code:

wxFont fontNadpis(18, wxFONTFAMILY_SWISS, wxNORMAL, wxBOLD, true, _("Arial"), wxFONTENCODING_DEFAULT);
dc->SetFont(fontNadpis);
dc->DrawText(wxT("ěščřžýáíé"),10,30);
and I have no issues with Czech characters in the Arial or Times New Roman fonts (I didn't test other fonts).
wxFont does all the work for you: wxFont is associated with a Windows font which supports the default encoding (in your case an encoding supporting Czech characters).

If you use only fonts you are able to select using a wxFont constructor, then the easiest method to get proper results is to use just that same font for PDF printing. That is, use method

bool SetFont(const wxFont& font)

of the wxPdfDocument class and pass in your wxFont object. If you do so, wxPdfDocument won't use a PDF core font but the Windows font associated with the wxFont object. Thereafter no special measures should be required to output Czech characters to PDF.
erektus wrote: Could you please help me? I'm not sure how to use system font registration and so on (I do not want to distribute my exe file with another font file). The documentation is very brief.
If you restrict your application to default fonts, use the approach mentioned above.

RegisterSystemFonts registers all fonts which are registered in the operating system (on Windows usually all fonts residing in the Windows font subdirectory). That makes it possible to show for example a font selection box for the user. It doesn't solve your problem, if the font you want to use is not available on the system where your application is running.
erektus wrote: The strange thing for me is that I used a few different encodings for fonts and wxStrings, but all my outputs were the same (see my previous post). What did I do wrong?
Well, as the characters you want to print are not available in the core font, you see question marks displayed for them. To analyze what exactly was written to PDF I would need to inspect the resulting PDF file (but that would also require that you call SetCompression(false) on the wxPdfDocument object right at the beginning).

But first I would recommend you set the font for wxPdfDocument as explained above and test whether this already solves your problem.
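
To make the question marks more concrete, here is a minimal sketch in plain C++ (no wxWidgets involved). It assumes that the core font's WinAnsi encoding covers the same characters as Windows codepage 1252, and that the input is well-formed UTF-8. The function names are made up for this example and are not part of wxPdfDocument. It replaces every character that codepage 1252 cannot represent with a question mark.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence that starts with lead byte c.
std::size_t SequenceLength(unsigned char c) {
    if (c < 0x80) return 1;  // ASCII
    if (c < 0xE0) return 2;  // 110xxxxx
    if (c < 0xF0) return 3;  // 1110xxxx
    return 4;                // 11110xxx
}

// Decode the code point stored in utf8[pos .. pos + len).
char32_t DecodeAt(const std::string& utf8, std::size_t pos, std::size_t len) {
    unsigned char lead = static_cast<unsigned char>(utf8[pos]);
    char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
    for (std::size_t k = 1; k < len; ++k) {
        cp = (cp << 6) | (static_cast<unsigned char>(utf8[pos + k]) & 0x3F);
    }
    return cp;
}

// True if the code point has a slot in Windows codepage 1252.
bool IsInCp1252(char32_t cp) {
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) return true;
    // Characters that cp1252 places in the 0x80..0x9F range.
    static const char32_t extras[] = {
        0x20AC, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, 0x02C6,
        0x2030, 0x0160, 0x2039, 0x0152, 0x017D, 0x2018, 0x2019, 0x201C,
        0x201D, 0x2022, 0x2013, 0x2014, 0x02DC, 0x2122, 0x0161, 0x203A,
        0x0153, 0x017E, 0x0178};
    for (char32_t e : extras) {
        if (e == cp) return true;
    }
    return false;
}

// Copy the UTF-8 text, writing '?' for every character outside cp1252.
std::string MaskUnsupported(const std::string& utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        std::size_t len = SequenceLength(static_cast<unsigned char>(utf8[i]));
        assert(i + len <= utf8.size());  // truncated input is not handled
        if (IsInCp1252(DecodeAt(utf8, i, len))) {
            out.append(utf8, i, len);
        } else {
            out += '?';
        }
        i += len;
    }
    return out;
}
```

Calling MaskUnsupported on the UTF-8 bytes of "ěščřžýáíé" yields "?š??žýáíé", the same pattern as in the first line of the posted output: š, ž, ý, á, í and é exist in codepage 1252, while ě, č and ř do not.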

Regards,

Ulrich

Re: wxPdfDocument and Czech characters

Post by erektus »

Hi Ulrich,

thanks for your help. I think I found a solution based on your comments.

I can use wxFont, but I have to register it before I can use it; then it works fine.

Code:

wxFont fontText(9, wxFONTFAMILY_SWISS, wxNORMAL, wxNORMAL, false, wxT("Arial"), wxFONTENCODING_DEFAULT);
wxPdfFontManager* fontManager = wxPdfFontManager::GetFontManager();
fontManager->RegisterFont(fontText);
SetFont(fontText);
TextBox(0,0,wxT("ěščřžýáíé"),wxPDF_ALIGN_LEFT,wxPDF_ALIGN_TOP,wxPDF_BORDER_NONE,0);
Regards,
Tomáš

Re: wxPdfDocument and Czech characters

Post by utelle »

erektus wrote: I can use wxFont, but I have to register it before I can use it; then it works fine.
Yes, you are right. Sorry for missing this detail.

Usually, it's not necessary to explicitly register a wxFont, but in this case wxPdfDocument decides based on the wxFont facename "Arial" to take the PDF core font "Helvetica" and to not register the wxFont. To circumvent this behaviour you have to explicitly register the wxFont before use as you observed.

Regards,

Ulrich