
Posts Tagged ‘iText’

RtfTemplate and character encoding

2010-02-03

RtfTemplate is quite an interesting tool that enables processing MS Office mail merge templates in Java code. Both the source and the output are in RTF format. It is very good for quickly and easily preparing printouts in an application. What is more, the templates can be prepared by non-technical users who are trained only in MS Office (a common skill among office workers).

However, there is one serious problem with the tool: it lacks the ability to properly encode characters. Data put into the template is not displayed correctly unless it is plain 7-bit ASCII. That was probably the authors' only test case, so the result is a tool that is very useful, but only for a small part of the world.

The solution is not elegant, but quite simple. It requires another free tool: iText. iText has built-in functionality for writing RTF documents, and its class RtfDocument has methods for encoding characters according to the RTF standard. The only thing you have to do is use it to encode the data you are going to print. The downside is that you have to do it for all the data you send to the template printer, no matter how deeply it is nested in the structure you pass. The function to call is quite simple; here is a sample of how to use it:

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import com.lowagie.text.rtf.document.RtfDocument;

public static String escape(String sentence) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try {
        // encode the special characters of the sentence as RTF escapes
        new RtfDocument().filterSpecialChar(baos, sentence, true, true);
    } catch (IOException e) {
        // will never happen for ByteArrayOutputStream
    }
    return new String(baos.toByteArray());
}
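To see what "encoding characters according to the RTF standard" boils down to, here is a simplified, hand-written sketch that does not use iText. It is not iText's implementation, and the class name `RtfEscapeSketch` is hypothetical. It escapes the RTF control characters `\`, `{` and `}`, turns line breaks and tabs into `\line` and `\tab`, and writes every non-ASCII UTF-16 code unit as a `\uN?` control word, where N is a signed 16-bit decimal and `?` is the fallback character for readers that ignore `\u`. It assumes the surrounding RTF keeps the default `\uc1` setting (one fallback character per `\u`).

```java
/**
 * Simplified, hand-written RTF text escaper (a sketch, not iText's code).
 * Assumes the surrounding RTF keeps the default "uc1" setting, i.e. one
 * fallback character follows every backslash-u control word.
 */
public final class RtfEscapeSketch {

    private RtfEscapeSketch() {
    }

    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                // RTF control characters are escaped with a backslash
                out.append('\\').append(c);
            } else if (c == '\n') {
                out.append("\\line ");
            } else if (c == '\t') {
                out.append("\\tab ");
            } else if (c == '\r') {
                // ignored; '\n' already produces the line break
            } else if (c < 0x80) {
                // plain 7-bit ASCII passes through unchanged
                out.append(c);
            } else {
                // The control word takes a signed 16-bit decimal, so code units
                // above 0x7FFF become negative. Characters outside the BMP are
                // written as two control words, one per UTF-16 surrogate.
                out.append("\\u").append((int) (short) c).append('?');
            }
        }
        return out.toString();
    }
}
```

For example, `RtfEscapeSketch.escape("Zażółć {x}")` returns `Za\u380?\u243?\u322?\u263? \{x\}`. Whichever escaper you use, it still has to be applied to every value passed to the template.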


Using dynamic fonts for international texts in iText

2010-01-06

In previous posts I concentrated on how to quickly internationalize a Web DynPro component. I mentioned that using an appropriate font is crucial when generating PDF output. The problem is that there’s no single font covering the whole range of Unicode characters. A web browser deals with this problem by using many of the fonts available on the system at the same time, and in most cases the user will have fonts for their native language installed. So entering a wide range of Unicode characters in a web form usually causes no problem. The problem appears when we want to build a PDF document from that data.

In PDF, a font must be declared for every piece of generated text. PDF does not do what a web browser does, namely dynamically choose a font able to display the given text; we must decide which font to use while generating the PDF. However, iText provides a specialised class for that job: FontSelector. The class is configured with a list of fonts, the most preferred font first. FontSelector then turns a given text into a Phrase built from one or more Chunks: it uses the first font from the list, and when a character is not included in that font, it starts a new Chunk with the first font from the list that does include it, and so on. All you need then is a list of fonts covering a wide range of Unicode characters. Free fonts are a good choice. They are usually not as good-looking as commercial ones, but you can freely embed them in the generated PDF.
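The selection rule can be modelled in plain Java to make it concrete. This is a simplified model, not iText's FontSelector source: each font is reduced to a hypothetical `FakeFont` (a name plus a predicate saying whether it has a glyph for a code point), and the text is split into runs, each tagged with the first font in the list that covers the character. Characters no font covers are simply dropped here; that is a choice of the sketch, not a statement about iText.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public final class FontFallbackSketch {

    /** Hypothetical stand-in for a font: a name and its glyph coverage. */
    public static final class FakeFont {
        public final String name;
        public final IntPredicate hasGlyph;

        public FakeFont(String name, IntPredicate hasGlyph) {
            this.name = name;
            this.hasGlyph = hasGlyph;
        }
    }

    /** A run of text rendered with a single font (the role of an iText Chunk). */
    public static final class Run {
        public final String fontName;
        public final String text;

        public Run(String fontName, String text) {
            this.fontName = fontName;
            this.text = text;
        }

        @Override
        public String toString() {
            return fontName + ":" + text;
        }
    }

    /** Splits text into runs; each code point gets the first font that covers it. */
    public static List<Run> split(String text, List<FakeFont> fonts) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        FakeFont currentFont = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            FakeFont chosen = null;
            for (FakeFont f : fonts) {
                if (f.hasGlyph.test(cp)) {
                    chosen = f;
                    break;
                }
            }
            if (chosen == null) {
                continue; // no font covers this character: dropped in this sketch
            }
            if (chosen != currentFont && current.length() > 0) {
                // font changes: close the current run and start a new one
                runs.add(new Run(currentFont.name, current.toString()));
                current.setLength(0);
            }
            currentFont = chosen;
            current.appendCodePoint(cp);
        }
        if (current.length() > 0) {
            runs.add(new Run(currentFont.name, current.toString()));
        }
        return runs;
    }
}
```

With a Latin-only font listed first and a Cyrillic one second, `split("Moscow (Москва)", fonts)` yields three runs, `latin:Moscow (`, `cyrillic:Москва` and `latin:)`, which corresponds to the sequence of Chunks described above. Note that the order of the list matters: spaces and punctuation always go to the first font that has them.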

This sample class illustrates the usage of FontSelector. It displays text resources in various languages, written in various scripts, using a self-composed set of free and open fonts.
The output of the class is in the following documents:
multilang.pdf
multilang2.pdf






package pl.linfo.test.itext;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Properties;
import java.util.ResourceBundle;
import java.util.Set;

import org.apache.commons.lang.StringUtils;

import com.lowagie.text.Chunk;
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Font;
import com.lowagie.text.Paragraph;
import com.lowagie.text.Phrase;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.FontSelector;
import com.lowagie.text.pdf.PdfWriter;

public class MultilangPdf {

    private String pdfLocation = null;

    private List<ResourceBundle> bundles = new ArrayList<ResourceBundle>();

    private FontSelector fsNormal;

    private FontSelector fsHeader;

    private Font titleFont;

    private void init() throws DocumentException, IOException {
        BaseFont baseSerif = BaseFont.createFont(
                BaseFont.HELVETICA, BaseFont.CP1252, false);
        titleFont = new Font(baseSerif, 14, Font.BOLD);

        List<BaseFont> bfList = new ArrayList<BaseFont>();
        String[] fontPaths = new String[] {
                // quite nice font with wide European support
                "fonts/free/freefonts/FreeSerif.ttf",
                // Cyrillic, Greek and Latin Extended support
                "fonts/free/oldstandard/OldStandard-Regular.ttf",
                // quite wide and nice free set of fonts
                "fonts/free/uni/CODE2001.TTF",
                // nice readable CJK fonts
                "fonts/free/asian/zenhei/wqy-zenhei.ttc",
                // this one is also ok
                "fonts/free/asian/fireflysung/fireflysung.ttf",
                // not so nice CJK, but quite wide
                "fonts/free/asian/hanazono.ttf",
                // hieroglyphs
                "fonts/free/ancient/Aegyptus310.otf",
                // large symbols set
                "fonts/free/Symbola.otf",
                // music notation characters
                "fonts/free/Musica.otf",
                // very large Unicode set, but low quality
                "fonts/free/uni/unifont.ttf"
        };
        for (String fontPath : fontPaths) {
            try {
                BaseFont bf = getIdentityFont(fontPath);
                if (bf == null) {
                    System.out.println("Font " + fontPath + " does not exist");
                } else {
                    bfList.add(bf);
                }
            } catch (Exception e) {
                System.out.println("Failure when trying to load font " + fontPath);
                e.printStackTrace();
            }
        }
        // prepare fsHeader
        fsHeader = new FontSelector();
        for (BaseFont baseFont : bfList) {
            Font font = new Font(baseFont, 12, Font.BOLD);
            fsHeader.addFont(font);
        }
        // prepare fsNormal
        fsNormal = new FontSelector();
        for (BaseFont baseFont : bfList) {
            Font font = new Font(baseFont, 12, Font.NORMAL);
            fsNormal.addFont(font);
        }
    }

    private BaseFont getIdentityFont(String path) throws DocumentException,
            IOException {
        URL fontResource = getClass().getClassLoader().getResource(path);
        if (fontResource == null)
            return null;
        String fontPath = fontResource.toExternalForm();
        if (path.toLowerCase().endsWith(".ttc")) {
//            String[] ttcNames = BaseFont.enumerateTTCNames(path);
            // first entry
            fontPath = fontPath + ",0";
        }
        BaseFont baseFont = BaseFont.createFont(fontPath,
                BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        baseFont.setSubset(true);
        return baseFont;
    }

    private void doGenerate2() {
        Document document = new Document();
        try {
            // step 2:
            File file = new File(pdfLocation);
            file.getParentFile().mkdirs();
            PdfWriter writer = PdfWriter.getInstance(document,
                    new FileOutputStream(file));
            writer.setViewerPreferences(PdfWriter.PageLayoutTwoColumnLeft);
            document.setMargins(36, 36, 36, 54);
            // step 3:
            document.open();
            // load the properties file
            Properties props = new Properties();
            InputStream input = getClass().getResourceAsStream(
                "multilang.varia.properties");
            props.load(input);
            input.close();
            Set<Object> keys = props.keySet();
            for (Object key : keys) {
                String value = props.getProperty(key.toString());
                document.add(new Paragraph(
                        new Phrase(key.toString(), titleFont)));
                Paragraph para = new Paragraph();
                para.add(fsNormal.process(value));
                document.add(para);
            }

        } catch (DocumentException de) {
            System.err.println(de.getMessage());
        } catch (IOException ioe) {
            System.err.println(ioe.getMessage());
        }
        document.close();
    }

    private void doGenerate1() {
        Document document = new Document();
        try {
            // step 2:
            File file = new File(pdfLocation);
            file.getParentFile().mkdirs();
            PdfWriter writer = PdfWriter.getInstance(document,
                    new FileOutputStream(file));
            writer.setViewerPreferences(PdfWriter.PageLayoutTwoColumnLeft);
            document.setMargins(36, 36, 36, 54);
            // step 3:
            document.open();
            for (ResourceBundle bundle : bundles) {
                Paragraph p = new Paragraph();
                String hrText = "Resource bundle for locale "
                    + bundle.getLocale().toString();
                p.add(new Phrase(new Chunk(hrText, titleFont)));
                document.add(p);
                p = new Paragraph();
                p.add(fsHeader.process(bundle.getString("months")));
                p.add(new Phrase(": "));
                for (int i = 0; i < 12; i++) {
                    String monthName = bundle.getString("months." + i);
                    p.add(fsNormal.process(monthName));
                    p.add(new Phrase(" "));
                }
                document.add(p);

                p = new Paragraph();
                p.add(fsNormal.process(bundle.getString("name")));
                p.add(new Phrase(" "));
                p.add(fsNormal.process(bundle.getString("surname")));
                p.add(new Phrase(" "));
                p.add(fsNormal.process(bundle.getString("position")));
                document.add(p);

            }

        } catch (DocumentException de) {
            System.err.println(de.getMessage());
        } catch (IOException ioe) {
            System.err.println(ioe.getMessage());
        }
        document.close();

    }

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {
        MultilangPdf app = new MultilangPdf();
        String[] locales = new String[]{
                "en", "de", "pl", "ru", "uk", "cs", "sk", "ja", "zh-CN",
                "ko", "hi", "ar", "hr", "sr", "bg", "in", "sl", "vi"};
        for (String localeStr : locales) {
            String[] localeParts = StringUtils.split(localeStr, "-");
            String country = "";
            if (localeParts.length > 1)
                country = localeParts[1];
            Locale locale = new Locale(localeParts[0], country);
            ResourceBundle bundle = ResourceBundle.getBundle(
                    "pl.linfo.test.itext.multilang", locale);
            app.bundles.add(bundle);
        }
        app.init();
        app.pdfLocation = "out/itext/multilang.pdf";
        app.doGenerate1();
        app.pdfLocation = "out/itext/multilang2.pdf";
        app.doGenerate2();
    }

}


Here is the multilang.varia.properties file used to generate the second test PDF. Some characters may be invisible depending on your browser settings and installed fonts; however, you should still be able to copy and paste them.


lang_learn_sentence_multilang = 外国語の学習と教授

Language Learning and Teaching

Изучение и обучение иностранных языков

Tere Daaheng Aneng Karimah

語文教學・语文教学

Enseñanza y estudio de idiomas

Изучаване и Преподаване на Чужди Езици

ქართული ენის შესწავლა და სწავლება

‚læŋɡwidʒ ‚lɘr:niŋ ænd ‚ti:tʃiŋ

Lus kawm thaib qhia

Ngôn Ngữ, Sự học,

‭‫ללמוד וללמד את השֵפה

L’enseignement et l’étude des langues

말배우기와 가르치기

Nauka języków obcych

Γλωσσική Εκμὰθηση και Διδασκαλία

‭‫ﺗﺪﺭﯾﺲ ﻭ ﯾﺎﺩﮔﯿﺮﯼ ﺯﺑﺎﻥ

Sprachlernen und -lehren

‭‫ﺗﻌﻠﻢ ﻭﺗﺪﺭﻳﺲ ﺍﻟﻌﺮﺑﻴﺔ

เรียนและสอนภาษา

glagolic = ⰔⰅ ⰅⰔⰕⰠ ⰏⰟⰐⰑⰃⰑⰅⰤⰈⰟⰋⰝⰠⰐⰀ ⰅⰤⰍⰫⰍⰎⰑⰒⰡⰄⰊⰡ ⰐⰀⰓⰋⰜⰀⰅⰏⰀ ⰂⰋⰍⰋⰒⰡⰄⰊⰡ ⰦⰤⰆⰅ ⰍⰟⰆⰠⰄⰑ ⰏⰑⰆⰅⰕⰟ ⰋⰈⰏⰡⰐⰡⰕⰋ . ⰂⰋⰍⰋⰒⰡⰄⰊⰡ ⰒⰠⰔⰀⰐⰀ ⰔⰎⰑⰂⰡⰐⰠⰔⰍⰟⰋⰋⰏⰠ ⰅⰤⰈⰟⰋⰍⰑⰏⰠ ⰐⰀⰝⰅⰤⰕⰀ ⰅⰔⰕⰟ ⰣⰐⰡ 2006 ⰎⰡⰕⰀ .
ⰄⰠⰐⰠⰔⰠ ⰂⰋⰍⰋⰒⰡⰄⰊⰋ 410 ⰝⰎⰡⰐⰟ ⰔⰑⰤⰕⰟ

symbols = ∫ ∬ ∭ ∮ ∯ ∰ ∱ ∲ ∳ ∴ ∵ ∶ ∷ ∸ ∹ ∺
⌠ ⌡ ⌢ ⌣ ⌤ ⌥ ⌦ ⌧ ⌨ ⟨ ⟩ ⌫
♔ ♕ ♖ ♗ ♘ ♙ ♚ ♛ ♜ ♝ ♞ ♟ ♠ ♡ ♢ ♣ ♤ ♥ ♦ ♧ ♨ ♩ ♪ ♫ ♬ ♭ ♮ ♯ ♰
✐ ✑ ✒ ✓ ✔ ✕ ✖ ✗ ✘ ✙ ✚ ✛ ✜ ✝ ✞ ✟ ✠
⟸ ⟹
⥠ ⥡
⨀ ⨁ ⨂
𝕟 𝕠 𝕡 𝕢
𝐃 𝐄 𝐅 𝐆 𝐇 𝐈 𝐉

chinese_ulysses = 俺正和首都警署的老特洛伊在凉亭山街角那儿寒喧呢,该死的,冷不丁儿的来了一名扫烟囱的背时家伙,他那长玩意儿差点儿戳进了俺那眼睛里头去。俺转回脑袋,正打算狠々地教训他一顿,没曾想一眼看见石头斜墻街那儿来了个人,道是谁呢,原来是约・哈因斯。
  ___囉,约,俺说。你怎么样?那个扫烟囱的背时家伙,用他的长把儿刷子差点儿把我的眼睛捅掉。你看见了吗?
  ___煤烟到,运气好,约说。你刚才说话的那个老小子是谁?
  ___老特洛伊呗,俺说,原来是部队的。那家伙又是扫帚又是梯子,把交通都堵塞起来了,俺恨不得把他逮起来。
  ___你到这片儿来干吗?约问。
  ___没有什么屁事,俺说。兵营教堂那边,小鸡胡同口上有一个背时的大个子,不要脸的恶棍__老特洛伊就是给我透了那家伙的一点儿底__要了天主知道多少茶叶和糖,他答应每星期付三先令,说是在唐郡还有个农庄。货主是那边海梯斯堡街附近的一个小矮子,名叫摩西・赫佐格的。
  ___割包皮的吗?#1 约说。
==注1:尤太教男人自幼即割去包皮。==
  ___可不吗,俺说。头上去了一点儿。一个姓吉拉蒂的老管子工。我已经钉了他两个星期,可是一个便士也挤不出来。
  ___你现在就干这勾当?约说。

mahjong = 🀐 126992 🀐 1F010 MAHJONG TILE ONE OF BAMBOOS
🀑 126993 🀑 1F011 MAHJONG TILE TWO OF BAMBOOS
🀒 126994 🀒 1F012 MAHJONG TILE THREE OF BAMBOOS
🀓 126995 🀓 1F013 MAHJONG TILE FOUR OF BAMBOOS
🀔 126996 🀔 1F014 MAHJONG TILE FIVE OF BAMBOOS
🀕 126997 🀕 1F015 MAHJONG TILE SIX OF BAMBOOS
🀖 126998 🀖 1F016 MAHJONG TILE SEVEN OF BAMBOOS
🀗 126999 🀗 1F017 MAHJONG TILE EIGHT OF BAMBOOS
🀘 127000 🀘 1F018 MAHJONG TILE NINE OF BAMBOOS

armenian_poem = Աեցեհի իմ լավ ?ւղիե լավարար,
Կյաեբս չտայի կասկածի մհգիե…
Այեպհս կ?ւզհի մհկե իեծ ?ավատր,
Այեպհս կ?ւզհի ?ավատալ մհկիե։

hieroglyph = 𓀀 77824 𓀀 13000 EGYPTIAN HIEROGLYPH A001
𓀁 77825 𓀁 13001 EGYPTIAN HIEROGLYPH A002
𓀂 77826 𓀂 13002 EGYPTIAN HIEROGLYPH A003
𓀃 77827 𓀃 13003 EGYPTIAN HIEROGLYPH A004
𓀄 77828 𓀄 13004 EGYPTIAN HIEROGLYPH A005
𓀅 77829 𓀅 13005 EGYPTIAN HIEROGLYPH A005A

hangul_sample = 극지탐험 협회결성 체계적 연구

 지구상의 3대 극지라 불리는 남극·북극·에베레스트를 한번이라
도 다녀와야 정회원으로 들어갈 수 있는 한국극지협회가 발족된다
.윤석순(한·러시아극동협회 상임고문)씨과 홍석하(사람과 산 발
행인)씨가 극지탐험과 이곳에서의 학술연구를 체계적 으로 해보자
는데 뜻을 같이하고 협회결성에 나섰다.
 이 협회는 지난 16일 호텔신라에서 20여명의 준비위원이 참
석한 가운데 준비위원회를 가졌으며 내년 3월 정식 출범한다.
 이 협회에는 3극오지를 모두 밟은 세계적인 산악인 허영호씨를
비롯해 에베레스트를 올랐던 엄홍길·박영석·정승권씨 등 국내의 
저명한 산악인들이 회원으로 참여할 것으로 보인다.
 극지협회는 극지탐험가는 물론 학계·경제계인사들도 참가시킬 계
획이다. 단순한 탐험차원을 넘어 지구상에 마지막 남은 자원의 
보고인 극지에서의 연구활동도 하겠다는 의미다.
 한국극지협회가 발족하면 우리 극지탐험이 체계화될 것으로 기대
되고있다.협회는 극지관련자료를 데이터베이스로 축적해 오지탐험가
들에게 제공할 계획이다.
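A caveat worth checking when reproducing this: `Properties.load(InputStream)`, as used in `doGenerate2`, always decodes the stream as ISO-8859-1. A file like the one above therefore only loads correctly if its non-Latin-1 characters are stored as `\uXXXX` escapes (for example, produced with the JDK's native2ascii tool). If the file is saved as plain UTF-8 instead, a minimal sketch of loading it through a Reader (`Properties.load(Reader)`, available since Java 6) could look like this; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public final class Utf8PropertiesSketch {

    private Utf8PropertiesSketch() {
    }

    /** Loads a .properties stream saved as UTF-8 (not as escaped Latin-1). */
    public static Properties loadUtf8(InputStream input) throws IOException {
        Properties props = new Properties();
        // closing the reader also closes the underlying stream
        try (Reader reader = new InputStreamReader(input, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }
}
```

Note also that Properties is a Hashtable, so iterating over `keySet()` in `doGenerate2` visits the entries in no particular order; sort the keys first if the order in the PDF matters.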

The source code of the MultilangPdf class and the properties files can be downloaded from here. The jars containing the fonts are available in my box.net folder (asian-1.0.jar is split into 3 zip parts because it was too big for the free box.net account limits). Special thanks to Alan Wood for his great Unicode resources page! Without it I probably wouldn’t have been able to find all these great free fonts!

Internationalization of Web DynPro components part 2: resource bundles and PDFs

2009-12-30

In addition to S2X-based i18n, you can use classic Java properties-based i18n. In some cases it is more convenient, e.g. when property bundles are created by tools such as Eclipse’s Externalize Strings wizard.

But remember that the Externalize Strings tool was designed for rich-client desktop applications. In web systems the code it generates is inadequate: ResourceBundle.getBundle and similar methods use the JVM default locale, which on a server is the server’s locale, not the locale the user is using!

The following code will use the user’s current locale:

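import java.util.Locale;
import java.util.MissingResourceException;

import com.sap.tc.webdynpro.services.sal.localization.api.IWDResourceHandler;
import com.sap.tc.webdynpro.services.sal.localization.api.WDResourceHandler;

// BUNDLE_NAME and WnioskiListMessages come from the Eclipse-generated messages class
public static String getString(String key) {
  // Get the locale of the current session
  Locale sessionLocale = WDResourceHandler.getCurrentSessionLocale();
  IWDResourceHandler resourceHandler = WDResourceHandler
    .createResourceHandler(sessionLocale);
  resourceHandler.loadResourceBundle(
    BUNDLE_NAME,
    WnioskiListMessages.class.getClassLoader());
  try {
    return resourceHandler.getString(key);
  } catch (MissingResourceException e) {
    // same convention as the Eclipse-generated code: mark missing keys
    return '!' + key + '!';
  }
}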

You can use this code as a replacement for the Eclipse-generated code.
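Outside Web Dynpro the same principle applies: pass the user’s locale to the lookup explicitly instead of relying on the JVM default. The self-contained sketch below uses two hypothetical in-code bundles (`Messages`, `Messages_pl`) so that it runs without resource files; in a real application they would be .properties files and the locale would come from the user’s session or request. It also disables the fallback to the JVM default locale, which would otherwise pull in the server’s language for locales that have no bundle of their own.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

public final class LocaleBundleSketch {

    /** Base (English) bundle; stands in for messages.properties. */
    public static class Messages extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {{"greeting", "Hello"}};
        }
    }

    /** Polish bundle; stands in for messages_pl.properties. */
    public static class Messages_pl extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {{"greeting", "Dzie\u0144 dobry"}};
        }
    }

    // No fallback to Locale.getDefault(), i.e. to the server's locale
    private static final ResourceBundle.Control NO_FALLBACK =
            ResourceBundle.Control.getNoFallbackControl(ResourceBundle.Control.FORMAT_CLASS);

    private LocaleBundleSketch() {
    }

    /** Looks up a key for the given user locale (e.g. taken from the session). */
    public static String getString(String key, Locale userLocale) {
        try {
            ResourceBundle bundle = ResourceBundle.getBundle(
                    Messages.class.getName(), userLocale, NO_FALLBACK);
            return bundle.getString(key);
        } catch (MissingResourceException e) {
            return '!' + key + '!';
        }
    }
}
```

Here `getString("greeting", Locale.forLanguageTag("pl"))` returns the Polish text, while an English user gets "Hello" regardless of the server’s default locale.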

Another issue to deal with is how the localized strings are used. Displaying them as messages is no problem, because the SAP engine uses Unicode. With PDF, however, it’s not so easy.

I will describe what to do if you use the popular iText library. First, whenever you add text, you provide a font. Fonts can be created via BaseFont.createFont. There are a couple of fonts built into the PDF engine, whose names are defined as constants in the BaseFont class. However, with these fonts you can’t use the IDENTITY-H encoding, which covers the Unicode character range. And when providing a multilanguage version, you can’t expect users to enter characters only from a limited range. Even with the English version that is the case: imagine someone writes about a planned business trip to Russia and wants to give the name of the target city in Cyrillic… The best solution would be to analyze each character and dynamically switch the font used in the PDF to one covering that part of the text. Good enough in most cases is to use a font with good coverage of Unicode characters. For Asian scripts it would be very hard to find such a font, but for my needs I consider European languages only.
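The per-character analysis mentioned above can start with the JDK alone. This sketch (a hypothetical helper, not an iText feature) collects the `Character.UnicodeBlock` of every code point in user input, which tells you which scripts your font or font list must cover. It iterates over code points rather than chars, so characters outside the Basic Multilingual Plane are classified correctly too.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public final class ScriptCoverageSketch {

    private ScriptCoverageSketch() {
    }

    /** Returns the Unicode blocks used by the text, in order of first appearance. */
    public static Set<Character.UnicodeBlock> blocksUsed(String text) {
        Set<Character.UnicodeBlock> blocks = new LinkedHashSet<>();
        text.codePoints().forEach(cp -> {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            if (block != null) { // null for code points outside any defined block
                blocks.add(block);
            }
        });
        return blocks;
    }
}
```

For the business-trip example, `blocksUsed("Trip to Москва")` returns `BASIC_LATIN` and `CYRILLIC`, a sign that a Latin-only built-in font such as Helvetica with CP1252 will not be enough.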

When you dig into iText tutorials, you’ll find one that passes Windows’ standard Arial Unicode font to BaseFont.createFont. This font covers a very large number of national character sets. The problem is that the machine running your application need not be Windows… and you don’t have the rights to embed that font in a PDF or in your application.

But there are plenty of free fonts available. The font I found good enough is FreeSerif from the freefont package, which can be downloaded from http://savannah.gnu.org/projects/freefont/.

After downloading, I made a jar file from it, placing all the fonts in a package named fonts. Then I added it to a library component, wrapped it in a J2EE Server Library DC and deployed it to the server. After that I was able to embed and use my fonts with the following code:

String fontPath = getClass().getClassLoader()
  .getResource("fonts/FreeSans.ttf")
  .toExternalForm();
BaseFont baseFont = BaseFont.createFont(
  fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);