Displaying text in different languages in a single PDF document

Tue - 10/08/2019

We show how you can use iText 7 with pdfCalligraph to correctly display different languages in a single PDF, including right-to-left scripts like Hebrew and Arabic.

How can I display text in different languages in a single PDF?

Earlier versions of the iText PDF library were already able to render Chinese, Japanese and Korean glyphs in PDF documents, but to correctly display right-to-left scripts like Hebrew and Arabic, we needed the information provided by OpenType fonts to help with handling the complexities of all the world's writing systems. So, for iText 7 we went back to the drawing board to provide OpenType support for advanced font features in PDF documents.

However, we decided to go a step further and created pdfCalligraph, a commercially licensed add-on module for the iText 7 library, designed specifically to support many more languages and writing systems. Like iText 7, it is available for both the Java and .NET (C#) platforms. For detailed information about the inherent difficulties of supporting multiple languages and writing systems in the PDF standard, and the powerful and unique solutions pdfCalligraph provides, we recommend reading the pdfCalligraph white paper.

In this article, we’ll demonstrate how you can use pdfCalligraph with iText to create a PDF containing text in several different languages. But first, here’s a short explanation of how pdfCalligraph works.

How does pdfCalligraph integrate with iText 7?

The iText layout module automatically looks for pdfCalligraph among its dependencies when the Renderer Framework encounters text in a language or writing system that requires it. For example, when iText encounters Indic text, or a script that's written from right to left, it checks whether pdfCalligraph is available and, if so, uses its functionality to write the correct glyph shapes to the PDF file. However, because the typography logic is complex and can be resource-heavy even for documents that don’t require it, iText won't attempt any advanced shaping operations unless the pdfCalligraph module has been loaded as a binary dependency.
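
To give a feel for the kind of check involved in spotting right-to-left text, here is a minimal sketch that uses the JDK's own java.text.Bidi class. It is not how iText or pdfCalligraph work internally; the names DirectionProbe, needsBidi and isRightToLeft are made up for this illustration.

```java
import java.text.Bidi;

/** Illustrative only: relies on the JDK's bidi support, not on iText or pdfCalligraph. */
final class DirectionProbe {

    /** Returns true if the text contains characters that may need bidirectional reordering. */
    static boolean needsBidi(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    /** Returns true if the base direction, inferred from the first strong character, is right-to-left. */
    static boolean isRightToLeft(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    public static void main(String[] args) {
        String english = "This is an example sentence.";
        // The Arabic word for "this", written with Unicode escapes to keep the source ASCII-only.
        String arabic = "\u0647\u0630\u0647";

        System.out.println("English needs bidi: " + needsBidi(english));
        System.out.println("Arabic needs bidi:  " + needsBidi(arabic));
        System.out.println("Arabic is RTL:      " + isRightToLeft(arabic));
    }
}
```

A check like this only says that text runs right to left. Choosing the correct contextual glyph shapes is the part that needs OpenType font data and a shaping engine such as pdfCalligraph.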

pdfCalligraph features

  • Automatic detection of writing systems
  • Wider language support
  • Right-to-left support
  • Ligatures
  • Kerning
  • Glyph substitution
  • Available for Java and .NET (C#)

Using pdfCalligraph

To use pdfCalligraph you simply load the correct binaries into your project, make sure your valid license file is loaded, and iText 7 will automatically use the pdfCalligraph code when it is required by a document. If you don't have a commercial license for pdfCalligraph, you can get a free trial of the iText 7 Suite which includes the iText 7 Core library, plus all the add-ons.

Usage example

For this example, we'll demonstrate using pdfCalligraph to correctly render text in different languages. First, let’s start with a simple English sentence, which we've translated into three different languages using Google Translate.

This is an example sentence.

Let's see how that looks in Arabic:

.هذه هي الجملة المستخدمة في المثال

Now let’s see how it looks in Hindi:

यह एक उदाहरण वाक्य है।

And finally in Tamil:

இது ஒரு எடுத்துக்காட்டு வாக்கியம்.

Those of you familiar with any of the languages in question may notice the translations are not perfect, but they will be sufficient for our purposes here.

Now we’ll save each piece of text in a separate XML file: english.xml, arabic.xml, hindi.xml and tamil.xml. Because Arabic is written right-to-left, you’ll need to specify the text direction in the XML so that the Arabic text is displayed starting from the right side of the page. The languages themselves, however, are detected and handled by pdfCalligraph automatically.
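
The actual example files are linked under Resources. As a rough sketch of the structure the code further down expects (a single text element with an optional direction attribute), the following self-contained Java snippet parses two such documents from strings using only the JDK. The XML content and the names TextFileSketch, parseTextElement and isRightToLeft are assumptions made for illustration, not the published files or any iText API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

/** Illustrative only: shows one plausible layout for the per-language XML files. */
final class TextFileSketch {

    // Hypothetical content of arabic.xml; the Arabic word is written with Unicode escapes.
    static final String ARABIC_XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<text direction=\"rtl\">\u0647\u0630\u0647</text>";

    // Hypothetical content of english.xml; no direction attribute means left-to-right.
    static final String ENGLISH_XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<text>This is an example sentence.</text>";

    /** Parses the XML and returns its first text element. */
    static Element parseTextElement(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName("text").item(0);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a readable XML document", e);
        }
    }

    /** Returns true if the element declares direction="rtl" (case-insensitive). */
    static boolean isRightToLeft(Element text) {
        return "rtl".equalsIgnoreCase(text.getAttribute("direction"));
    }

    public static void main(String[] args) {
        for (String xml : new String[] {ENGLISH_XML, ARABIC_XML}) {
            Element text = parseTextElement(xml);
            System.out.println(text.getTextContent().length() + " chars, rtl=" + isRightToLeft(text));
        }
    }
}
```

Keeping the direction in an attribute means the rendering code never has to guess: files without the attribute are simply treated as left-to-right.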

To render the text correctly in our PDF, we’ll be making use of the Google Noto fonts, which you can download from Google. You may have noticed that sometimes when text is rendered by a computer, certain characters are displayed as little boxes. This indicates that your device doesn’t have a font that's able to display the text, so the unrecognized characters are rendered as these boxes (or “tofu”).

The Noto fonts are Google’s answer to tofu, and the name “noto” was chosen to convey the idea that Google’s goal is to see “no more tofu”. The Noto fonts are free to use and have multiple styles and weights available. The Noto font family comprises over 100 individual fonts, which have been designed to cover all the scripts encoded in the Unicode standard with a harmonious look and feel.

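We’ll be using the NotoNaskhArabic-Regular and NotoSansTamil-Regular fonts to render our Arabic and Tamil texts as they are intended to appear. The code also registers FreeSans, which the font provider can fall back on for the English and Hindi text.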

In the following code, we take the text from all four source files and display each one as a separate paragraph in a single PDF document. The Java version is shown first, followed by the .NET (C#) version:

Java:

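import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.font.FontProvider;
import com.itextpdf.layout.font.FontSet;
// The two imports below use the iText 7.1.x package name; from 7.2 it is com.itextpdf.layout.properties.
import com.itextpdf.layout.property.Property;
import com.itextpdf.layout.property.TextAlignment;
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

public class MultiLanguageExample {
    // Placeholder output path; the original snippet does not define DEST.
    private static final String DEST = "multiple-languages.pdf";

    public static void main(String[] args) throws Exception {
        final String[] sources = {"english.xml", "arabic.xml", "hindi.xml", "tamil.xml"};
        final PdfWriter writer = new PdfWriter(DEST);
        final PdfDocument pdfDocument = new PdfDocument(writer);
        final Document document = new Document(pdfDocument);

        // Register the fonts and let the font provider choose among them.
        final FontSet set = new FontSet();
        set.addFont("fonts/NotoNaskhArabic-Regular.ttf");
        set.addFont("fonts/NotoSansTamil-Regular.ttf");
        set.addFont("fonts/FreeSans.ttf");
        document.setFontProvider(new FontProvider(set));
        document.setProperty(Property.FONT, new String[]{"MyFontFamilyName"});

        final DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        for (final String source : sources) {
            // Each file holds one <text> element with an optional direction attribute.
            final org.w3c.dom.Document doc = builder.parse(new File(source));
            final Node element = doc.getElementsByTagName("text").item(0);
            final Node textDirectionElement = element.getAttributes().getNamedItem("direction");
            final boolean rtl = textDirectionElement != null
                    && textDirectionElement.getTextContent().equalsIgnoreCase("rtl");

            final Paragraph paragraph = new Paragraph();
            if (rtl) {
                // Right-to-left text starts at the right edge of the page.
                paragraph.setTextAlignment(TextAlignment.RIGHT);
            }
            paragraph.add(element.getTextContent());
            document.add(paragraph);
        }

        // Closing the Document also closes the underlying PdfDocument and PdfWriter.
        document.close();
    }
}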

.NET (C#):

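using System;
using System.IO;
using System.Xml;
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;
using iText.Layout.Font;
using iText.Layout.Properties;

public class MultiLanguageExample
{
    // Placeholder output path; the original snippet does not define DEST.
    private const string DEST = "multiple-languages.pdf";

    public static void Main(string[] args)
    {
        string[] sources = { "english.xml", "arabic.xml", "hindi.xml", "tamil.xml" };
        PdfWriter writer = new PdfWriter(DEST);
        PdfDocument pdfDocument = new PdfDocument(writer);
        Document document = new Document(pdfDocument);

        // Register the fonts and let the font provider choose among them.
        FontSet set = new FontSet();
        set.AddFont("NotoNaskhArabic-Regular.ttf");
        set.AddFont("NotoSansTamil-Regular.ttf");
        set.AddFont("FreeSans.ttf");
        document.SetFontProvider(new FontProvider(set));
        document.SetProperty(Property.FONT, new string[] { "MyFontFamilyName" });

        foreach (string source in sources)
        {
            // Each file holds one <text> element with an optional direction attribute.
            XmlDocument doc = new XmlDocument();
            using (var stream = new FileStream(source, FileMode.Open, FileAccess.Read))
            {
                doc.Load(stream);
            }
            XmlNode element = doc.GetElementsByTagName("text").Item(0);
            XmlNode textDirectionElement = element.Attributes.GetNamedItem("direction");
            // Case-insensitive comparison, matching the Java version.
            bool rtl = textDirectionElement != null
                && textDirectionElement.InnerText.Equals("rtl", StringComparison.OrdinalIgnoreCase);

            Paragraph paragraph = new Paragraph();
            if (rtl)
            {
                paragraph.SetTextAlignment(TextAlignment.RIGHT);
            }
            paragraph.Add(element.InnerText);
            document.Add(paragraph);
        }
        document.Close();
    }
}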

 

This will create a PDF containing the following text in the four specified languages, as illustrated below:

Our example PDF showing the four different languages.

Results

You can see our example PDF here.

Resources

Example language files

Supported languages and scripts

This table shows the additional language and script support enabled by pdfCalligraph, as well as the languages and scripts natively supported in iText 7:

| Language | Script | Module |
|---|---|---|
| Arabic, Persian, Kurdish, Azerbaijani, Sindhi, Pashto, Lurish, Urdu, Mandinka, Punjabi and others | ARABIC | pdfCalligraph |
| Hebrew, Yiddish, Judaeo-Spanish, and Judeo-Arabic | HEBREW | pdfCalligraph |
| Bengali | BENGALI | pdfCalligraph |
| Hindi, Sanskrit, Pali, Awadhi, Bhojpuri, Braj Bhasha, Chhattisgarhi, Haryanvi, Magahi, Nagpuri, Rajasthani, Bhili, Dogri, Marathi, Nepali, Maithili, Kashmiri, Konkani, Sindhi, Bodo, Nepalbhasa, Mundari and Santali | DEVANAGARI, NAGARI | pdfCalligraph |
| Gujarati and Kutchi | GUJARATI | pdfCalligraph |
| Punjabi | GURMUKHI | pdfCalligraph |
| Kannada, Konkani and others | KANNADA | pdfCalligraph |
| Khmer (Cambodia) | KHMER | pdfCalligraph |
| Malayalam | MALAYALAM | pdfCalligraph |
| Odia | ORIYA | pdfCalligraph |
| Tamil | TAMIL | pdfCalligraph |
| Telugu (Dravidian language) | TELUGU | pdfCalligraph |
| Thai | THAI | pdfCalligraph |
| Chinese | | Core |
| Japanese | | Core |
| Korean | | Core |
| | WESTERN | Core |
| Russian, Ukrainian, Belarusian, Bulgarian and others | CYRILLIC | Core |
| Greek | GREEK | Core |
| Armenian | ARMENIAN | Core |
| Georgian | GEORGIAN | Core |
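
If you want a quick pre-flight check of whether a piece of text falls under the pdfCalligraph rows of the table, the JDK's Character.UnicodeScript enum can serve as a rough stand-in. The sketch below is only an approximation built from the table above: it is not iText's own detection logic, and the names ScriptCheck, scriptsIn and needsPdfCalligraph are hypothetical.

```java
import java.lang.Character.UnicodeScript;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

/** Illustrative only: maps the table's pdfCalligraph scripts onto JDK UnicodeScript constants. */
final class ScriptCheck {

    // Scripts that the table above assigns to the pdfCalligraph module.
    static final Set<UnicodeScript> PDFCALLIGRAPH_SCRIPTS = EnumSet.of(
            UnicodeScript.ARABIC, UnicodeScript.HEBREW, UnicodeScript.BENGALI,
            UnicodeScript.DEVANAGARI, UnicodeScript.GUJARATI, UnicodeScript.GURMUKHI,
            UnicodeScript.KANNADA, UnicodeScript.KHMER, UnicodeScript.MALAYALAM,
            UnicodeScript.ORIYA, UnicodeScript.TAMIL, UnicodeScript.TELUGU,
            UnicodeScript.THAI);

    /** Collects the Unicode script of every code point in the text. */
    static Set<UnicodeScript> scriptsIn(String text) {
        Set<UnicodeScript> found = EnumSet.noneOf(UnicodeScript.class);
        text.codePoints().forEach(cp -> found.add(UnicodeScript.of(cp)));
        return found;
    }

    /** Returns true if any character belongs to a script listed under pdfCalligraph. */
    static boolean needsPdfCalligraph(String text) {
        return !Collections.disjoint(scriptsIn(text), PDFCALLIGRAPH_SCRIPTS);
    }

    public static void main(String[] args) {
        // Sample words written with Unicode escapes to keep the source ASCII-only.
        String arabic = "\u0647\u0630\u0647";
        String hindi = "\u092f\u0939";
        String tamil = "\u0b87\u0ba4\u0bc1";

        System.out.println("English: " + needsPdfCalligraph("This is an example sentence."));
        System.out.println("Arabic:  " + needsPdfCalligraph(arabic));
        System.out.println("Hindi:   " + needsPdfCalligraph(hindi));
        System.out.println("Tamil:   " + needsPdfCalligraph(tamil));
    }
}
```

Punctuation and spaces are reported as the COMMON script, so mixed text is flagged as soon as a single character from one of the listed scripts appears.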

 


