Displaying text in different languages in a single PDF document

How can I display text in different languages in a single PDF?

Earlier versions of the iText PDF library were already able to render Chinese, Japanese and Korean glyphs in PDF documents, but to correctly display right-to-left scripts like Hebrew and Arabic, we needed the information provided by OpenType fonts to help with handling the complexities of all the world's writing systems. So, for iText 7 we went back to the drawing board to provide OpenType support for advanced font features in PDF documents.

However, we decided to go a step further and created pdfCalligraph, a commercially licensed add-on module for the iText 7 library, designed specifically to support many more languages and writing systems. Like iText 7, it's available for both the Java and .NET (C#) platforms. For detailed information about the inherent difficulties of supporting multiple languages and writing systems in the PDF standard, and the solutions pdfCalligraph provides, we recommend reading the pdfCalligraph white paper.

In this article, we’ll demonstrate how you can use pdfCalligraph with iText to create a PDF containing text using different languages. But first, here’s a short explanation of how pdfCalligraph works.

How does pdfCalligraph integrate with iText 7?

The iText layout module automatically looks for pdfCalligraph among its dependencies when the Renderer Framework encounters text in a language or writing system that requires it. For example, when iText encounters text in an Indic script, or in a script that's written from right to left, it checks whether pdfCalligraph is available and then uses its functionality to provide the correct glyph shapes to write to the PDF file. However, because the typography logic is complex and can be resource-heavy even for documents that don’t require this functionality, iText won't attempt any advanced shaping operations if the pdfCalligraph module has not been loaded as a binary dependency.
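To make the idea of "text that requires advanced shaping" concrete, here is a minimal sketch using only the Java standard library. It is not iText's internal detection logic; the class name, method name and the list of scripts are assumptions chosen for illustration.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.Set;

class ScriptCheck {
    // Illustrative subset of scripts that need shaping support; not iText's own list.
    private static final Set<UnicodeScript> COMPLEX_SCRIPTS = EnumSet.of(
            UnicodeScript.ARABIC, UnicodeScript.HEBREW, UnicodeScript.DEVANAGARI,
            UnicodeScript.TAMIL, UnicodeScript.BENGALI, UnicodeScript.THAI);

    /** Returns true if any code point in the text belongs to one of the listed scripts. */
    static boolean needsAdvancedShaping(String text) {
        return text.codePoints()
                .mapToObj(UnicodeScript::of)
                .anyMatch(COMPLEX_SCRIPTS::contains);
    }

    public static void main(String[] args) {
        // Plain Latin text: no listed script is present.
        System.out.println(needsAdvancedShaping("This is an example sentence."));
        // First word of the Hindi sample sentence, written with Unicode escapes.
        System.out.println(needsAdvancedShaping("\u092F\u0939"));
    }
}
```

A check like this only tells you which documents are likely to depend on the add-on; the shaping itself still has to come from pdfCalligraph.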

pdfCalligraph features

Automatic detection of writing systems
Wider language support
Right-to-left support
Ligatures
Kerning
Glyph substitution
Available for Java and .NET (C#)

Using pdfCalligraph

To use pdfCalligraph you simply load the correct binaries into your project, make sure your valid license file is loaded, and iText 7 will automatically use the pdfCalligraph code when it is required by a document. If you don't have a commercial license for pdfCalligraph, you can get a free trial of the iText 7 Suite which includes the iText 7 Core library, plus all the add-ons.
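Because iText silently skips advanced shaping when the add-on is missing, it can help to confirm at startup that the binaries really are on the classpath. The sketch below is one generic way to do that in plain Java. The class name passed in is a hypothetical placeholder, not an actual pdfCalligraph class; substitute a class you know ships in the jar you added.

```java
class DependencyCheck {
    /** Returns true if the named class can be located on the classpath, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical placeholder name: replace it with a class from the add-on jar.
        String addOnClass = args.length > 0 ? args[0] : "com.example.addon.SomeClass";
        System.out.println(addOnClass + " available: " + isOnClasspath(addOnClass));
    }
}
```

This only verifies that the jar is present. It says nothing about whether a valid license file has been loaded.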

Usage example

For this example, we'll demonstrate using pdfCalligraph to correctly render text in different languages. First, let’s start with a simple English sentence, which we've translated into three different languages using Google Translate.

This is an example sentence.

Let's see how that looks in Arabic:

.هذه هي الجملة المستخدمة في المثال

Now let’s see how it looks in Hindi:

यह एक उदाहरण वाक्य है।

And finally in Tamil:

இது ஒரு எடுத்துக்காட்டு வாக்கியம்.

Those of you familiar with any of the languages in question may notice the translations are not perfect, but they will be sufficient for our purposes here.

Now we’ll save each piece of text in a separate XML file: english.xml, arabic.xml, hindi.xml and tamil.xml. Note that because Arabic is written right-to-left, you’ll need to specify this in the XML in order for pdfCalligraph to display the Arabic text starting from the right side of the PDF. The languages themselves, however, will be detected and handled by pdfCalligraph automatically.
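The exact layout of the XML files is not reproduced in this article. Judging from the code further down, which reads the first text element and an optional direction attribute, a file could look like the strings in the following sketch. Treat the element layout as an assumption; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class DirectionAttribute {
    /** Parses an XML string and reports whether its first text element has direction="rtl". */
    static boolean isRightToLeft(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Node text = doc.getElementsByTagName("text").item(0);
            if (text == null) {
                return false;
            }
            Node direction = text.getAttributes().getNamedItem("direction");
            return direction != null && "rtl".equalsIgnoreCase(direction.getNodeValue());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed file contents; the Arabic word is written with Unicode escapes.
        String arabic = "<text direction=\"rtl\">\u0647\u0630\u0647</text>";
        String english = "<text>This is an example sentence.</text>";
        System.out.println(isRightToLeft(arabic));   // true
        System.out.println(isRightToLeft(english));  // false
    }
}
```

Keeping the direction in the data rather than in the code means a new right-to-left sample can be added without touching the rendering loop.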

To render the text correctly in our PDF, we’ll be making use of the Google Noto fonts, which are available to download from Google. You may have noticed that sometimes when text is rendered by a computer, certain characters are displayed as little boxes. This indicates your device doesn’t have a font that's able to display the text, so unrecognized characters are rendered as these boxes (or “tofu”).

The Noto fonts are Google’s answer to tofu and the name “noto” was chosen to convey the idea that Google’s goal is to see “no more tofu”. The Noto fonts are free to use and have multiple styles and weights available. The Noto font family is comprised of over 100 individual fonts which have been designed to cover all the scripts encoded in the Unicode standard with a harmonious look and feel.

We’ll be using the NotoNaskhArabic-Regular and NotoSansTamil-Regular fonts, together with FreeSans, which the code below also registers, to render our Arabic, Hindi and Tamil texts as they are intended to appear.

In the following code, we take the text from all four source files and display each one as a separate paragraph in a single PDF document. The Java version is shown first, followed by the .NET (C#) version:

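import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.font.FontProvider;
import com.itextpdf.layout.font.FontSet;
// In iText 7.2 and later these two classes live in com.itextpdf.layout.properties.
import com.itextpdf.layout.property.Property;
import com.itextpdf.layout.property.TextAlignment;

public class MultiLanguagePdf {
    // Output path: DEST is not defined in the original snippet, so this value is an assumption.
    private static final String DEST = "multilanguage.pdf";

    public static void main(String[] args) throws Exception {
        final String[] sources = {"english.xml", "arabic.xml", "hindi.xml", "tamil.xml"};
        final PdfWriter writer = new PdfWriter(DEST);
        final PdfDocument pdfDocument = new PdfDocument(writer);
        final Document document = new Document(pdfDocument);

        // Register the fonts iText may choose from when selecting glyphs for each script.
        final FontSet set = new FontSet();
        set.addFont("fonts/NotoNaskhArabic-Regular.ttf");
        set.addFont("fonts/NotoSansTamil-Regular.ttf");
        set.addFont("fonts/FreeSans.ttf");
        document.setFontProvider(new FontProvider(set));
        // Setting a font family name makes the layout engine resolve fonts through the provider.
        document.setProperty(Property.FONT, new String[]{"MyFontFamilyName"});

        final DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        for (final String source : sources) {
            // Read the first <text> element and its optional direction attribute.
            final org.w3c.dom.Document doc = builder.parse(new File(source));
            final Node element = doc.getElementsByTagName("text").item(0);
            final Node textDirectionElement = element.getAttributes().getNamedItem("direction");
            final boolean rtl = textDirectionElement != null
                    && textDirectionElement.getTextContent().equalsIgnoreCase("rtl");

            final Paragraph paragraph = new Paragraph();
            if (rtl) {
                // Right-to-left text starts at the right edge of the page.
                paragraph.setTextAlignment(TextAlignment.RIGHT);
            }
            paragraph.add(element.getTextContent());
            document.add(paragraph);
        }
        // Closing the Document also closes the underlying PdfDocument and PdfWriter.
        document.close();
    }
}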

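using System;
using System.Xml;
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;
using iText.Layout.Font;
using iText.Layout.Properties;

public class MultiLanguagePdf
{
    // Output path: DEST is not defined in the original snippet, so this value is an assumption.
    private const string DEST = "multilanguage.pdf";

    public static void Main()
    {
        string[] sources = { "english.xml", "arabic.xml", "hindi.xml", "tamil.xml" };
        PdfWriter writer = new PdfWriter(DEST);
        PdfDocument pdfDocument = new PdfDocument(writer);
        Document document = new Document(pdfDocument);

        // Register the fonts iText may choose from when selecting glyphs for each script.
        FontSet set = new FontSet();
        set.AddFont("NotoNaskhArabic-Regular.ttf");
        set.AddFont("NotoSansTamil-Regular.ttf");
        set.AddFont("FreeSans.ttf");
        document.SetFontProvider(new FontProvider(set));
        document.SetProperty(Property.FONT, new[] { "MyFontFamilyName" });

        foreach (string source in sources)
        {
            // Load by file name so that no FileStream is left open.
            XmlDocument doc = new XmlDocument();
            doc.Load(source);
            XmlNode element = doc.GetElementsByTagName("text").Item(0);
            XmlNode textDirectionElement = element.Attributes.GetNamedItem("direction");
            // Compare case-insensitively, as the Java version does.
            bool rtl = textDirectionElement != null
                && textDirectionElement.InnerText.Equals("rtl", StringComparison.OrdinalIgnoreCase);

            Paragraph paragraph = new Paragraph();
            if (rtl)
            {
                // Right-to-left text starts at the right edge of the page.
                paragraph.SetTextAlignment(TextAlignment.RIGHT);
            }
            paragraph.Add(element.InnerText);
            document.Add(paragraph);
        }
        // Closing the Document also closes the underlying PdfDocument and PdfWriter.
        document.Close();
    }
}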

This will create a PDF containing the following text in the four specified languages, as illustrated below:

Figure: Our example PDF showing the four different languages.

Results

You can see our example PDF here.

Resources

Example language files

Supported languages and scripts

This table shows the additional language and script support enabled by pdfCalligraph, as well as the languages and scripts natively supported in iText 7:

Language	Script	Module
Arabic, Persian, Kurdish, Azerbaijani, Sindhi, Pashto, Lurish, Urdu, Mandinka, Punjabi and others	ARABIC	pdfCalligraph
Hebrew, Yiddish, Judaeo-Spanish, and Judeo-Arabic	HEBREW	pdfCalligraph
Bengali	BENGALI	pdfCalligraph
Hindi, Sanskrit, Pali, Awadhi, Bhojpuri, Braj Bhasha, Chhattisgarhi, Haryanvi, Magahi, Nagpuri, Rajasthani, Bhili, Dogri, Marathi, Nepali, Maithili, Kashmiri, Konkani, Sindhi, Bodo, Nepalbhasa, Mundari and Santali	DEVANAGARI, NAGARI	pdfCalligraph
Gujarati and Kutchi	GUJARATI	pdfCalligraph
Punjabi	GURMUKHI	pdfCalligraph
Kannada, Konkani and others	KANNADA	pdfCalligraph
Khmer (Cambodia)	KHMER	pdfCalligraph
Malayalam	MALAYALAM	pdfCalligraph
Odia	ORIYA	pdfCalligraph
Tamil	TAMIL	pdfCalligraph
Telugu (Dravidian language)	TELUGU	pdfCalligraph
Thai	THAI	pdfCalligraph
Chinese		Core
Japanese		Core
Korean		Core
	WESTERN	Core
Russian, Ukrainian, Belarusian, Bulgarian and others	CYRILLIC	Core
Greek	GREEK	Core
Armenian	ARMENIAN	Core
Georgian	GEORGIAN	Core
