How to set RTL direction for Hebrew when converting HTML to PDF?

Is it possible to convert Hebrew HTML to PDF?

5th November 2015
admin-marketing

I'm trying to convert an HTML file with Hebrew characters (UTF-8) to PDF using iText, but all the letters come out in reverse order. As far as I understand, I can set RTL only for ColumnText and PdfPCell objects. So here's my question: is it possible to convert Hebrew HTML to PDF? This is my HTML:

 

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Title of document</title>
</head>
<body style="font-size:12.0pt; font-family:Arial">
  שלום עולם
</body>
</html>

When I convert this HTML to PDF using XML Worker, I get this result:

Wrong order

This is "Hello World" in Hebrew written from left to right. It should be written from right to left.

Posted on StackOverflow on Jun 15, 2015 by Anatoly

Please take a look at the ParseHtml10 example. In this example, we take the file hebrew.html:

<p>Hebrew text</p>
<div dir="rtl">שלום עולם</div>

And we convert it to PDF using this code:

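import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.CssAppliers;
import com.itextpdf.tool.xml.html.CssAppliersImpl;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

public void createPdf(String file) throws IOException, DocumentException {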
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer =
        PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    // Styles
    CSSResolver cssResolver = new StyleAttrCSSResolver();
    XMLWorkerFontProvider fontProvider =
        new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontProvider.register("resources/fonts/NotoSansHebrew-Regular.ttf");
    CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
    HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
    htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

    // Pipelines
    PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
    HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
    CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

    // XML Worker
    XMLWorker worker = new XMLWorker(css, true);
    XMLParser p = new XMLParser(worker);
    // HTML is the path to the input file (hebrew.html), defined elsewhere in the example class
    p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
    // step 5
    document.close();
}
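
Note that the parse call passes the charset explicitly. The following is only a sketch, using nothing but the JDK, of why that matters: the same UTF-8 bytes decoded with a different charset no longer give you the Hebrew text. The class name CharsetDemo and the decode helper are made up for this illustration and aren't part of iText or of the ParseHtml10 example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {
    // "Hello World" in Hebrew, written with Unicode escapes so that the
    // source file compiles regardless of its own encoding.
    static final String HEBREW = "\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD";

    // Decodes raw bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Bytes as they would be stored in a UTF-8 encoded HTML file.
        byte[] utf8 = HEBREW.getBytes(StandardCharsets.UTF_8);

        // Decoding with the right charset restores the original text.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(HEBREW));

        // Decoding with the wrong charset yields different characters.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(HEBREW));
    }
}
```

If your HTML file is stored in another encoding, you would pass that charset to the parser instead.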

The result looks like hebrew.pdf:

Text from right to left

What are the hurdles you need to clear?

  • You need to wrap your text in an element such as a <div>.

  • You need to add the attribute dir="rtl" to define the direction.

  • You need to make sure that you're using a font that knows how to display Hebrew. I used a NOTO font for Hebrew. This is one of the fonts distributed by Google in their program to provide fonts for every possible language.
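
If you generate the HTML yourself, the first two points can be sketched as a small helper that wraps a piece of text in a div and adds dir="rtl" when the text contains right-to-left characters. This is only an illustration under my own assumptions: the class RtlWrapper and its methods are hypothetical, and using java.text.Bidi from the JDK to decide the direction is a choice made for this sketch, not something XML Worker does for you.

```java
import java.text.Bidi;

final class RtlWrapper {
    // True if the text contains characters that need right-to-left handling.
    static boolean needsRtl(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    // Escapes the characters that are special in XHTML text content.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Wraps the text in a div, adding dir="rtl" only when the text needs it.
    static String wrap(String text) {
        String open = needsRtl(text) ? "<div dir=\"rtl\">" : "<div>";
        return open + escape(text) + "</div>";
    }

    public static void main(String[] args) {
        // "Hello World" in Hebrew, as Unicode escapes.
        System.out.println(wrap("\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD"));
        System.out.println(wrap("Hello World"));
    }
}
```

The resulting snippet still has to be rendered with a font that contains Hebrew glyphs, as described in the third point.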

Important: this solution requires at least iText and XML Worker 5.5.5, because support for the dir attribute was introduced in 5.5.4 and improved in 5.5.5.
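
If you want to guard against older versions in your own code, a check along these lines could work. It's a rough sketch that compares plain dotted version strings; the class VersionCheck and its methods are hypothetical, and how you obtain the version string of the iText build you're using is left out here.

```java
final class VersionCheck {
    // Compares dotted numeric versions such as "5.5.4" and "5.5.5".
    // Missing components count as 0, so "5.5" is treated as "5.5.0".
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // True if the version meets the 5.5.5 minimum mentioned above.
    static boolean meetsMinimum(String version) {
        return compare(version, "5.5.5") >= 0;
    }

    public static void main(String[] args) {
        System.out.println(meetsMinimum("5.5.4"));
        System.out.println(meetsMinimum("5.5.5"));
    }
}
```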


