pdfHTML: An Introduction

Tue - 03/14/2017

See the recording of our March 22nd webinar for an in-depth introduction by some of iText's developers.

Introduction

pdfHTML is an iText 7 add-on that will replace our previous XML Worker from iText 5. This add-on gives you the ability to transform HTML and CSS content into a PDF document. In the iText 5 XML Worker, the supported tags and CSS declarations were limited to a subset of the most frequently used tags and properties for PDF. There was no support for video, sound, controls, animations, etc. This focus was as much a choice as it was a necessity: the iText 5 rendering framework limited which tags and properties could easily be implemented. The rendering framework in iText 5 grew organically over the years; it was built with the static page-and-drawing model of the PDF specification in mind, not the free-flowing layout that browsers use to paint HTML on your screen.

Now it is time to unveil pdfHTML, the newer, sleeker XML Worker, built on the iText 7 platform. iText 7 is a ground-up rewrite, intended to provide the same functionality as iText 5 on a more robust and easily extendable foundation. The renderer framework in particular received attention: it now works by building a tree of renderers and their child renderers, and traversing it bottom-up to perform the final layout operations. This approach is much better suited to HTML-to-PDF conversion than the old model in iText 5, and makes it easier to add support for more tags and CSS properties in the future.
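
The bottom-up traversal described above can be illustrated with a small, self-contained Java sketch. It is a toy model under stated assumptions: the class ToyRenderer, its layout method and the simple "height = content + padding" rule are hypothetical, and they are not iText's IRenderer API or its real layout algorithm. The only point it makes is that each node's result is computed after its children have reported theirs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a renderer tree. The names are illustrative only
// and do not correspond to iText's IRenderer API or its real layout algorithm.
final class ToyRenderer {
    final String name;
    final float ownHeight;  // height of this node's own content, e.g. a line of text
    final float padding;    // vertical padding added above and below the content
    final List<ToyRenderer> children = new ArrayList<>();
    float height;           // set by layout()

    ToyRenderer(String name, float ownHeight, float padding) {
        this.name = name;
        this.ownHeight = ownHeight;
        this.padding = padding;
    }

    ToyRenderer add(ToyRenderer child) {
        children.add(child);
        return this;
    }

    // Post-order (bottom-up) traversal: every child is laid out before its parent,
    // because the parent's height depends on the heights its children report back.
    float layout(List<String> visitOrder) {
        float content = ownHeight;
        for (ToyRenderer child : children) {
            content += child.layout(visitOrder);
        }
        height = content + 2 * padding;
        visitOrder.add(name);
        return height;
    }

    public static void main(String[] args) {
        ToyRenderer div = new ToyRenderer("div", 0f, 5f)
                .add(new ToyRenderer("p1", 12f, 0f))
                .add(new ToyRenderer("p2", 24f, 2f));
        List<String> order = new ArrayList<>();
        float total = div.layout(order);
        System.out.println(order + " -> " + total);  // children first, then the div
    }
}
```

Running main prints [p1, p2, div] -> 50.0: both paragraphs are measured first (12 and 28), and only then can the div derive its own height from theirs.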

Like XML Worker, pdfHTML is designed to work straight out of the box, with no customization or configuration necessary. Just feed the HTML and any resources into the converter and the PDF rolls out, as seen in figure 1 and the code below:

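// resourceLocation: base URI against which relative resource paths (images, CSS) in the HTML are resolved
ConverterProperties converterProperties = new ConverterProperties().setBaseUri(resourceLocation);
HtmlConverter.convertToPdf(new FileInputStream(htmlSource), new FileOutputStream(pdfDestination), converterProperties);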

Figure 1: pdfHTML high-level flow

To illustrate, this input HTML and CSS will result in this output. As with XML Worker, users can still define their own custom handling of tags and CSS, but that is a more advanced topic for a different blog post.

pdfHTML Workflow

pdfHTML works in two phases. In the first phase, the HTML is parsed into an internal document format. In addition to building a hierarchical tree structure, this step also repairs commonly found malformed HTML, for example by adding missing closing tags and fixing other common mistakes. The CSS style sheets are parsed into an internal style sheet as well.
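
To make the repair step concrete, here is a toy Java sketch of the first phase. It is an assumption-laden illustration, not pdfHTML's parser: ToyHtmlTree is a hypothetical class that understands only bare tags and text (no attributes, comments, entities or void elements such as <br>). It keeps a stack of open elements, implicitly closes anything left open inside a matching closing tag, drops stray closing tags, and treats elements still open at the end of the input as closed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

// Hypothetical toy parser for phase 1 only. It handles bare tags and text
// (no attributes, comments, entities or void elements) and is not pdfHTML's parser.
final class ToyHtmlTree {
    static final class Node {
        final String name;   // tag name, or "#text" for a text node
        final String text;   // content of a text node, null for elements
        final List<Node> children = new ArrayList<>();

        Node(String name, String text) { this.name = name; this.text = text; }

        // Serializes the node with every element explicitly closed.
        String toHtml() {
            return text != null ? text : "<" + name + ">" + innerHtml() + "</" + name + ">";
        }

        String innerHtml() {
            StringBuilder sb = new StringBuilder();
            for (Node child : children) {
                sb.append(child.toHtml());
            }
            return sb.toString();
        }
    }

    static Node parse(String html) {
        Node root = new Node("#document", null);
        Deque<Node> open = new ArrayDeque<>();  // stack of currently open elements
        open.push(root);
        int i = 0;
        while (i < html.length()) {
            int lt = html.indexOf('<', i);
            int textEnd = lt < 0 ? html.length() : lt;
            if (textEnd > i) {
                open.peek().children.add(new Node("#text", html.substring(i, textEnd)));
            }
            if (lt < 0) {
                break;
            }
            int gt = html.indexOf('>', lt);
            if (gt < 0) {  // unterminated tag: keep the rest as text
                open.peek().children.add(new Node("#text", html.substring(lt)));
                break;
            }
            String tag = html.substring(lt + 1, gt).trim().toLowerCase(Locale.ROOT);
            if (tag.startsWith("/")) {
                String name = tag.substring(1);
                if (open.stream().anyMatch(n -> n != root && n.name.equals(name))) {
                    // Implicitly close every element still open inside the matching one.
                    while (!open.peek().name.equals(name)) {
                        open.pop();
                    }
                    open.pop();
                }
                // Otherwise it is a stray closing tag and is dropped.
            } else {
                Node element = new Node(tag, null);
                open.peek().children.add(element);
                open.push(element);
            }
            i = gt + 1;
        }
        // Elements still open at the end of the input stay in the tree and are
        // closed implicitly when the tree is serialized.
        return root;
    }

    public static void main(String[] args) {
        System.out.println(parse("<p>Hello <b>world</p><p>Bye").innerHtml());
    }
}
```

For the input <p>Hello <b>world</p><p>Bye, main prints <p>Hello <b>world</b></p><p>Bye</p>: the missing </b> and the final </p> are filled in. Real HTML parsers follow far more detailed recovery rules than this sketch.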

In the second phase, pdfHTML walks through the tree depth-first. Each tag is processed in two steps: a first step when the tag is encountered for the first time during the walk, and a final step after all of its children have been processed. How each tag is processed depends on which TagWorker class and CssApplier class it is mapped to. During processing, the TagWorker creates an iText layout object, then processes every child, applies CSS using the CssApplier instance, and finally returns the layout object to its parent. The process is visualized in figures 2 and 3.
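
The following Java sketch models that walk. All names in it (ToyConverter, TagWorker, CssApplier, LayoutBox and the tag-to-worker map) are hypothetical stand-ins, not pdfHTML's actual interfaces or signatures; the toy applier understands only color and font-size, and unmapped tags are simply skipped. What it does show is the order of operations: the worker creates its layout object when the tag is first encountered, each child is processed depth-first and handed back to the parent's worker, CSS is applied, and the finished object is returned to the parent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical toy model of pdfHTML's second phase; none of these types are
// iText classes, they only mirror the roles described in the text.
final class ToyConverter {
    // An already-parsed element together with the CSS declarations resolved for it.
    static final class Element {
        final String tag;
        final Map<String, String> css;
        final List<Element> children = new ArrayList<>();
        Element(String tag, Map<String, String> css) { this.tag = tag; this.css = css; }
        Element add(Element child) { children.add(child); return this; }
    }

    // Stand-in for an iText layout object such as a Div or a Paragraph.
    static final class LayoutBox {
        final String kind;
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<LayoutBox> children = new ArrayList<>();
        LayoutBox(String kind) { this.kind = kind; }
    }

    interface TagWorker {
        LayoutBox start(Element element);   // first step: tag encountered
        void addChild(LayoutBox child);     // a processed child is handed back
        LayoutBox finish();                 // final step: after all children
    }

    interface CssApplier {
        void apply(Element element, LayoutBox box);
    }

    // One generic worker that turns a tag into a block-level layout object.
    static final class BlockWorker implements TagWorker {
        private final String kind;
        private LayoutBox box;
        BlockWorker(String kind) { this.kind = kind; }
        public LayoutBox start(Element element) { box = new LayoutBox(kind); return box; }
        public void addChild(LayoutBox child) { box.children.add(child); }
        public LayoutBox finish() { return box; }
    }

    // Copies only the declarations this toy applier understands.
    static final CssApplier BASIC_CSS = (element, box) -> {
        for (String property : new String[] {"color", "font-size"}) {
            String value = element.css.get(property);
            if (value != null) {
                box.properties.put(property, value);
            }
        }
    };

    // Tag name -> factory for a fresh worker (one worker instance per element).
    static final Map<String, Supplier<TagWorker>> WORKERS = new HashMap<>();
    static {
        WORKERS.put("div", () -> new BlockWorker("Div"));
        WORKERS.put("p", () -> new BlockWorker("Paragraph"));
    }

    // Depth-first walk: start the worker, process each child, apply CSS, finish.
    static LayoutBox process(Element element, List<String> log) {
        Supplier<TagWorker> factory = WORKERS.get(element.tag);
        if (factory == null) {
            log.add("skip:" + element.tag);  // the toy drops unmapped tags and their subtree
            return null;
        }
        TagWorker worker = factory.get();
        log.add("start:" + element.tag);
        LayoutBox box = worker.start(element);
        for (Element child : element.children) {
            LayoutBox childBox = process(child, log);
            if (childBox != null) {
                worker.addChild(childBox);
            }
        }
        BASIC_CSS.apply(element, box);
        log.add("end:" + element.tag);
        return worker.finish();
    }

    public static void main(String[] args) {
        Map<String, String> pCss = new HashMap<>();
        pCss.put("color", "red");
        pCss.put("margin", "4px");  // not understood by the toy applier
        Element div = new Element("div", new HashMap<>())
                .add(new Element("p", pCss))
                .add(new Element("video", new HashMap<>()));  // no worker mapped
        List<String> log = new ArrayList<>();
        LayoutBox result = process(div, log);
        System.out.println(log);
        System.out.println(result.kind + " -> " + result.children.get(0).kind
                + " " + result.children.get(0).properties);
    }
}
```

For a div containing a styled p and an unmapped video, the log reads [start:div, start:p, end:p, skip:video, end:div], and the resulting Div holds a single Paragraph whose only property is color=red.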

Figure 2: pdfHTML internal flow

Figure 3: pdfHTML tag processing

Short Comparison to XML Worker

When compared with our old solution, XML Worker, pdfHTML can be seen as XML Worker++: it supports a much wider range of HTML tags and CSS properties, and the iText 7 foundation is better suited to future updates, changes and patches. While there will not be much difference in output when processing simple HTML and CSS, XML Worker quickly stumbles when confronted with more complex structures or certain CSS properties. Examples where pdfHTML offers more than XML Worker include borders on non-table elements, higher-quality rendering of table elements, handling of Arabic and other right-to-left scripts, support for responsive design through media queries, and more.

In conclusion

The iText 7 add-on pdfHTML is the successor to the old iText 5 XML Worker. Like its predecessor, it gives you a powerful tool for transforming HTML and accompanying CSS into beautiful PDFs, supporting an even broader range of CSS properties than XML Worker while keeping XML Worker's ease of use and extensibility.

Interested in trying it yourself? Download our Free Trial and get started today.
