Parsing XML and XHTML

There are a lot of questions about HTMLWorker on Stack Overflow. Many of these questions remain unanswered because HTMLWorker has been abandoned in favor of XML Worker. HTMLWorker was initially meant as a parser for a small selection of HTML tags. People started using it as if it were a full-blown HTML-to-PDF converter and then complained that HTMLWorker doesn't support CSS parsing. The HTMLWorker code grew organically up to the point where it was no longer maintainable.

We started another project, called XML Worker. It can be used to convert XHTML to PDF. It is not a URL-to-PDF converter in the sense that it won't "print your web site to PDF". In HTML, you can encounter content at the end of the file that needs to be added at the start of the document. When this happens, one would expect that content to appear on the first page. That isn't possible with iText, because iText flushes finished pages to the OutputStream as soon as possible, and there is no way to return to a previous page to add the extra content.

XML Worker is meant to create simple reports using an easy language such as HTML (and some CSS). It won't resolve ASP pages, nor execute JavaScript. It will only deal with finished XHTML.
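
Because XML Worker only deals with finished XHTML, it can help to check that your markup is well-formed XML before handing it to the converter. The following is a minimal sketch of such a check using only the JDK's built-in SAX parser; it is not part of iText, and the class name XhtmlCheck and the method isWellFormed are hypothetical names chosen for this illustration. It only tests well-formedness (every tag closed and properly nested), not whether XML Worker supports the tags or CSS used.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not an iText class: checks whether markup is well-formed XML.
public class XhtmlCheck {

    public static boolean isWellFormed(String markup) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    // Return an empty source so the parser does not try to
                    // download an external DTD referenced by a DOCTYPE.
                    return new InputSource(new StringReader(""));
                }
            };
            // Well-formedness violations are reported as fatal errors,
            // which DefaultHandler rethrows as a SAXException.
            parser.parse(new InputSource(new StringReader(markup)), handler);
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // XHTML style: the empty element is closed.
        System.out.println(isWellFormed("<p>Hello<br/>world</p>"));
        // Plain HTML style: <br> is never closed, so this is not well-formed XML.
        System.out.println(isWellFormed("<p>Hello<br>world</p>"));
    }
}
```

Markup that fails this check is typically HTML that is valid for a browser but not for an XML parser, for example unclosed br or img tags. It would need to be cleaned up into XHTML first.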
