Does my HTML have to be valid XML?

Similar questions were posted on Stack Overflow, for instance on Oct 30 '14 by Kannu Verma.

If you are still using iText 5 and XML Worker, you have to provide XHTML. For instance, a single <br> isn't allowed in your HTML; you need a <br /> instead. All tags need to be closed, and tags need to be nested correctly. To work around incomplete HTML syntax, we advised using jsoup to tidy up the HTML before converting it to PDF with XML Worker.
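To illustrate why a bare <br> or an unclosed tag is a problem for an XML-based converter, here is a minimal sketch that checks whether a fragment is well-formed XML. It uses the XML parser that ships with the JDK as a stand-in, not XML Worker's own parser, so XML Worker may not report problems in exactly the same way. The class name WellFormedCheck and the method isWellFormedXml are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: uses the JDK's XML parser, not XML Worker.
class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Unclosed or badly nested tags end up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A bare <br> is never closed, so the fragment is not well-formed.
        System.out.println(isWellFormedXml("<p>Hello<br>World</p>"));
        // The self-closing form is well-formed.
        System.out.println(isWellFormedXml("<p>Hello<br />World</p>"));
    }
}
```

A check like this could help you decide whether a given HTML file needs to be tidied up before it is handed to XML Worker.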

This is no longer necessary with pdfHTML. We have integrated jsoup into the pdfHTML add-on, so you don't need to call it separately. All HTML is cleaned up before it is converted to PDF. Take, for example, the incomplete.html file:

<head><title>Test incomplete HTML</title></head>
<p>Hello World
<p>Hello Universe
<img src="img/logo.png" alt="iText logo">


It doesn't have any <body> tags, and the <p> and <img> tags are never closed. This is a mighty incomplete HTML file, but a browser renders it anyway, and so does pdfHTML.


Incomplete HTML rendered in a browser and as PDF


You can try this for yourself by running the C07E07_IncompleteHTML example.
